WO2015188395A1

WO2015188395A1 - Big data oriented metabolome feature data analysis method and system thereof

Info

Publication number: WO2015188395A1
Application number: PCT/CN2014/080283
Authority: WO
Inventors: 周家锐; 华韵之; 纪震; 朱泽轩; 曾启明
Original assignee: 周家锐; 华韵之; 纪震; 朱泽轩; 曾启明
Priority date: 2014-06-13
Filing date: 2014-06-19
Publication date: 2015-12-17
Also published as: CN104063631A; CN104063631B

Abstract

A big data oriented metabolome feature data analysis method and system, the method comprising: A. receiving input metabolome feature data, dividing it into a plurality of data blocks, and mapping the data blocks to respective computing nodes in a MapReduce framework; B. optimizing the weights of the plurality of data blocks using a computational intelligence method; C. combining the optimized weights of the plurality of data blocks into the weights of the overall metabolome feature data and outputting them. The data block processing mechanism of the system reduces the difficulty of the weighting analysis and improves prediction accuracy. In addition, the parallel structure enables the system to be deployed on a plurality of computing nodes, significantly reducing computation time while maintaining the efficiency and stability of the system. The computational intelligence algorithm used in the system can effectively solve complex large-scale optimization problems, providing better prediction accuracy and a more effective estimate of the target physiological state.

Description

Metabolome feature data analysis method and system for big data

The present invention relates to the field of bioinformatics, and in particular to a metabolome feature data analysis method and system for big data.

Background

"Metabolites" is a general term for the small-molecule organic compounds that take part in metabolic processes in living organisms and carry a wealth of physiological state information. Metabolomics is the systematic study of metabolites, which can effectively reveal the biochemical mechanisms behind metabolic phenomena. Compared with traditional research methods, metabolomics is thought to give a more comprehensive picture of the true state of a living organism. It has therefore attracted growing attention and is widely used in many research and practical fields.

The signal data obtained by collecting and detecting metabolites, called metabolome feature data, is the basic object of metabolomics research. It is usually analyzed with machine learning methods to mine physiological state information. The prior art generally analyzes metabolome feature data with machine learning algorithms based on feature selection, which comprise two parts: (1) Feature selection is used to reduce the dimensionality of the input data, identifying the important feature signals and their corresponding metabolites and eliminating irrelevant noise, thereby improving the performance of the prediction algorithm. Commonly used feature selection methods include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Minimum Redundancy Maximum Relevance (mRMR). (2) A classification/regression algorithm is applied to the dimensionality-reduced data to estimate the physiological outcome that the input features may produce, in order to guide follow-up medical and scientific research. Commonly used classification/regression algorithms include k-Nearest Neighbor (kNN), Linear Regression, Logistic Regression, and Support Vector Machine (SVM). However, metabolome data is large in scale, high in feature dimension, and noisy, and the relationship between the feature signals and the target state is nonlinear, so these conventional methods often fail to obtain satisfactory learning results within a reasonable computing time.

Feature weighting is a generalized form of feature selection in which each weight can take any real value in the range [0, 1]. Compared with feature selection, feature weighting is better suited to the analysis of metabolome feature data. First, existing research shows that feature weighting improves prediction more than feature selection does, so the resulting system can estimate the target physiological state more accurately. Second, the weights are continuous values, which describe more precisely the degree of correlation between each metabolite signal and the target state. This information is of great value for subsequent research. However, metabolome feature data is large in scale and high in dimension, and weighting its features is a complex, large-scale, multimodal optimization problem that is difficult to handle with traditional mathematical methods. Its practical application is therefore severely limited.
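The difference between the two schemes can be illustrated with a minimal Python sketch. The sample values, the mask, and the helper name `apply_weights` are hypothetical and chosen only for illustration; they do not come from the invention.

```python
# Hypothetical three-feature sample; the values are illustrative only.
sample = [2.0, 4.0, 6.0]

# Feature selection: each weight is either 0 (drop) or 1 (keep).
selection_mask = [1, 0, 1]

# Feature weighting: each weight is any real value in [0, 1].
feature_weights = [0.9, 0.1, 0.5]


def apply_weights(features, weights):
    """Scale each feature by its weight (element-wise product)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return [f * w for f, w in zip(features, weights)]


selected = apply_weights(sample, selection_mask)
weighted = apply_weights(sample, feature_weights)
print(selected)
print(weighted)
```

With the binary mask, the second feature is either fully kept or fully dropped; with continuous weights it can be kept at reduced influence, which is the extra information feature weighting retains.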

The main drawbacks of the existing machine learning algorithms for metabolome feature data are as follows. First, the weights in feature selection can only take the two discrete values {0, 1} and cannot describe differences in the importance of metabolite signals more precisely. For example, if two metabolites both affect the target physiological state but to different degrees, the weights of the corresponding signals should also differ: the signal of the metabolite with the greater effect should receive the larger weight, and vice versa. Feature selection can only assign a weight of 0 or 1, so it cannot describe such differences, which leads to the loss of important biological information.

Second, the weights in feature weighting algorithms are difficult to design, and there is currently no effective solution. For feature weighting on big data in particular, existing algorithms cannot process the data effectively and can only produce approximate solutions. This seriously affects the performance of the analysis.

Third, existing machine learning techniques are designed primarily for small-scale data and do not take the big data nature of metabolome features into account. When faced with large data, this often results in a significant drop in the performance of the classification/regression algorithm and an exponential increase in computation time. In addition, the existing algorithms have high computational complexity and architectures that are difficult to parallelize, which makes it impossible to analyze metabolome big data effectively within a reasonable time.

Therefore, the prior art has yet to be improved and developed.

Summary of the invention

In view of the above deficiencies of the prior art, the object of the present invention is to provide a metabolome feature data analysis method and system for big data, aiming to solve the problem that current data analysis methods cannot analyze metabolome big data quickly and effectively.

The technical solution of the present invention is as follows:

A metabolome feature data analysis method for big data, wherein the method comprises the following steps:

A. Receive the input metabolome feature data, divide it into multiple data blocks, and map the data blocks to the computing nodes in a MapReduce framework;

B. Optimize the weights on the multiple data blocks simultaneously using a computational intelligence method;

C. Combine the optimized weights of the multiple data blocks into the weights of the overall metabolome feature data and output them.

The big data-oriented metabolome feature data analysis method, wherein the metabolome feature data is represented as a metabolome feature data set $F = \{F_1, F_2, \ldots, F_N\}$, where $F_n = [f_1, f_2, \ldots, f_D]$ is the $n$-th feature vector, $N$ is the data set size, and $D$ is the total dimension of the feature vectors; the number of data blocks is $M$, each data block contains $L = \lfloor D/M \rfloor$ elements, and the total number of system iterations is set to $K$.
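As a worked illustration of the block size $L = \lfloor D/M \rfloor$, assume the hypothetical values $D = 1000$ and $M = 8$ (these numbers are not taken from the source):

```latex
L = \left\lfloor \frac{D}{M} \right\rfloor
  = \left\lfloor \frac{1000}{8} \right\rfloor = 125,
\qquad M \cdot L = 8 \cdot 125 = 1000 = D .
```

If instead $D = 1003$, the same formula still gives $L = 125$, so $D - M \cdot L = 3$ dimensions would not be assigned to any block. The text does not say how such a remainder is handled.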

The big data-oriented metabolome feature data analysis method, wherein step A is specifically:

A1. Read the iteration counter $k$ and check its value. When $k = 0$, construct a $D$-dimensional weight vector $W_0$ whose values are initialized to random values in the range [0, 1]. When $k > 0$, take the weight vector output by the previous iteration as the initial value of the current weight vector, i.e. $W_k = W_{k-1}$.

A2. Construct a data block set $\mathbb{B} = \{B_1 = \emptyset, B_2 = \emptyset, \ldots, B_M = \emptyset\}$ containing $M$ empty sets and an index vector $\mathbb{D} = [1, 2, 3, \ldots, D]$ containing all index values, and initialize the data block counter $m = 0$.

A3. Construct the sub-index vector $I_m = \emptyset$, the sub-weight vector $W_{k,m} = \emptyset$, and the sub-feature vector set $F_m = \{F_{m,1}, F_{m,2}, \ldots, F_{m,N}\}$, where every sub-feature vector $F_{m,n} = \emptyset$, and initialize the in-block counter $l = 0$.

A4. Randomly select an index value $d$ from the index vector $\mathbb{D}$, add it to the sub-index vector $I_m$, and remove $d$ from $\mathbb{D}$. Add the weight $w_d$ of the weight vector $W_k$ on dimension $d$ to the sub-weight vector $W_{k,m}$. Take each feature vector $F_n$ in the metabolome feature data set $F$ in turn and add its feature signal value $f_d \in F_n$ on dimension $d$ to the $n$-th sub-feature vector $F_{m,n}$.

A5. Update the in-block counter $l = l + 1$ and check whether $l$ is less than $L$; if yes, go to step A2; if no, go to step A6.

A6. Add the current data block $B_m = \{I_m, W_{k,m}, F_m\}$ to $\mathbb{B}$ and update the data block counter $m = m + 1$. Check whether $m$ is less than $M$; if yes, go to step A1; if no, execute step A7.

A7. Map the divided data block set $\mathbb{B}$ to the computing nodes in the MapReduce framework.
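The data segmentation of step A can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `split_into_blocks`, the dictionary layout of a block, and the `seed` parameter are hypothetical, indices are 0-based rather than 1-based, and a single shuffle stands in for the repeated random draws without replacement.

```python
import random


def split_into_blocks(features, weights, num_blocks, seed=None):
    """Randomly partition the feature dimensions into num_blocks blocks.

    features: list of N feature vectors, each of length D.
    weights:  current weight vector W_k of length D.
    Returns a list of blocks; each block holds its sub-index vector,
    sub-weight vector and sub-feature vectors.
    """
    dim = len(weights)
    block_size = dim // num_blocks           # L = floor(D / M)
    rng = random.Random(seed)
    remaining = list(range(dim))             # index vector (0-based here)
    rng.shuffle(remaining)                   # random order, no repeats

    blocks = []
    for m in range(num_blocks):
        idx = remaining[m * block_size:(m + 1) * block_size]
        blocks.append({
            "index": idx,                                    # I_m
            "weights": [weights[d] for d in idx],            # W_{k,m}
            "features": [[row[d] for d in idx] for row in features],  # F_m
        })
    return blocks


# Usage with a hypothetical 2-sample, 6-dimensional data set.
demo_features = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
demo_weights = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
demo_blocks = split_into_blocks(demo_features, demo_weights, 3, seed=1)
```

Each block carries its own index list, so the optimized sub-weights can later be written back to the correct dimensions of the full weight vector.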

The big data-oriented metabolome feature data analysis method, wherein step A1 further includes: initializing the iteration counter $k = 0$.

The big data-oriented metabolome feature data analysis method, wherein step B is specifically:

B1. For the data block $B_m = \{I_m, W_{k,m}, F_m\}$, construct an evolutionary population $ps$ of the computational intelligence method, wherein the candidate solution of each individual is an $L$-dimensional vector $X_i$, $i = 1, 2, \ldots, |ps|$, whose value is initialized to $X_i = W_{k,m}$;

B2. Set the maximum number of iterations of the computational intelligence method to $G$ and initialize the iteration counter $g = 0$;

B3. Calculate the fitness function value of each individual in the evolutionary population $ps$, and use the computational intelligence method to optimize the evolutionary population $ps$ according to these fitness function values;

B4. Update the iteration counter $g = g + 1$ and check whether $g$ is less than $G$; if yes, go to step B3; if no, execute step B5;

B5. Take the candidate solution $X_{best}$ of the best individual in the population as the best sub-weight vector obtained by the optimization, i.e. $W_{k,m} = X_{best} = \arg\min_{X_i \in ps} f(X_i)$;

B6. Form the key-value pair $P_m = \langle I_m, W_{k,m} \rangle$ from the sub-weight vector $W_{k,m}$ and the sub-index vector $I_m$, as the output of the map process in the MapReduce framework.

The big data-oriented metabolome feature data analysis method, wherein the computational intelligence method comprises differential evolution, particle swarm optimization, or a memetic algorithm.
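One way the per-block optimization of step B might look with differential evolution is sketched below. It is an illustrative sketch only: the function name, the DE/rand/1/bin variant, and all parameter defaults are assumptions rather than details from the source. One deliberate deviation is that all individuals except the first start as jittered copies of the incoming sub-weight vector, because a population of identical individuals gives differential evolution no difference vectors to work with.

```python
import random


def differential_evolution(fitness, start, pop_size=10, generations=30,
                           scale=0.5, crossover=0.9, seed=0):
    """Minimise fitness over vectors in [0, 1]^L with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(start)

    def clip(value):
        return min(1.0, max(0.0, value))

    # Individual 0 keeps the incoming sub-weight vector; the others are
    # jittered copies (an assumption of this sketch, see the lead-in).
    pop = [list(start)] + [
        [clip(x + rng.uniform(-0.2, 0.2)) for x in start]
        for _ in range(pop_size - 1)
    ]
    scores = [fitness(ind) for ind in pop]

    for _ in range(generations):              # g = 0 .. G-1
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(dim)       # at least one mutated gene
            trial = [
                clip(pop[a][d] + scale * (pop[b][d] - pop[c][d]))
                if (rng.random() < crossover or d == forced) else pop[i][d]
                for d in range(dim)
            ]
            trial_score = fitness(trial)
            if trial_score <= scores[i]:      # greedy selection
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]            # X_best and f(X_best)


# Usage with a hypothetical toy fitness (distance to 0.3 in every dimension).
def toy_fitness(w):
    return sum((x - 0.3) ** 2 for x in w)


best_weights, best_score = differential_evolution(toy_fitness, [0.8, 0.8, 0.8])
```

Because selection is greedy and the first individual starts at the incoming vector, the returned score is never worse than the score of the starting sub-weight vector.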

The big data-oriented metabolome feature data analysis method, wherein calculating the fitness function value of each individual in the evolutionary population $ps$ in step B3 is specifically:

B31. For the $i$-th individual, use its candidate solution vector $X_i$ as the sub-weight vector $W_m$;

B32. Multiply $W_m$ with each sub-feature vector in $F_m$ to perform the weighting. If any weight $w_l$ in $W_m$ is less than the preset threshold $\delta$, delete the corresponding metabolic feature signal $f_l$ on that dimension to achieve dimensionality reduction, finally forming the weighted sub-feature vector $F^*_{m,n}$:

$$F^*_{m,n} = F_{m,n} \otimes W_m = \{\, f_l \cdot w_l \mid f_l \in F_{m,n},\ w_l \in W_m,\ w_l > \delta \,\};$$

B33. Use the weighted sub-feature vector set $F^*_m = \{F^*_{m,1}, F^*_{m,2}, \ldots, F^*_{m,N}\}$ to train the machine learning classification/regression algorithm and obtain its prediction accuracy;

B34. Take the prediction accuracy of the classification/regression algorithm as the fitness function value $f(X_i)$ of the current individual $X_i$.
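Steps B31 to B34 can be illustrated with the following Python sketch. The helper names, the threshold default, and the toy data are hypothetical. A leave-one-out 1-nearest-neighbour classifier is swapped in as a simple stand-in for the classification algorithm, and the class labels are an assumed input representing the target physiological state.

```python
def weight_and_prune(vector, weights, threshold):
    """Return f_l * w_l for every dimension whose weight exceeds threshold."""
    return [f * w for f, w in zip(vector, weights) if w > threshold]


def loo_1nn_error(samples, labels):
    """Leave-one-out error rate of a 1-nearest-neighbour classifier."""
    errors = 0
    for i, query in enumerate(samples):
        best_dist, best_label = None, None
        for j, other in enumerate(samples):
            if i == j:
                continue
            dist = sum((a - b) ** 2 for a, b in zip(query, other))
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        errors += best_label != labels[i]
    return errors / len(samples)


def fitness(candidate, sub_features, labels, threshold=0.1):
    """Fitness of one candidate sub-weight vector (lower is better)."""
    weighted = [weight_and_prune(v, candidate, threshold) for v in sub_features]
    if not weighted[0]:          # every dimension was pruned
        return 1.0
    return loo_1nn_error(weighted, labels)


# Hypothetical block: dimension 0 separates the classes, dimension 1 is noise.
toy_features = [[0.0, 5.0], [0.1, -5.0], [1.0, 5.1], [1.1, -5.1]]
toy_labels = [0, 0, 1, 1]
informative_only = fitness([1.0, 0.0], toy_features, toy_labels)
both_dimensions = fitness([1.0, 1.0], toy_features, toy_labels)
```

In this toy block, pruning the noisy dimension lets the classifier separate the two classes, while keeping both dimensions at full weight lets the noise dominate the distance computation.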

The big data-oriented metabolome feature data analysis method, wherein step C is specifically:

C1. Collect all key-value pairs of the output to form the key-value pair set $P = \{P_1, P_2, \ldots, P_M\}$ and apply the reduce process to it;

C2. Construct an all-zero $D$-dimensional weight vector $W_k = [0, 0, \ldots, 0]$ and initialize the data block counter $m = 0$;

C3. Get the $m$-th key-value pair $P_m = \langle I_m, W_{k,m} \rangle$ from the key-value pair set $P$ and initialize the in-block counter $l = 0$;

C4. Add the weight on dimension $l$ of the sub-weight vector $W_{k,m}$ to dimension $d$ of the weight vector $W_k$, i.e. $W_k = \{\, w_d = W_{k,m}[l] \mid d = I_m[l] \,\}$, $l = 1, 2, \ldots, L$;

C5. Update the in-block counter $l = l + 1$ and check whether $l$ is less than $L$; if yes, go to step C4; if no, go to step C6;

C6. Update the data block counter $m = m + 1$ and check whether $m$ is less than $M$; if yes, go to step C3; if no, execute step C7;

C7. Update the iteration counter $k = k + 1$ and check whether $k$ is less than $K$; if yes, go to step A; if no, execute step C8;

C8. Weight the input metabolome feature data set $F$ using the finally obtained weight vector $W_k$.

The big data-oriented metabolome feature data analysis method, wherein the input metabolome feature data set $F$ is weighted using the finally obtained weight vector and then used to train a machine learning algorithm to obtain the overall classification/regression prediction accuracy, and the weight vector $W_k$ and the classification/regression prediction accuracy are output as the results.

A metabolome feature data analysis system for big data, wherein the system comprises:

A data segmentation module, configured to receive the input metabolome feature data, divide it into a plurality of data blocks, and map the data blocks to the computing nodes in the MapReduce framework;

A heuristic weighting module, configured to optimize, using the computational intelligence method, the weights on the plurality of data blocks produced by the data segmentation module;

A weight fusion module, configured to combine the optimized weights of the plurality of data blocks into the weights of the overall metabolome feature data and output them.
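The job of the weight fusion module, writing each optimized sub-weight back to the dimension recorded in its sub-index vector, can be sketched in Python as follows. The function name `fuse_weights`, the tuple representation of a key-value pair, and the 0-based indices are assumptions made for this illustration.

```python
def fuse_weights(pairs, dim):
    """Scatter sub-weight vectors back into one D-dimensional vector.

    pairs: iterable of (sub_index_vector, sub_weight_vector) key-value pairs.
    dim:   total feature dimension D.
    """
    fused = [0.0] * dim                           # all-zero weight vector
    for sub_index, sub_weights in pairs:          # block counter m
        if len(sub_index) != len(sub_weights):
            raise ValueError("index and weight vectors differ in length")
        for d, w in zip(sub_index, sub_weights):  # in-block counter l
            fused[d] = w                          # w_d = W_{k,m}[l], d = I_m[l]
    return fused


# Usage with two hypothetical key-value pairs covering four dimensions.
demo_pairs = [([2, 0], [0.2, 0.9]), ([1, 3], [0.5, 0.7])]
demo_fused = fuse_weights(demo_pairs, 4)
```

Because the blocks partition the index set, every dimension is written at most once, so the order in which the key-value pairs arrive does not affect the result.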

Advantageous effects: The present invention provides a metabolome feature data analysis method and system for big data, which is a parallel weighting analysis system built on the MapReduce framework and designed around the characteristics of metabolome feature big data. On the one hand, the system's data block processing mechanism reduces the difficulty of the weighting analysis and improves prediction accuracy. On the other hand, the parallel structure means that the system can be deployed to multiple computing nodes (such as multiple computers) for simultaneous processing, which can significantly reduce the overall computing time. In addition, the MapReduce framework can schedule, adjust, and balance the computing nodes to ensure system efficiency and stability. Furthermore, the computational intelligence algorithm applied in this system can effectively solve complex large-scale optimization problems, and introducing it into each heuristic weighting module yields better analysis results. Its prediction accuracy is better than that of other existing feature weighting and feature selection algorithms, so it can estimate the target physiological state more effectively.

Brief description of the drawings

FIG. 1 is a flow chart of the method.

FIG. 2 is a block diagram of the system.

Figure 3 is a schematic diagram showing the working principle of the big data-oriented metabolome characteristic data analysis system of the present invention.

FIG. 4 is a schematic diagram of a data segmentation process performed in step S100 of FIG. 1.

FIG. 5 is a schematic diagram of the weight optimization process performed on the data blocks in step S200 of FIG. 1.

FIG. 6 is a schematic diagram of the reduce process performed on the optimized weights in step S300 of FIG. 1.

Detailed description

The present invention provides a metabolome feature data analysis method and system for big data. To make the objects, technical solutions, and effects of the present invention clearer, the present invention is described in further detail below. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it.

A big data-oriented metabolome feature data analysis method, as shown in FIG. 1, comprises the following steps:

S100. Receive the input metabolome feature data, divide it into multiple data blocks, and map the data blocks to the computing nodes in the MapReduce framework. Here the input metabolome feature data is represented as a metabolome feature data set $F = \{F_1, F_2, \ldots, F_N\}$, where $F_n = [f_1, f_2, \ldots, f_D]$ is the $n$-th feature vector, $N$ is the data set size, and $D$ is the total dimension of the feature vectors; the number of data blocks is $M$, each data block contains $L = \lfloor D/M \rfloor$ elements, and the total number of system iterations is set to $K$.

S200. Optimize the weights on the multiple data blocks simultaneously using a computational intelligence method.

S300. Combine the optimized weights of the multiple data blocks into the weights of the overall metabolome feature data and output them.

Based on the above method, the present invention further provides a metabolome feature data analysis system for big data. As shown in FIG. 2, the system includes:

A data segmentation module 100, configured to receive the input metabolome feature data, divide it into a plurality of data blocks, and map the data blocks to the computing nodes in the MapReduce framework.

A heuristic weighting module 200, configured to optimize, using the computational intelligence method, the weights on the plurality of data blocks produced by the data segmentation module.

A weight fusion module 300, configured to combine the optimized weights of the plurality of data blocks into the weights of the overall metabolome feature data and output them.

The working principle of the big data-oriented metabolome feature data analysis system of the present invention is shown in FIG. 3:

S1. Metabolome feature data is input.

S2. The data segmentation module divides the data. After being input to the data segmentation module, the data is divided into data block $B_1$, data block $B_2$, ..., data block $B_M$. The data blocks are mapped to the computing nodes in the MapReduce framework, that is, to the heuristic weighting modules.

S3. The heuristic weighting modules optimize the weights. The data block weights optimized by each heuristic weighting module are sent to the weight fusion module.

S4. The weight fusion module performs the reduce process on the optimized weights.

S5. Check whether the iterations are complete; if not, return to step S2; if yes, execute step S6.

S6. Output the weight vector and the classification/regression prediction accuracy.
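The S1 to S6 cycle can be sketched as a small Python driver. This is only an illustration under several assumptions: a thread pool from the standard library stands in for the computing nodes of a MapReduce framework, `optimize_block` is a hypothetical callable supplied by the caller, indices are 0-based, and $D$ is assumed to be divisible by $M$ (dimensions left outside every block would end up with weight 0 in this sketch).

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_iterations(dim, num_blocks, total_iterations, optimize_block, seed=0):
    """Outer loop: split, optimise the blocks in parallel, fuse, repeat."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(dim)]   # k = 0: random start
    block_size = dim // num_blocks

    def map_task(task):
        sub_index, sub_weights = task
        return sub_index, optimize_block(sub_index, sub_weights)

    for _ in range(total_iterations):              # k = 0 .. K-1
        order = list(range(dim))
        rng.shuffle(order)                         # fresh random split
        tasks = []
        for m in range(num_blocks):
            idx = order[m * block_size:(m + 1) * block_size]
            tasks.append((idx, [weights[d] for d in idx]))

        # Map phase: each block is optimised independently.
        with ThreadPoolExecutor(max_workers=num_blocks) as pool:
            results = list(pool.map(map_task, tasks))

        # Reduce phase: scatter the sub-weights back by index.
        fused = [0.0] * dim
        for idx, sub_weights in results:
            for d, w in zip(idx, sub_weights):
                fused[d] = w
        weights = fused                            # carried into iteration k+1
    return weights


# Usage with a hypothetical optimiser that halves every sub-weight.
def halve(sub_index, sub_weights):
    return [w / 2 for w in sub_weights]


final_weights = run_iterations(6, 3, 2, halve)
```

The weight vector produced by one pass becomes the starting point of the next, so each iteration refines the result of a different random grouping of the dimensions.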

In the preferred embodiment, the data segmentation process in step S100 is shown in FIG. 4, and the specific steps are as follows:

(1). Initialize the iteration counter $k = 0$.

(2). Read the iteration counter $k$ and check its value. When $k = 0$, construct a $D$-dimensional weight vector $W_0$ whose values are initialized to random values in the range [0, 1]: $W_0 = [w_1, w_2, \ldots, w_D]$, $w_d = \mathrm{rand}(0, 1)$.

(3). When $k > 0$, take the weight vector output by the previous iteration as the initial value of the current weight vector, i.e. $W_k = W_{k-1}$.

(4). Construct a data block set $\mathbb{B} = \{B_1 = \emptyset, B_2 = \emptyset, \ldots, B_M = \emptyset\}$ containing $M$ empty sets and an index vector $\mathbb{D} = [1, 2, 3, \ldots, D]$ containing all index values, and initialize the data block counter $m = 0$.

(5). Construct the sub-index vector $I_m = \emptyset$, the sub-weight vector $W_{k,m} = \emptyset$, and the sub-feature vector set $F_m = \{F_{m,1}, F_{m,2}, \ldots, F_{m,N}\}$, where every sub-feature vector $F_{m,n} = \emptyset$, and initialize the in-block counter $l = 0$.

(6). Randomly select an index value $d$ from the index vector $\mathbb{D}$, add it to the sub-index vector $I_m$, and remove $d$ from $\mathbb{D}$.

(7). Add the weight $w_d$ of the weight vector $W_k$ on dimension $d$ to the sub-weight vector $W_{k,m}$. Take each feature vector $F_n$ in the metabolome feature data set $F$ in turn and add its feature signal value $f_d$ on dimension $d$ to the $n$-th sub-feature vector $F_{m,n}$.

(8). Update the in-block counter $l = l + 1$ and check whether $l$ is less than $L$; if yes, go to step (4); if no, go to step (9).

(9). Add the current data block $B_m = \{I_m, W_{k,m}, F_m\}$ to $\mathbb{B}$ and update the data block counter $m = m + 1$. Check whether $m$ is less than $M$; if yes, go to step (3); if no, execute step (10).

(10). Map the divided data block set $\mathbb{B}$ to the computing nodes in the MapReduce framework. Common MapReduce frameworks include Hadoop and Nokia Disco.

Further, the weight optimization process performed on the data blocks in step S200 is shown in FIG. 5:

(1). For the $m$-th heuristic weighting module running in parallel, the input data block is $B_m = \{I_m, W_{k,m}, F_m\}$.

(2). Construct the evolutionary population $ps$ of the computational intelligence method, where the candidate solution of each individual is an $L$-dimensional vector $X_i$, $i = 1, 2, \ldots, |ps|$, whose value is initialized to $X_i = W_{k,m}$.

(3). Set the maximum number of iterations of the computational intelligence method to $G$ and initialize the iteration counter $g = 0$.

(4). Calculate the fitness function value of each individual in the evolutionary population $ps$.

(5). Based on the fitness function values of the individuals, use the computational intelligence method to optimize the evolutionary population $ps$. Common algorithms include Differential Evolution (DE), Particle Swarm Optimization (PSO), and Memetic Algorithm (MA).

(6). Update the iteration counter $g = g + 1$ and check whether $g$ is less than $G$; if yes, go to step (4); if no, go to step (7).

(7). After the optimization is complete, take the candidate solution $X_{best}$ of the best individual in the population as the best sub-weight vector obtained by the optimization:

$$W_{k,m} = X_{best} = \arg\min_{X_i \in ps} f(X_i)$$

(8). Form the key-value pair $P_m = \langle I_m, W_{k,m} \rangle$ from the sub-weight vector $W_{k,m}$ and the sub-index vector $I_m$, as the output of the map process in the MapReduce framework.

In a preferred embodiment, step (4) further includes:

a). For the $i$-th individual, use its candidate solution vector $X_i$ as the sub-weight vector $W_m$.

b). Multiply $W_m$ with each sub-feature vector in $F_m$ to perform the weighting. If any weight $w_l$ in $W_m$ is less than the preset threshold $\delta$, delete the corresponding metabolic feature signal $f_l$ on that dimension to achieve dimensionality reduction, finally forming the weighted sub-feature vector $F^*_{m,n}$:

$$F^*_{m,n} = F_{m,n} \otimes W_m = \{\, f_l \cdot w_l \mid f_l \in F_{m,n},\ w_l \in W_m,\ w_l > \delta \,\}$$

c). Use the weighted sub-feature vector set $F^*_m = \{F^*_{m,1}, F^*_{m,2}, \ldots, F^*_{m,N}\}$ to train the machine learning classification/regression algorithm and obtain its prediction accuracy. In the weighting analysis of metabolome feature data, algorithms such as kernel-method-based support vector machines and the Extreme Learning Machine (ELM) are generally used.

d). Take the prediction accuracy of the classification/regression algorithm as the fitness function value $f(X_i)$ of the current individual $X_i$. For a classification algorithm, the accuracy is represented by the classification error rate; for a regression algorithm, by the Root Mean Square Error (RMSE).
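The two accuracy measures named in sub-step d) can be computed as in the following Python sketch. The function names and the sample predictions are hypothetical; both measures are "lower is better", which matches the use of argmin when selecting the best individual.

```python
import math


def classification_error_rate(predicted, actual):
    """Fraction of samples whose predicted label differs from the true one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


def root_mean_square_error(predicted, actual):
    """RMSE between predicted and true continuous values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Usage with hypothetical predictions.
error_rate = classification_error_rate([1, 0, 1, 1], [1, 1, 1, 0])
rmse = root_mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```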

In the preferred embodiment, the reduce process performed on the optimized weights in step S300 is shown in FIG. 6, and is specifically:

(1). Collect all $M$ key-value pairs of the output to form the key-value pair set $P = \{P_1, P_2, \ldots, P_M\}$ and apply the reduce process to it.

(2). Construct an all-zero $D$-dimensional weight vector $W_k = [0, 0, \ldots, 0]$ and initialize the data block counter $m = 0$.

(3). Get the $m$-th key-value pair $P_m = \langle I_m, W_{k,m} \rangle$ from the key-value pair set $P$ and initialize the in-block counter $l = 0$.

(4). Add the weight on dimension $l$ of the sub-weight vector $W_{k,m}$ to dimension $d$ of the weight vector $W_k$, i.e. $W_k = \{\, w_d = W_{k,m}[l] \mid d = I_m[l] \,\}$, $l = 1, 2, \ldots, L$.

(5). Update the in-block counter $l = l + 1$ and check whether $l$ is less than $L$; if yes, go to step (4); if no, go to step (6).

(6). Update the data block counter $m = m + 1$ and check whether $m$ is less than $M$; if yes, go to step (3); if no, go to step (7).

(7). Update the iteration counter $k = k + 1$ and check whether $k$ is less than $K$; if yes, go to sub-step (2) of step S100; if no, execute step (8).

(8). Weight the input metabolome feature data set $F$ using the finally obtained weight vector $W_k$. The weighted data set is then used to train a machine learning algorithm to obtain the overall classification/regression prediction accuracy, following sub-steps b) to d) of step (4) of step S200. Finally, the weight vector and the classification/regression prediction accuracy are output as the results.

Compared with the prior art, the system of the invention has the following advantages:

First, the system is a parallel weighting analysis system built on the MapReduce framework and designed around the characteristics of metabolome feature big data. On the one hand, data block processing reduces the difficulty of the weighting analysis and improves prediction accuracy. On the other hand, the parallel architecture means that the system can be deployed to multiple computing nodes (such as multiple computers) for simultaneous processing, significantly reducing overall computation time. In addition, the MapReduce framework can schedule, adjust, and balance the computing nodes to ensure system efficiency and stability.

Second, computational intelligence algorithms can effectively solve complex large-scale optimization problems. Introducing one into each heuristic weighting module to optimize the sub-weight vector yields better analysis results. The experimental data shows that the weighting design method based on computational intelligence has better prediction accuracy than other existing feature weighting and feature selection algorithms. It gives a more effective estimate of the target physiological state and can better guide subsequent biological and medical applications.

Third, the weight values in the optimized weight vector specifically describe the degree of correlation between each metabolite signal (and the metabolite it represents) and the predicted target physiological state. This information is important for subsequent research and can help clarify the underlying mechanisms of metabolic processes in living organisms.

It is to be understood that the application of the present invention is not limited to the above-described examples, and those skilled in the art can make modifications and changes in accordance with the above description. All such modifications and changes are intended to fall within the scope of the appended claims.

Claims

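1. A metabolome feature data analysis method for big data, characterized in that the method comprises the following steps:

A. Receiving the input metabolome feature data, dividing it into a plurality of data blocks, and mapping the data blocks to the computing nodes in a MapReduce framework;

B. Optimizing the weights on the plurality of data blocks simultaneously using a computational intelligence method;

C. Combining the optimized weights of the plurality of data blocks into the weights of the overall metabolome feature data and outputting them.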

2. The big data-oriented metabolome feature data analysis method according to claim 1, wherein the metabolome feature data is represented as a metabolome feature data set $F = \{F_1, F_2, \ldots, F_N\}$, where $F_n = [f_1, f_2, \ldots, f_D]$ is the $n$-th feature vector, $N$ is the data set size, and $D$ is the total dimension of the feature vectors; the number of data blocks is $M$, each data block contains $L = \lfloor D/M \rfloor$ elements, and the total number of system iterations is set to $K$.

3. The big data-oriented metabolome feature data analysis method according to claim 2, wherein step A is specifically:

A1. Read the iteration counter $k$ and check its value. When $k = 0$, construct a $D$-dimensional weight vector $W_0$ whose values are initialized to random values in the range [0, 1]; when $k > 0$, take the weight vector output by the previous iteration as the initial value of the current weight vector, i.e. $W_k = W_{k-1}$;

A2. Construct a data block set $\mathbb{B} = \{B_1 = \emptyset, B_2 = \emptyset, \ldots, B_M = \emptyset\}$ containing $M$ empty sets and an index vector $\mathbb{D} = [1, 2, 3, \ldots, D]$ containing all index values, and initialize the data block counter $m = 0$;

A3. Construct the sub-index vector $I_m = \emptyset$, the sub-weight vector $W_{k,m} = \emptyset$, and the sub-feature vector set $F_m = \{F_{m,1}, F_{m,2}, \ldots, F_{m,N}\}$, where every sub-feature vector $F_{m,n} = \emptyset$, and initialize the in-block counter $l = 0$;

A4. Randomly select an index value $d$ from the index vector $\mathbb{D}$, add it to the sub-index vector $I_m$, and remove $d$ from $\mathbb{D}$; add the weight $w_d$ of the weight vector $W_k$ on dimension $d$ to the sub-weight vector $W_{k,m}$; take each feature vector $F_n$ in the metabolome feature data set $F$ in turn and add its feature signal value $f_d \in F_n$ on dimension $d$ to the $n$-th sub-feature vector $F_{m,n}$;

A5. Update the in-block counter $l = l + 1$ and check whether $l$ is less than $L$; if yes, go to step A2; if no, go to step A6;

A6. Add the current data block $B_m = \{I_m, W_{k,m}, F_m\}$ to $\mathbb{B}$ and update the data block counter $m = m + 1$; check whether $m$ is less than $M$; if yes, go to step A1; if no, execute step A7;

A7. Map the divided data block set $\mathbb{B}$ to the computing nodes in the MapReduce framework.

4. The big data-oriented metabolome feature data analysis method according to claim 3, wherein step A1 further comprises: initializing the iteration counter $k = 0$.

5. The big data-oriented metabolome feature data analysis method according to claim 4, wherein step B is specifically:

B1. For the data block $B_m = \{I_m, W_{k,m}, F_m\}$, construct an evolutionary population $ps$ of the computational intelligence method, wherein the candidate solution of each individual is an $L$-dimensional vector $X_i$, $i = 1, 2, \ldots, |ps|$, whose value is initialized to $X_i = W_{k,m}$;

B2. Set the maximum number of iterations of the computational intelligence method to $G$ and initialize the iteration counter $g = 0$;

B3. Calculate the fitness function value of each individual in the evolutionary population $ps$, and use the computational intelligence method to optimize the evolutionary population $ps$ according to these fitness function values;

B4. Update the iteration counter $g = g + 1$ and check whether $g$ is less than $G$; if yes, go to step B3; if no, execute step B5;

B5. Take the candidate solution $X_{best}$ of the best individual in the population as the best sub-weight vector obtained by the optimization, i.e. $W_{k,m} = X_{best} = \arg\min_{X_i \in ps} f(X_i)$;

B6. Form the key-value pair $P_m = \langle I_m, W_{k,m} \rangle$ from the sub-weight vector $W_{k,m}$ and the sub-index vector $I_m$, as the output of the map process in the MapReduce framework.

6. The big data-oriented metabolome feature data analysis method according to claim 5, wherein the computational intelligence method comprises differential evolution, particle swarm optimization, or a memetic algorithm.

7. The big data-oriented metabolome feature data analysis method according to claim 6, wherein calculating the fitness function value of each individual in the evolutionary population $ps$ in step B3 is specifically:

B31. For the $i$-th individual, use its candidate solution vector $X_i$ as the sub-weight vector $W_m$;

B32. Multiply $W_m$ with each sub-feature vector in $F_m$ to perform the weighting; if any weight $w_l$ in $W_m$ is less than the preset threshold $\delta$, delete the corresponding metabolic feature signal on that dimension to achieve dimensionality reduction, finally forming the weighted sub-feature vector $F^*_{m,n}$;

B33. Use the weighted sub-feature vector set $F^*_m = \{F^*_{m,1}, F^*_{m,2}, \ldots, F^*_{m,N}\}$ to train the machine learning classification/regression algorithm and obtain its prediction accuracy;

B34. Take the prediction accuracy of the classification/regression algorithm as the fitness function value $f(X_i)$ of the current individual $X_i$.

8. The big data-oriented metabolome feature data analysis method according to claim 7, wherein step C is specifically:

C1. Collect all key-value pairs of the output to form the key-value pair set $P = \{P_1, P_2, \ldots, P_M\}$ and apply the reduce process to it;

C2. Construct an all-zero $D$-dimensional weight vector $W_k = [0, 0, \ldots, 0]$ and initialize the data block counter $m = 0$;

C3. Get the $m$-th key-value pair $P_m = \langle I_m, W_{k,m} \rangle$ from the key-value pair set $P$ and initialize the in-block counter $l = 0$;

C4. Add the weight on dimension $l$ of the sub-weight vector $W_{k,m}$ to dimension $d$ of the weight vector $W_k$, i.e. $W_k = \{\, w_d = W_{k,m}[l] \mid d = I_m[l] \,\}$, $l = 1, 2, \ldots, L$;

C5. Update the in-block counter $l = l + 1$ and check whether $l$ is less than $L$; if yes, go to step C4; if no, go to step C6;

C6. Update the data block counter $m = m + 1$ and check whether $m$ is less than $M$; if yes, go to step C3; if no, execute step C7;

C7. Update the iteration counter $k = k + 1$ and check whether $k$ is less than $K$; if yes, go to step A; if no, execute step C8;

C8. Weight the input metabolome feature data set $F$ using the finally obtained weight vector $W_k$.

9. The big data-oriented metabolome feature data analysis method according to claim 8, wherein the input metabolome feature data set $F$ is weighted using the finally obtained weight vector and then used to train a machine learning algorithm to obtain the overall classification/regression prediction accuracy, and the weight vector $W_k$ and the classification/regression prediction accuracy are output as the results.

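10. A metabolome feature data analysis system for big data, the system comprising:

a data segmentation module, configured to receive the input metabolome feature data, divide it into a plurality of data blocks, and map the data blocks to the computing nodes in the MapReduce framework;

a heuristic weighting module, configured to optimize, using the computational intelligence method, the weights on the plurality of data blocks produced by the data segmentation module;

a weight fusion module, configured to combine the optimized weights of the plurality of data blocks into the weights of the overall metabolome feature data and output them.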