CN108681697B - Feature selection method and device - Google Patents


Info

Publication number
CN108681697B
Authority
CN
China
Prior art keywords
feature
sample
quantitative analysis
analysis model
partial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810401774.3A
Other languages
Chinese (zh)
Other versions
CN108681697A (en
Inventor
罗娜
韩平
王冬
王世芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Academy of Agriculture and Forestry Sciences
Original Assignee
Beijing Research Center For Agricultural Standards and Testing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Research Center For Agricultural Standards and Testing
Priority to CN201810401774.3A
Publication of CN108681697A
Application granted
Publication of CN108681697B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The embodiments of the invention disclose a feature selection method and device that select the features used to determine a target substance in spectral nondestructive testing, with better robustness and stability. The method comprises: S1, acquiring a spectral data set of samples; S2, sampling the spectral data set a first number of times to obtain a first number of sample spaces, constructing a partial least squares quantitative analysis model for each sample space using that sample space, and ranking the importance of the features corresponding to the sample space based on the partial least squares quantitative analysis model; S3, ranking the features according to the importance ranking results of the features corresponding to the first number of sample spaces to obtain a feature ranking result, determining a feature selection number based on the feature ranking result, and selecting, according to the feature ranking result, the top-ranked features up to the feature selection number as target features.

Description

Feature selection method and device
Technical Field
The embodiment of the invention relates to the field of spectral analysis, in particular to a feature selection method and device.
Background
Spectrum-based nondestructive testing is simple to operate, fast, and nondestructive, and requires neither pretreatment nor auxiliary reagents, so it is widely used in agriculture, the chemical industry, the environment, medicine, and other fields. However, the acquired spectral data usually has many characteristic wavelengths, few samples, and collinearity between features, so a model built directly on all features performs poorly. Existing research shows that selecting wavelengths with a feature selection method before building the model improves model accuracy, reduces model complexity, speeds up computation, and makes the model easier to interpret.
Common feature selection methods in the field of spectral analysis include uninformative variable elimination, competitive adaptive reweighted sampling, the successive projections algorithm, interval partial least squares, and optimization-based feature selection (genetic algorithms, simulated annealing, particle swarm optimization, and the like). Each of these is a single feature selection method: when the data set undergoes small local changes, the selected features change considerably, so stability is poor.
Disclosure of Invention
To overcome the shortcomings of the prior art, embodiments of the present invention provide a feature selection method and device.
In one aspect, an embodiment of the present invention provides a feature selection method, including:
S1, acquiring a spectral data set of samples, wherein the spectral data set comprises the spectral data of a specific number of samples and the content of the target substance in each sample;
S2, sampling the spectral data set a first number of times to obtain a first number of sample spaces, constructing a partial least squares quantitative analysis model for each sample space using that sample space, and ranking the importance of the features corresponding to the sample space based on the partial least squares quantitative analysis model;
S3, ranking the features according to the importance ranking results of the features corresponding to the first number of sample spaces to obtain a feature ranking result, determining a feature selection number based on the feature ranking result, and selecting, according to the feature ranking result, the top-ranked features up to the feature selection number as target features.
In another aspect, an embodiment of the present invention provides a feature selection apparatus, including:
an acquisition unit, configured to acquire a spectral data set of samples, wherein the spectral data set comprises the spectral data of a specific number of samples and the content of the target substance in each sample;
a ranking unit, configured to sample the spectral data set a first number of times to obtain a first number of sample spaces, construct a partial least squares quantitative analysis model for each sample space using that sample space, and rank the importance of the features corresponding to the sample space based on the partial least squares quantitative analysis model;
and a selecting unit, configured to rank the features according to the importance ranking results of the features corresponding to the first number of sample spaces to obtain a feature ranking result, determine a feature selection number based on the feature ranking result, and select, according to the feature ranking result, the top-ranked features up to the feature selection number as target features.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor, a memory, a bus, and a computer program stored on the memory and executable on the processor;
the processor and the memory complete mutual communication through the bus;
the processor, when executing the computer program, implements the method described above.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the above method.
In the feature selection method and device provided by the embodiments of the present invention, a spectral data set of samples is first acquired. The spectral data set is then sampled a first number of times to obtain a first number of sample spaces; for each sample space, a partial least squares quantitative analysis model is constructed using that sample space, and the importance of the features corresponding to the sample space is ranked based on the model. Finally, the features are ranked according to the importance ranking results of the features corresponding to the first number of sample spaces to obtain a feature ranking result, a feature selection number is determined based on the feature ranking result, and the top-ranked features up to the feature selection number are selected as target features. Compared with a single feature selection method, the selected features change little when the data set undergoes small local changes, so the scheme is more stable. It can select the features used to determine a target substance in spectral nondestructive testing with better robustness and stability. In addition, with partial least squares as the quantitative analysis model, experiments on an experimental data set show that a model built on features selected by this method is more accurate than a model built on all features.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a feature selection method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of a feature selection apparatus according to the present invention;
FIG. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without any creative effort belong to the protection scope of the embodiments of the present invention.
Referring to fig. 1, the present embodiment discloses a feature selection method, including:
S1, acquiring a spectral data set of samples, wherein the spectral data set comprises the spectral data of a specific number of samples and the content of the target substance in each sample;
S2, sampling the spectral data set a first number of times to obtain a first number of sample spaces, constructing a partial least squares quantitative analysis model for each sample space using that sample space, and ranking the importance of the features (namely, the wavelengths) corresponding to the sample space based on the partial least squares quantitative analysis model;
In this embodiment, the partial least squares quantitative analysis model is constructed from the sample space as follows: a part of the samples in the sample space is selected as a training set to train the model, the remaining samples in the sample space are used as a test set to test the trained model, and the root mean square error of the model is calculated from the test results. This process is repeated until the root mean square error of the resulting model falls within a preset range; the model with that root mean square error is the partial least squares quantitative analysis model to be constructed.
S3, ranking the features according to the importance ranking results of the features corresponding to the first number of sample spaces to obtain a feature ranking result, determining a feature selection number based on the feature ranking result, and selecting, according to the feature ranking result, the top-ranked features up to the feature selection number as target features.
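The model construction loop described under S2 (split, train, test, compute the root mean square error, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `fit`, `predict`, `max_rmse`, `train_fraction`, and `max_tries` are hypothetical names, "within a preset range" is interpreted here as "at or below a threshold", and the mean predictor in the usage lines is only a stand-in for a partial least squares model.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def build_model(X, y, fit, predict, max_rmse,
                train_fraction=0.7, max_tries=100, seed=0):
    """Repeat random train/test splits until the test RMSE is acceptable.

    `fit(X_train, y_train)` and `predict(model, X_test)` are caller-supplied
    (hypothetical) callables; any PLS implementation could be plugged in.
    """
    rng = random.Random(seed)
    indices = list(range(len(y)))
    # Keep at least one sample on each side of the split.
    n_train = min(len(y) - 1, max(1, int(train_fraction * len(y))))
    for _ in range(max_tries):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        predictions = predict(model, [X[i] for i in test])
        error = rmse([y[i] for i in test], predictions)
        if error <= max_rmse:  # assumed meaning of "within a preset range"
            return model, error
    raise RuntimeError("no split reached the preset RMSE threshold")


# Stand-in model for illustration only: predict the training mean.
def mean_fit(X_train, y_train):
    return sum(y_train) / len(y_train)


def mean_predict(model, X_test):
    return [model for _ in X_test]


X_demo = [[0.1], [0.2], [0.3], [0.4], [0.5]]
y_demo = [1.0, 1.0, 1.0, 1.0, 1.0]
model, error = build_model(X_demo, y_demo, mean_fit, mean_predict, max_rmse=0.1)
```

The `max_tries` cap is an addition for safety; the text itself only says the process is repeated until the error is within the preset range.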
In the feature selection method provided by the embodiment of the present invention, a spectral data set of samples is first acquired. The spectral data set is then sampled a first number of times to obtain a first number of sample spaces; for each sample space, a partial least squares quantitative analysis model is constructed using that sample space, and the importance of the features corresponding to the sample space is ranked based on the model. Finally, the features are ranked according to the importance ranking results of the features corresponding to the first number of sample spaces to obtain a feature ranking result, a feature selection number is determined based on the feature ranking result, and the top-ranked features up to the feature selection number are selected as target features. Compared with a single feature selection method, the selected features change little when the data set undergoes small local changes, so the scheme is more stable. It can select the features used to determine a target substance in spectral nondestructive testing with better robustness and stability. In addition, with partial least squares as the quantitative analysis model, experiments on an experimental data set show that a model built on features selected by this method is more accurate than a model built on all features.
On the basis of the foregoing method embodiment, the sampling may use bootstrap resampling: each draw is made with replacement, and each sample space has the same size as the spectral data set.
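As a minimal sketch of the bootstrap step, the following draws row indices with replacement so that each sample space has as many rows as the original data set. The function name, parameter names, and the fixed seed are illustrative assumptions, not part of the source.

```python
import random


def bootstrap_sample_spaces(n_samples, n_spaces, seed=0):
    """Return n_spaces index lists, each drawn with replacement.

    Every list has n_samples entries, so each sample space is the same
    size as the original spectral data set.
    """
    rng = random.Random(seed)
    return [rng.choices(range(n_samples), k=n_samples)
            for _ in range(n_spaces)]


# Example: 3 sample spaces from a data set of 6 spectra.
spaces = bootstrap_sample_spaces(n_samples=6, n_spaces=3)
```

Returning indices rather than copied rows keeps the sketch independent of how the spectra and target contents are stored.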
On the basis of the foregoing method embodiment, ranking the importance of the features corresponding to the sample space based on the partial least squares quantitative analysis model may include:
obtaining the coefficient corresponding to each feature in the partial least squares quantitative analysis model, and sorting the absolute values of the coefficients in descending order to obtain the feature ranking vector for the sample space.
In this embodiment, the feature ranking vector for the sample space consists of the importance measures of the features in that sample space, and the importance measure of each feature is the absolute value of the coefficient corresponding to that feature.
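The coefficient-based ranking can be illustrated with a short sketch. It assumes the regression coefficients of an already fitted partial least squares model are available as a plain list; the function name and the example coefficients are hypothetical.

```python
def rank_features(coefficients):
    """Return (importance, order) for one sample space.

    importance[j] is |coefficient_j|; order lists feature indices from
    most to least important.
    """
    importance = [abs(c) for c in coefficients]
    order = sorted(range(len(importance)),
                   key=lambda j: importance[j], reverse=True)
    return importance, order


# Example with four wavelengths; the coefficient values are made up.
importance, order = rank_features([0.2, -1.5, 0.0, 0.7])
# order == [1, 3, 0, 2]
```

Taking the absolute value means a strongly negative coefficient counts as much as a strongly positive one.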
On the basis of the foregoing method embodiment, ranking the features according to the importance ranking results of the features corresponding to the first number of sample spaces to obtain a feature ranking result may include:
integrating the feature ranking vectors of the sample spaces by linear weighted summation to obtain the feature ranking result.
In this embodiment, the feature ranking vectors of the sample spaces are integrated by linear weighted summation as follows. For each feature, the values corresponding to that feature in all feature ranking vectors are summed with weights, where the weight applied to a feature ranking vector is obtained from the partial least squares quantitative analysis model built on the corresponding sample space. The weights are calculated as follows. First, a partial least squares quantitative analysis model is constructed from each sample space and its root mean square error is calculated; the root mean square error of the model built on the i-th sample space is denoted E_i, and a smaller value indicates higher model accuracy. Second, the reciprocal of E_i is calculated and denoted O_i. Third, the O_i are summed to obtain SUM. Finally, each O_i is divided by SUM to obtain the weight W_i, which is the weight applied to the value of every feature in the feature ranking vector of the i-th sample space. The features are then sorted by the weighted sums obtained above to give the feature ranking result, in which a feature with a larger weighted sum is ranked closer to the front.
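The weighting scheme above (reciprocal of each root mean square error, normalized to sum to 1) and the weighted summation can be sketched as follows. The function names and the numbers in the example are illustrative assumptions; the sketch also assumes every root mean square error is strictly positive.

```python
def inverse_rmse_weights(rmse_values):
    """W_i = O_i / SUM, where O_i = 1 / E_i and SUM is the sum of all O_i."""
    inverses = [1.0 / e for e in rmse_values]  # O_i
    total = sum(inverses)                      # SUM
    return [o / total for o in inverses]       # W_i


def aggregate_ranking(importance_vectors, rmse_values):
    """Combine per-sample-space importance vectors into one ranking.

    importance_vectors[i][j] is the importance of feature j in sample
    space i; rmse_values[i] is the RMSE of the model built on space i.
    """
    weights = inverse_rmse_weights(rmse_values)
    n_features = len(importance_vectors[0])
    scores = [sum(w * vec[j] for w, vec in zip(weights, importance_vectors))
              for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return scores, order


# Two sample spaces, three features; the more accurate model (RMSE 0.5)
# receives twice the weight of the less accurate one (RMSE 1.0).
weights = inverse_rmse_weights([0.5, 1.0])
scores, order = aggregate_ranking([[0.1, 0.9, 0.3], [0.6, 0.2, 0.3]],
                                  [0.5, 1.0])
# order == [1, 2, 0]
```

Because the weights are normalized, the aggregated score of a feature stays on the same scale as the individual importance values.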
On the basis of the foregoing method embodiment, determining the feature selection number based on the feature ranking result may include:
setting the feature selection number to different values and, for each value, selecting that number of top-ranked features from the spectral data set according to the feature ranking result and constructing a partial least squares quantitative analysis model from the selected features;
and selecting, as the target value, the feature selection number corresponding to the model with the smallest root mean square error among the partial least squares quantitative analysis models obtained in the previous step.
In this embodiment, cross-validation may be used to select the target value.
On the basis of the foregoing method embodiment, every two adjacent values among the different values may differ by 1.
In this embodiment, the number of different values may be the same as the size of the spectral data set.
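The search for the feature selection number can be sketched as a simple scan over candidate counts. This is an illustration under assumptions: `rmse_for_subset` is a hypothetical caller-supplied function that builds a model on the given feature subset and returns its (for example, cross-validated) root mean square error, and the default candidates 1, 2, ..., up to the number of ranked features are one possible choice of values that differ by 1.

```python
def choose_feature_count(order, rmse_for_subset, candidate_counts=None):
    """Pick the number of top-ranked features with the smallest RMSE.

    order           -- feature indices, most important first
    rmse_for_subset -- hypothetical callable: list of indices -> RMSE
    """
    if candidate_counts is None:
        candidate_counts = range(1, len(order) + 1)  # adjacent values differ by 1
    best_k, best_error = None, float("inf")
    for k in candidate_counts:
        error = rmse_for_subset(order[:k])
        if error < best_error:
            best_k, best_error = k, error
    return best_k, order[:best_k]


# Toy evaluator for illustration only: the error depends on subset size.
toy_errors = {1: 0.9, 2: 0.4, 3: 0.6}
best_k, selected = choose_feature_count(
    [1, 2, 0], lambda subset: toy_errors[len(subset)])
# best_k == 2, selected == [1, 2]
```

On ties the strict comparison keeps the smaller count, which favors the simpler model; the source does not specify tie handling.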
Referring to fig. 2, the present embodiment discloses a feature selection apparatus, including:
an acquisition unit 1, configured to acquire a spectral data set of samples, wherein the spectral data set comprises the spectral data of a specific number of samples and the content of the target substance in each sample;
a ranking unit 2, configured to sample the spectral data set a first number of times to obtain a first number of sample spaces, construct a partial least squares quantitative analysis model for each sample space using that sample space, and rank the importance of the features corresponding to the sample space based on the partial least squares quantitative analysis model;
and a selecting unit 3, configured to rank the features according to the importance ranking results of the features corresponding to the first number of sample spaces to obtain a feature ranking result, determine a feature selection number based on the feature ranking result, and select, according to the feature ranking result, the top-ranked features up to the feature selection number as target features.
Specifically, the acquisition unit 1 acquires a spectral data set of samples, wherein the spectral data set comprises the spectral data of a specific number of samples and the content of the target substance in each sample; the ranking unit 2 samples the spectral data set a first number of times to obtain a first number of sample spaces, constructs a partial least squares quantitative analysis model for each sample space using that sample space, and ranks the importance of the features corresponding to the sample space based on the partial least squares quantitative analysis model; the selecting unit 3 ranks the features according to the importance ranking results of the features corresponding to the first number of sample spaces to obtain a feature ranking result, determines a feature selection number based on the feature ranking result, and selects, according to the feature ranking result, the top-ranked features up to the feature selection number as target features.
In the feature selection device provided by the embodiment of the present invention, a spectral data set of samples is first acquired. The spectral data set is then sampled a first number of times to obtain a first number of sample spaces; for each sample space, a partial least squares quantitative analysis model is constructed using that sample space, and the importance of the features corresponding to the sample space is ranked based on the model. Finally, the features are ranked according to the importance ranking results of the features corresponding to the first number of sample spaces to obtain a feature ranking result, a feature selection number is determined based on the feature ranking result, and the top-ranked features up to the feature selection number are selected as target features. Compared with a single feature selection method, the selected features change little when the data set undergoes small local changes, so the scheme is more stable. It can select the features used to determine a target substance in spectral nondestructive testing with better robustness and stability. In addition, with partial least squares as the quantitative analysis model, experiments on an experimental data set show that a model built on features selected by this method is more accurate than a model built on all features.
On the basis of the foregoing device embodiment, the selecting unit may be specifically configured to:
set the feature selection number to different values and, for each value, select that number of top-ranked features from the spectral data set according to the feature ranking result and construct a partial least squares quantitative analysis model from the selected features;
and select, as the target value, the feature selection number corresponding to the model with the smallest root mean square error among the partial least squares quantitative analysis models obtained in the previous step.
The feature selection apparatus of this embodiment may be configured to execute the technical solutions of the foregoing method embodiments, and the implementation principles and technical effects thereof are similar, and are not described herein again.
The advantages of the invention are as follows: the method is based on the partial least squares algorithm, converges quickly, is highly stable and of high practical value, and can be parallelized; a model built on the selected wavelengths predicts better than a full-band model.
FIG. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in FIG. 3, the electronic device may include: a processor 11, a memory 12, a bus 13, and a computer program stored on the memory 12 and executable on the processor 11;
the processor 11 and the memory 12 communicate with each other through the bus 13;
when the processor 11 executes the computer program, the method provided by the foregoing method embodiments is implemented, for example, including: acquiring a spectral data set of samples; sampling the spectral data set a first number of times to obtain a first number of sample spaces, constructing a partial least squares quantitative analysis model for each sample space using that sample space, and ranking the importance of the features corresponding to the sample space based on the partial least squares quantitative analysis model; and ranking the features according to the importance ranking results of the features corresponding to the first number of sample spaces to obtain a feature ranking result, determining a feature selection number based on the feature ranking result, and selecting, according to the feature ranking result, the top-ranked features up to the feature selection number as target features.
An embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method provided by the foregoing method embodiments, for example, including: acquiring a spectral data set of samples; sampling the spectral data set a first number of times to obtain a first number of sample spaces, constructing a partial least squares quantitative analysis model for each sample space using that sample space, and ranking the importance of the features corresponding to the sample space based on the partial least squares quantitative analysis model; and ranking the features according to the importance ranking results of the features corresponding to the first number of sample spaces to obtain a feature ranking result, determining a feature selection number based on the feature ranking result, and selecting, according to the feature ranking result, the top-ranked features up to the feature selection number as target features.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that relational terms such as "first" and "second" are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. Terms such as "upper" and "lower" indicate orientations or positional relationships based on those shown in the drawings; they are used only for convenience and simplicity of description, do not indicate or imply that the referenced devices or elements must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention. Unless expressly stated or limited otherwise, the terms "mounted," "coupled," and "connected" are to be understood broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; direct, indirect through an intermediate medium, or internal between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
In the description of the present invention, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description. Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention is not limited to any single aspect, nor is it limited to any single embodiment, nor is it limited to any combination and/or permutation of these aspects and/or embodiments. Moreover, each aspect and/or embodiment of the present invention may be utilized alone or in combination with one or more other aspects and/or embodiments thereof.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims (8)

1. A method of feature selection, comprising:
S1, acquiring a spectral data set of samples, wherein the spectral data set comprises the spectral data of a specific number of samples and the content of the target object in each sample;
S2, sampling the spectral data set a first number of times to obtain a first number of sample spaces, constructing, for each sample space, a partial least squares quantitative analysis model using that sample space, and performing importance ranking on the features corresponding to the sample space based on the partial least squares quantitative analysis model;
S3, sorting the features corresponding to the first number of sample spaces according to the importance sorting result of the features corresponding to the first number of sample spaces to obtain a feature sorting result, determining the feature selection number based on the feature sorting result, and selecting, according to the feature sorting result, the top-ranked features up to the feature selection number as target features;
the ranking of importance of the features corresponding to the sample space based on the partial least squares quantitative analysis model includes:
obtaining a coefficient corresponding to each feature in the partial least squares quantitative analysis model, and sorting the absolute values of the coefficients in descending order to obtain a feature sorting vector for the sample space;
the sorting the features corresponding to the first number of sample spaces according to the importance sorting result of the features corresponding to the first number of sample spaces to obtain a feature sorting result includes:
integrating feature sorting vectors under each sample space by using a linear weighted summation method to obtain a feature sorting result;
the weight of the corresponding value of the feature in each feature sorting vector is obtained based on the partial least squares quantitative analysis model determined by the sample space corresponding to the feature sorting vector, and the specific calculation process is as follows: first, constructing a partial least squares quantitative analysis model using each sample space, and calculating the root mean square error of the model constructed from each sample space, the root mean square error of the model constructed from the i-th sample space being denoted $E_i$, where a smaller value indicates higher model accuracy; second, calculating the reciprocal of $E_i$, denoted $O_i = 1/E_i$; third, summing all $O_i$ to obtain $\mathrm{SUM} = \sum_i O_i$; finally, dividing each $O_i$ by SUM to obtain the weight $W_i = O_i / \mathrm{SUM}$, i.e. the weight of the corresponding value of each feature in the feature sorting vector of the i-th sample space.
2. The method of claim 1, wherein the sampling uses a bootstrap resampling method, each sampling being sampling with replacement, and the size of each sample space being the same as the size of the spectral data set.
3. The method of claim 1, wherein determining the feature selection number based on the feature sorting result comprises:
setting the feature selection number to different values, and, for each value set, selecting that number of features from the spectral data set according to the feature sorting result and constructing a partial least squares quantitative analysis model based on the selected features;
and selecting, as the target value, the feature selection number corresponding to the partial least squares quantitative analysis model with the minimum root mean square error among the models obtained in the previous step.
4. A method according to claim 3, characterized in that every two adjacent values of said different values differ by 1.
5. A feature selection apparatus, comprising:
an acquisition unit, configured to acquire a spectral dataset of a sample, where the spectral dataset of the sample includes spectral data of a specific number of samples and a content of a target in the sample;
a sorting unit, configured to sample the spectral data set a first number of times to obtain a first number of sample spaces, construct, for each sample space, a partial least squares quantitative analysis model using that sample space, and perform importance ranking on the features corresponding to the sample space based on the partial least squares quantitative analysis model;
a selection unit, configured to sort the features corresponding to the first number of sample spaces according to the importance sorting result of the features corresponding to the first number of sample spaces to obtain a feature sorting result, determine the feature selection number based on the feature sorting result, and select, according to the feature sorting result, the top-ranked features up to the feature selection number as target features;
the ranking of importance of the features corresponding to the sample space based on the partial least squares quantitative analysis model includes:
obtaining a coefficient corresponding to each feature in the partial least squares quantitative analysis model, and sorting the absolute values of the coefficients in descending order to obtain a feature sorting vector for the sample space;
the sorting the features corresponding to the first number of sample spaces according to the importance sorting result of the features corresponding to the first number of sample spaces to obtain a feature sorting result includes:
and integrating the feature sorting vectors under each sample space by using a linear weighted summation method to obtain the feature sorting result.
6. The apparatus according to claim 5, wherein the selection unit is specifically configured to:
setting the feature selection number to different values, and, for each value set, selecting that number of features from the spectral data set according to the feature sorting result and constructing a partial least squares quantitative analysis model based on the selected features;
and selecting, as the target value, the feature selection number corresponding to the partial least squares quantitative analysis model with the minimum root mean square error among the models obtained in the previous step.
7. An electronic device, comprising: a processor, a memory, a bus, and a computer program stored on the memory and executable on the processor;
the processor and the memory communicate with each other through the bus;
the processor, when executing the computer program, implements the method of any of claims 1-4.
8. A non-transitory computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, implements the method of any one of claims 1-4.
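
As an illustration only, the steps recited in claims 1 to 4 can be sketched in Python roughly as follows. The sketch is not the patentee's implementation. Several points are assumptions that the claims do not fix: the function names are hypothetical; the partial least squares model is a small single-response NIPALS routine written with NumPy; the root mean square error E_i of each model is computed on its own bootstrap sample; the "corresponding value" of a feature in a feature sorting vector is taken to be its rank position (1 = largest absolute coefficient); the feature selection number is scored on a held-out subset; and the data are synthetic.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Single-response PLS fitted with NIPALS; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef


def rmse(X, y, coef, intercept):
    return float(np.sqrt(np.mean((X @ coef + intercept - y) ** 2)))


def bootstrap_rankings(X, y, n_rounds, n_components, rng):
    """S2: one rank vector and one RMSE per bootstrap sample space."""
    n, p = X.shape
    ranks = np.empty((n_rounds, p))
    errors = np.empty(n_rounds)
    for i in range(n_rounds):
        idx = rng.integers(0, n, size=n)        # with replacement, same size
        coef, b0 = pls1_fit(X[idx], y[idx], n_components)
        order = np.argsort(-np.abs(coef))       # largest |coefficient| first
        ranks[i, order] = np.arange(1, p + 1)   # rank position of each feature
        errors[i] = rmse(X[idx], y[idx], coef, b0)  # assumption: in-sample RMSE
    return ranks, errors


def rmse_weights(errors):
    """W_i = O_i / SUM with O_i = 1 / E_i (guarded against E_i = 0)."""
    inv = 1.0 / np.maximum(errors, 1e-12)
    return inv / inv.sum()


def aggregate_rankings(ranks, weights):
    """S3: linear weighted sum of rank positions; lowest score ranks first."""
    return np.argsort(weights @ ranks, kind="stable")


def choose_feature_count(X, y, X_val, y_val, order, n_components):
    """Claims 3 and 4: try k = 1, 2, ... and keep the k with the smallest RMSE."""
    best_k, best_err = 0, np.inf
    for k in range(1, len(order) + 1):
        cols = order[:k]
        coef, b0 = pls1_fit(X[:, cols], y, min(n_components, k))
        err = rmse(X_val[:, cols], y_val, coef, b0)  # assumption: held-out RMSE
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err


# Synthetic stand-in for a spectral data set: only features 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)
X_fit, y_fit, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

ranks, errors = bootstrap_rankings(X_fit, y_fit, n_rounds=20, n_components=2, rng=rng)
weights = rmse_weights(errors)
order = aggregate_rankings(ranks, weights)
best_k, best_err = choose_feature_count(X_fit, y_fit, X_val, y_val, order, n_components=2)
print("ranked features:", order, "selected count:", best_k)
```

Because the weights are normalised reciprocals of the errors, a sample space whose model has half the root mean square error of another contributes twice as much to the combined ranking. Substituting a library PLS implementation, or a cross-validated error in place of the held-out error, would leave the overall flow unchanged.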

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810401774.3A CN108681697B (en) 2018-04-28 2018-04-28 Feature selection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810401774.3A CN108681697B (en) 2018-04-28 2018-04-28 Feature selection method and device

Publications (2)

Publication Number Publication Date
CN108681697A (en) 2018-10-19
CN108681697B (en) 2021-03-23

Family

ID=63802758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810401774.3A Active CN108681697B (en) 2018-04-28 2018-04-28 Feature selection method and device

Country Status (1)

Country Link
CN (1) CN108681697B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109856060A (en) * 2019-03-15 2019-06-07 首都师范大学 The detection method and system of synthetic dyestuff concentration in assembled alcoholic drinks
CN111175243B (en) * 2019-12-31 2023-03-31 汉谷云智(武汉)科技有限公司 Method and system for quickly selecting spectral interval
CN112819062B (en) * 2021-01-26 2022-05-17 淮阴工学院 Fluorescence spectrum secondary feature selection method based on mixed particle swarm and continuous projection
CN113567375B (en) * 2021-07-29 2022-05-10 中南大学 Self-adaptive multi-metal ion concentration regression prediction method and system based on linear feature separation
CN113919510A (en) * 2021-11-01 2022-01-11 上海勃池信息技术有限公司 Sample feature selection method, device, equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103472013A (en) * 2013-06-09 2013-12-25 温州大学 Visible-near infrared spectrum PLS-DA modeling method combining Adaboost algorithm
CN103528990A (en) * 2013-10-31 2014-01-22 天津工业大学 Method for establishing multiple models of near infrared spectrums
CN105630743A (en) * 2015-12-24 2016-06-01 浙江大学 Spectrum wave number selection method
CN106644983A (en) * 2016-12-28 2017-05-10 浙江大学 Spectrum wavelength selection method based on PLS-VIP-ACO algorithm
CN106770007A (en) * 2016-11-29 2017-05-31 中国石油大学(华东) A kind of characteristic wavelength of near-infrared spectrum system of selection for least square method supporting vector machine model
WO2017201924A1 (en) * 2016-05-27 2017-11-30 福建师范大学 Detection and analysis method for urine-modified nucleoside based on surface enhanced resonant raman spectrum

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
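徐秋. Adaptive subspace band selection method based on spectral features. 《北京航空航天大学学报》. 2017, *
戴琼海. Feature mining and dimensionality reduction method for spectral data. 《数据采集与处理》. 2017, *
杨艺. Feature selection based on ranking fusion. 《控制与决策》. 2011, *
马淏. Research on the application of spectral and hyperspectral imaging technology in crop feature information extraction. 《中国博士学位论文全文数据库》. 2015, *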

Also Published As

Publication number Publication date
CN108681697A (en) 2018-10-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220808

Address after: 100097 No. 9 middle garden, Shuguang garden, Beijing, Haidian District

Patentee after: BEIJING ACADEMY OF AGRICULTURE AND FORESTRY SCIENCES

Address before: 1011, seed building, Beijing Academy of agriculture and Forestry Sciences, No. 9, Shuguang garden middle road, Haidian District, Beijing 100097

Patentee before: BEIJING RESEARCH CENTER FOR AGRICULTURAL STANDARDS AND TESTING