CN110222087B - Feature extraction method, device and computer readable storage medium - Google Patents


Info

Publication number
CN110222087B
CN110222087B (application CN201910401822.3A)
Authority
CN
China
Prior art keywords
feature extraction
extraction model
data
parameter values
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910401822.3A
Other languages
Chinese (zh)
Other versions
CN110222087A (en)
Inventor
黄博
毕野
吴振宇
王建明
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910401822.3A
Publication of CN110222087A
PCT application PCT/CN2019/118011 (published as WO2020228283A1)
Application granted
Publication of CN110222087B
Legal status: Active

Classifications

    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases (G06F Electric digital data processing; G06F16/00 Information retrieval; G06F16/24 Querying; G06F16/245 Query processing; G06F16/2458 Special types of queries)
    • G06N3/045 Combinations of networks (G06N Computing arrangements based on specific computational models; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08 Learning methods (G06N3/02 Neural networks)

Abstract

The invention discloses a feature extraction method, which comprises the following steps: acquiring training data, wherein the training data comprises the original features corresponding to each sample data; training an initial feature extraction model by using the training data and obtaining the parameter values of the initial feature extraction model; screening the parameter values of the initial feature extraction model to obtain screened parameter values; reconstructing the initial feature extraction model by using the screened parameter values to obtain a reconstructed feature extraction model; inputting the training data into the reconstructed feature extraction model to obtain the derived features of each sample data; and retraining the reconstructed feature extraction model according to the derived features of each sample data and the corresponding original features until iteration terminates, thereby obtaining a trained feature extraction model. The invention can represent data features more faithfully and improves the accuracy of feature extraction.

Description

Feature extraction method, device and computer readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a feature extraction method, a feature extraction device, and a computer readable storage medium.
Background
A common data mining procedure includes data acquisition, data preprocessing, feature construction and selection, model training, prediction, and so on. Among these steps, feature construction and selection is time-consuming but very important, because its result serves as the input to the machine learning model: if the features cannot express the patterns hidden in the data, the model will learn nothing and naturally cannot provide accurate prediction results.
Obtaining refined input features generally requires a great deal of manpower and time for feature construction and selection, and the cost of constructing and selecting features manually is very high. On the one hand, it consumes considerable manpower; on the other hand, many features hidden in the data are difficult for humans to find.
Disclosure of Invention
The invention provides a feature extraction method, a feature extraction device and a computer readable storage medium, with the main aim of representing the features of data more accurately so that the feature information of the data can be extracted more accurately.
In order to achieve the above object, the present invention provides a feature extraction method, the method comprising:
acquiring training data, wherein the training data comprises original characteristics corresponding to each sample data;
training an initial feature extraction model by using the training data, and obtaining parameter values of the initial feature extraction model;
screening the parameter values of the initial feature extraction model to obtain screened parameter values;
reconstructing the initial feature extraction model by using the screened parameter values to obtain a reconstructed feature extraction model;
inputting the training data into the reconstructed feature extraction model to obtain derivative features of each sample data;
retraining the reconstructed feature extraction model according to the derived features of each sample data and the original features corresponding to each sample data until iteration is terminated, and obtaining a trained feature extraction model;
acquiring target data;
and inputting the target data into a trained feature extraction model to obtain the features of the target data.
Preferably, the acquiring training data includes:
acquiring original sample data;
preprocessing the original sample data to obtain the training data, wherein the preprocessing comprises at least one of the following steps: normalization processing, missing value filling, noise data processing and data cleaning of inconsistent data.
Preferably, the initial feature extraction model includes a recurrent neural network model comprising an input layer, a hidden layer, an output layer and a memory unit;
input layer: for receiving the different types of data inputs in the feature data of an element;
hidden layer: for performing nonlinear processing, by means of an excitation function, on the feature data of the elements supplied by the input layer;
output layer: for outputting the result of the hidden-layer fitting, i.e. the data type corresponding to the features of the element;
memory unit: decides whether the memory of information in a neuron is written or deleted, and combines the previously recorded feature data of an element, the currently memorized feature data and the currently input features of the element to record long-term information.
Preferably, the screening the parameter values of the initial feature extraction model, and obtaining the screened parameter values includes:
calculating the sensitivity of the parameter value of the initial feature extraction model relative to the initial feature extraction model;
and sorting the parameter values of the initial feature extraction model by sensitivity, and selecting the top preset number of parameter values from the sorted list as the screened parameter values.
Preferably, the reconstructing the initial feature extraction model by using the filtered parameter values, and obtaining the reconstructed feature extraction model includes:
and increasing the weight of the screened parameter values in the initial feature extraction model, thereby obtaining the reconstructed feature extraction model.
Preferably, retraining the reconstructed feature extraction model according to the derived feature of each sample data and the original feature corresponding to each sample data until iteration is terminated, and obtaining a trained feature extraction model includes:
combining the derived features of each sample data with the original features of each sample data to obtain combined features of each sample data;
screening important features of each sample data from the combined features of each sample data by utilizing the importance of random forest variables;
and retraining the reconstructed feature extraction model by utilizing the important features of each sample data in the training data until iteration is terminated, and obtaining a trained feature extraction model.
In order to achieve the above object, the present invention also provides a feature extraction device, the device including a memory and a processor, the memory storing a feature extraction program executable on the processor, the feature extraction program implementing the following steps when executed by the processor:
acquiring training data, wherein the training data comprises original characteristics corresponding to each sample data;
training an initial feature extraction model by using the training data, and obtaining parameter values of the initial feature extraction model;
screening the parameter values of the initial feature extraction model to obtain screened parameter values;
reconstructing the initial feature extraction model by using the screened parameter values to obtain a reconstructed feature extraction model;
inputting the training data into the reconstructed feature extraction model to obtain derivative features of each sample data;
retraining the reconstructed feature extraction model according to the derived features of each sample data and the original features corresponding to each sample data until iteration is terminated, and obtaining a trained feature extraction model;
acquiring target data;
and inputting the target data into a trained feature extraction model to obtain the features of the target data.
Preferably, the screening the parameter values of the initial feature extraction model, and obtaining the screened parameter values includes:
calculating the sensitivity of the parameter value of the initial feature extraction model relative to the initial feature extraction model;
and sorting the parameter values of the initial feature extraction model by sensitivity, and selecting the top preset number of parameter values from the sorted list as the screened parameter values.
Preferably, the reconstructing the initial feature extraction model by using the filtered parameter values, and obtaining the reconstructed feature extraction model includes:
and increasing the weight of the screened parameter values in the initial feature extraction model, thereby obtaining the reconstructed feature extraction model.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a feature extraction program executable by one or more processors to implement the steps of the feature extraction method as described above.
Through the above technical scheme, training data are acquired, the training data comprising the original features corresponding to each sample data; an initial feature extraction model is trained with the training data and its parameter values are obtained; the parameter values of the initial feature extraction model are screened to obtain screened parameter values; the initial feature extraction model is reconstructed with the screened parameter values to obtain a reconstructed feature extraction model; the training data are input into the reconstructed feature extraction model to obtain the derived features of each sample data; and the reconstructed feature extraction model is retrained according to the derived features of each sample data and the corresponding original features until iteration terminates, yielding a trained feature extraction model. The invention can represent data features more faithfully and improves the accuracy of feature extraction.
Drawings
FIG. 1 is a flow chart of a feature extraction method according to an embodiment of the invention;
FIG. 2 is a schematic diagram illustrating an internal structure of a feature extraction device according to an embodiment of the invention;
FIG. 3 is a schematic block diagram of a feature extraction program in a feature extraction device according to an embodiment of the invention.
The objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings and in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a feature extraction method. Referring to fig. 1, a flow chart of a feature extraction method according to an embodiment of the invention is shown. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
In this embodiment, the feature extraction method includes:
s10, training data are acquired, wherein the training data comprise original features corresponding to each sample data.
In this embodiment, raw sample data is acquired;
preprocessing the original sample data to obtain the training data, wherein the preprocessing comprises at least one of the following steps: normalization processing, missing value filling, noise data processing and data cleaning of inconsistent data.
Normalization scales the data so that it falls within a small specified interval. Because the measurement units of the features in the original variables differ, the indices must be normalized before they can participate in evaluation calculations: a function transformation maps the values of the original variables into a given numerical interval. The normalization method applied by the invention is z-score normalization.
The processing of missing-value filling comprises: deleting samples that contain missing values, filling the missing values with a global constant, and so forth.
The processing of noise data includes noise smoothing. Noise is random error or deviation in a measured variable. Given a numerical attribute, data smoothing techniques such as binning may be used to smooth out the noise.
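As a concrete illustration of these preprocessing options, the sketch below implements z-score normalization, global-constant filling of missing values, and equal-depth-binning noise smoothing; the function names and the default fill constant are illustrative choices, not prescribed by the patent.

```python
import numpy as np

def zscore_normalize(x):
    """z-score normalization: map each column to zero mean, unit variance."""
    mean, std = x.mean(axis=0), x.std(axis=0)
    return (x - mean) / np.where(std == 0, 1.0, std)

def fill_missing(x, fill_value=0.0):
    """Missing-value filling with a global constant (here NaN marks a gap)."""
    return np.where(np.isnan(x), fill_value, x)

def bin_smooth(values, n_bins=3):
    """Noise smoothing by equal-depth binning: replace each value by its bin mean."""
    order = np.argsort(values)
    smoothed = np.empty(len(values), dtype=float)
    for chunk in np.array_split(order, n_bins):
        smoothed[chunk] = values[chunk].mean()
    return smoothed
```

Each step can be applied independently, matching the "at least one of" wording above.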
S11, training an initial feature extraction model by using the training data, and obtaining parameter values of the initial feature extraction model.
In this embodiment, the initial feature extraction model includes a recurrent neural network model comprising an input layer, a hidden layer, an output layer and a memory unit;
input layer: for receiving the different types of data inputs in the feature data of an element;
hidden layer: for performing nonlinear processing, by means of an excitation function, on the feature data of the elements supplied by the input layer;
output layer: for outputting the result of the hidden-layer fitting, i.e. the data type corresponding to the features of the element;
memory unit: decides whether the memory of information in a neuron is written or deleted, and combines the previously recorded feature data of an element, the currently memorized feature data and the currently input features of the element to record long-term information.
When the initial feature extraction model is trained by this neural network method, the parameter values of the initial feature extraction model can be output when training stops.
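The four components above describe an LSTM-style recurrent cell. The following numpy sketch is one possible reading: the memory unit (cell state c) uses a forget gate to decide whether a memory is erased and an input gate to decide whether new information is written, while the hidden state h is what the output layer would consume. The dimensions, initialization and gate layout are illustrative assumptions, not the patent's concrete architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MemoryCell:
    """LSTM-style recurrent cell: gates decide whether memory is written or erased."""

    def __init__(self, n_in, n_hidden):
        # one weight matrix and bias per gate: f(orget), i(nput), c(andidate), o(utput)
        self.W = {g: rng.normal(0.0, 0.1, (n_hidden, n_in + n_hidden)) for g in "fico"}
        self.b = {g: np.zeros(n_hidden) for g in "fico"}

    def step(self, x, h, c):
        z = np.concatenate([x, h])                    # current input + previous record
        f = sigmoid(self.W["f"] @ z + self.b["f"])    # delete old memory?
        i = sigmoid(self.W["i"] @ z + self.b["i"])    # write new memory?
        g = np.tanh(self.W["c"] @ z + self.b["c"])    # candidate content
        o = sigmoid(self.W["o"] @ z + self.b["o"])    # expose memory to output layer
        c = f * c + i * g     # long-term information: old record combined with new
        h = o * np.tanh(c)    # hidden state passed on to the output layer
        return h, c
```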
S12, screening the parameter values of the initial feature extraction model to obtain screened parameter values.
Since in many cases the number of features in the data far exceeds the number of training samples, and in order to simplify the training of the model, the invention uses a method based on a BP neural network to perform feature selection among the feature extractor parameters: the sensitivity delta of a parameter value X with respect to changes in the feature extraction model state Y is used as the measure for evaluating that parameter value, so that the more sensitive parameter values are picked out, which facilitates the subsequent mining of more hidden features, i.e. derived features, from the sample data.
In a specific implementation, the filtering the parameter values of the initial feature extraction model, and obtaining the filtered parameter values includes:
calculating the sensitivity of the parameter value of the initial feature extraction model relative to the initial feature extraction model;
and sorting the parameter values of the initial feature extraction model by sensitivity, and selecting the top preset number of parameter values from the sorted list as the screened parameter values.
S13, reconstructing the initial feature extraction model by using the screened parameter values to obtain a reconstructed feature extraction model.
In an embodiment, reconstructing the initial feature extraction model by using the screened parameter values to obtain the reconstructed feature extraction model includes: in the initial feature extraction model, increasing the weight of the screened parameter values and reducing the weight of the remaining parameter values, thereby obtaining the reconstructed feature extraction model. Training of the feature extraction model thus becomes more sensitive to the features corresponding to the parameters of higher sensitivity, so that more hidden features are mined.
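Steps S12 and S13 can be sketched together: estimate the sensitivity delta of each parameter X by finite differences on the model state Y, keep the top-k most sensitive parameters as the screened set, then boost their weights and damp the rest when rebuilding the model. The perturbation size eps, the count k and the boost/damp factors are illustrative assumptions.

```python
import numpy as np

def sensitivities(model_fn, params, x, eps=1e-4):
    """Sensitivity delta of each parameter value X w.r.t. the model state Y,
    estimated by a finite-difference perturbation of that parameter."""
    base = model_fn(params, x)
    delta = np.empty(len(params))
    for j in range(len(params)):
        perturbed = params.copy()
        perturbed[j] += eps
        delta[j] = abs(model_fn(perturbed, x) - base) / eps
    return delta

def screen_and_reweight(model_fn, params, x, k, boost=2.0, damp=0.5):
    """Screen the k most sensitive parameters, then reconstruct the model by
    boosting their weights and damping the remaining ones."""
    delta = sensitivities(model_fn, params, x)
    screened = np.argsort(delta)[::-1][:k]    # top-k parameters by sensitivity
    scale = np.full(len(params), damp)
    scale[screened] = boost
    return params * scale, screened
```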
S14, inputting the training data into the reconstructed feature extraction model to obtain derivative features of each sample data.
In this embodiment, the method for training the reconstructed feature extraction model is the same as that used for the initial feature extraction model. A derived feature is a feature that changes as the original features change; derived feature variables are obtained from the original features through a depth feature extractor. In other words, a derived feature is a new feature obtained by feature learning on the raw data, whereby features hidden in the raw data are mined.
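One hedged reading of the derived features of step S14 is the hidden-layer representation that the reconstructed extractor computes from the raw features, which is then concatenated with the originals as in step S15. The one-layer tanh network below is an illustrative stand-in, not the patent's concrete extractor.

```python
import numpy as np

def derived_features(X, W_hidden, b_hidden):
    """Derived features: a new, learned representation of the raw features."""
    return np.tanh(X @ W_hidden + b_hidden)

def combined_features(X_raw, X_derived):
    """Combined features of each sample: original features plus derived ones."""
    return np.concatenate([X_raw, X_derived], axis=1)
```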
S15, retraining the reconstructed feature extraction model according to the derivative features of each sample data and the original features corresponding to each sample data until iteration is terminated, and obtaining a trained feature extraction model.
Retraining the reconstructed feature extraction model according to the derived features of each sample data and the original features corresponding to each sample data until iteration is terminated, and obtaining a trained feature extraction model comprises the following steps:
combining the derived features of each sample data with the original features of each sample data to obtain combined features of each sample data;
screening important features of each sample data from the combined features of each sample data by using an importance method of random forest variables;
and retraining the reconstructed feature extraction model by utilizing the important features of each sample data in the training data until iteration is terminated, and obtaining a trained feature extraction model.
Variable importance is an index for measuring how important a variable is. Screening the important features of each sample data from the combined features of each sample data by the random-forest variable-importance method comprises the following steps:
1) Using the combined features of each sample data, build each decision tree in a random forest, and calculate the out-of-bag data error of each tree with its corresponding out-of-bag (OOB) data; denote this error errOOB1.
2) Randomly add noise interference to feature X of all samples in the OOB data (i.e. randomly change the values the samples take at feature X), and calculate the out-of-bag error again; denote it errOOB2.
3) Assuming the random forest contains Ntree trees, construct an objective function for the importance of feature t. The objective function expresses that if, after noise is randomly added to a feature, the out-of-bag accuracy drops sharply, then the feature has a large influence on the classification result of the samples, i.e. the feature is of higher importance.
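Steps 1) to 3) are the standard out-of-bag permutation-importance recipe; a common choice of objective function, assumed here, averages errOOB2 − errOOB1 over the Ntree trees. The sketch below uses one-split decision stumps as stand-in trees to keep it self-contained; the stump learner, tree count and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y):
    """Stand-in 'tree': the single split (feature, threshold, flip) with lowest error."""
    best = None
    for j in range(X.shape[1]):
        for t in X[:, j]:
            for flip in (False, True):
                pred = (X[:, j] > t).astype(int)
                if flip:
                    pred = 1 - pred
                err = np.mean(pred != y)
                if best is None or err < best[0]:
                    best = (err, j, t, flip)
    return best[1:]

def stump_predict(stump, X):
    j, t, flip = stump
    pred = (X[:, j] > t).astype(int)
    return 1 - pred if flip else pred

def oob_importance(X, y, feature, n_trees=25):
    """Permutation importance of `feature`: mean of errOOB2 - errOOB1 over trees."""
    n = len(y)
    diffs = []
    for _ in range(n_trees):
        boot = rng.integers(0, n, n)                   # bootstrap rows for this tree
        oob = np.setdiff1d(np.arange(n), boot)         # out-of-bag (OOB) rows
        if len(oob) == 0:
            continue
        stump = fit_stump(X[boot], y[boot])
        err1 = np.mean(stump_predict(stump, X[oob]) != y[oob])      # errOOB1
        X_noisy = X[oob].copy()
        X_noisy[:, feature] = rng.permutation(X_noisy[:, feature])  # add noise to X
        err2 = np.mean(stump_predict(stump, X_noisy) != y[oob])     # errOOB2
        diffs.append(err2 - err1)
    return float(np.mean(diffs))
```

A feature whose permutation sharply degrades out-of-bag accuracy scores high, matching step 3).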
S16, acquiring target data.
In this embodiment, the target data is data acquired by the electronic device, for example image data provided by a user.
S17, inputting the target data into a trained feature extraction model to obtain the features of the target data.
The invention discloses a feature extraction method, which comprises the following steps: acquiring training data, wherein the training data comprises the original features corresponding to each sample data; training an initial feature extraction model by using the training data and obtaining the parameter values of the initial feature extraction model; screening the parameter values of the initial feature extraction model to obtain screened parameter values; reconstructing the initial feature extraction model by using the screened parameter values to obtain a reconstructed feature extraction model; inputting the training data into the reconstructed feature extraction model to obtain the derived features of each sample data; and retraining the reconstructed feature extraction model according to the derived features of each sample data and the corresponding original features until iteration terminates, thereby obtaining a trained feature extraction model. The invention can represent data features more faithfully and improves the accuracy of feature extraction.
The invention also provides a feature extraction device. Referring to fig. 2, an internal structure of a feature extraction device according to an embodiment of the invention is shown.
In the present embodiment, the feature extraction device 1 may be a personal computer (Personal Computer, PC), or may be a terminal device such as a smart phone, a tablet computer, or a portable computer. The feature extraction device 1 comprises at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium including flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the feature extraction device 1, such as a hard disk of the feature extraction device 1. The memory 11 may also be an external storage device of the feature extraction apparatus 1 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the feature extraction apparatus 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the feature extraction apparatus 1. The memory 11 may be used not only for storing application software installed in the feature extraction device 1 and various types of data, such as codes of the feature extraction program 01, but also for temporarily storing data that has been output or is to be output.
Processor 12 may in some embodiments be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor or other chip for executing program code or processing data stored in the memory 11, for example executing the feature extraction program 01 and the like.
The communication bus 13 is used to enable connection communication between these components.
The network interface 14 may optionally comprise a standard wired interface, a wireless interface (e.g. WI-FI interface), typically used to establish a communication connection between the apparatus 1 and other electronic devices.
Optionally, the device 1 may further comprise a user interface, which may comprise a display (Display), an input unit such as a keyboard (Keyboard), and optionally a standard wired interface and a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch display, or the like. The display may also be referred to, as appropriate, as a display screen or display unit, and serves to display the information processed in the feature extraction device 1 and to present a visual user interface.
FIG. 2 shows only the feature extraction device 1 with the components 11-14 and the feature extraction program 01. Those skilled in the art will understand that the structure shown in FIG. 2 does not constitute a limitation of the feature extraction device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
In the embodiment of the apparatus 1 shown in fig. 2, the memory 11 stores a feature extraction program 01; the processor 12 performs the following steps when executing the feature extraction program 01 stored in the memory 11:
training data is acquired, wherein the training data comprises original characteristics corresponding to each sample data.
In this embodiment, raw sample data is acquired;
preprocessing the original sample data to obtain the training data, wherein the preprocessing comprises at least one of the following steps: normalization processing, missing value filling, noise data processing and data cleaning of inconsistent data.
Normalization scales the data so that it falls within a small specified interval. Because the measurement units of the features in the original variables differ, the indices must be normalized before they can participate in evaluation calculations: a function transformation maps the values of the original variables into a given numerical interval. The normalization method applied by the invention is z-score normalization.
The processing of missing-value filling comprises: deleting samples that contain missing values, filling the missing values with a global constant, and so forth.
The processing of noise data includes noise smoothing. Noise is random error or deviation in a measured variable. Given a numerical attribute, data smoothing techniques such as binning may be used to smooth out the noise.
And training an initial feature extraction model by using the training data, and obtaining parameter values of the initial feature extraction model.
In this embodiment, the initial feature extraction model includes a recurrent neural network model comprising an input layer, a hidden layer, an output layer and a memory unit;
input layer: for receiving the different types of data inputs in the feature data of an element;
hidden layer: for performing nonlinear processing, by means of an excitation function, on the feature data of the elements supplied by the input layer;
output layer: for outputting the result of the hidden-layer fitting, i.e. the data type corresponding to the features of the element;
memory unit: decides whether the memory of information in a neuron is written or deleted, and combines the previously recorded feature data of an element, the currently memorized feature data and the currently input features of the element to record long-term information.
When the initial feature extraction model is trained by this neural network method, the parameter values of the initial feature extraction model can be output when training stops.
And screening the parameter values of the initial feature extraction model to obtain screened parameter values.
Since in many cases the number of features in the data far exceeds the number of training samples, and in order to simplify the training of the model, the invention uses a method based on a BP neural network to perform feature selection among the feature extractor parameters: the sensitivity delta of a parameter value X with respect to changes in the feature extraction model state Y is used as the measure for evaluating that parameter value, so that the more sensitive parameter values are picked out, which facilitates the subsequent mining of more hidden features, i.e. derived features, from the sample data.
In a specific implementation, the filtering the parameter values of the initial feature extraction model, and obtaining the filtered parameter values includes:
calculating the sensitivity of the parameter value of the initial feature extraction model relative to the initial feature extraction model;
and sorting the parameter values of the initial feature extraction model by sensitivity, and selecting the top preset number of parameter values from the sorted list as the screened parameter values.
And reconstructing the initial feature extraction model by using the screened parameter values to obtain a reconstructed feature extraction model.
In an embodiment, reconstructing the initial feature extraction model by using the screened parameter values to obtain the reconstructed feature extraction model includes: in the initial feature extraction model, increasing the weight of the screened parameter values and reducing the weight of the remaining parameter values, thereby obtaining the reconstructed feature extraction model. Training of the feature extraction model thus becomes more sensitive to the features corresponding to the parameters of higher sensitivity, so that more hidden features are mined.
And inputting the training data into the reconstructed feature extraction model to obtain derivative features of each sample data.
In this embodiment, the method for training the reconstructed feature extraction model is the same as that used for the initial feature extraction model. A derived feature is a feature that changes as the original features change; derived feature variables are obtained from the original features through a depth feature extractor. In other words, a derived feature is a new feature obtained by feature learning on the raw data, whereby features hidden in the raw data are mined.
The reconstructed feature extraction model is then retrained according to the derivative features of each sample data and the original features corresponding to each sample data until iteration terminates, obtaining a trained feature extraction model.
Specifically, retraining the reconstructed feature extraction model according to the derivative features of each sample data and the original features corresponding to each sample data until iteration terminates, to obtain a trained feature extraction model, includes the following steps:
combining the derivative features of each sample data with the original features of each sample data to obtain the combined features of each sample data;
screening the important features of each sample data from the combined features of each sample data by using the random forest variable importance method;
and retraining the reconstructed feature extraction model by using the important features of each sample data in the training data until iteration terminates, obtaining a trained feature extraction model.
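A sketch of the combine-then-screen step using scikit-learn's random forest. Here the library's impurity-based `feature_importances_` stands in for the OOB-permutation importance described below, and the derivative-feature matrix is simulated with a random nonlinear projection; the dataset, shapes, and the choice to keep 4 features are all illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_orig, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_derived = np.tanh(X_orig @ rng.normal(size=(5, 3)))  # stand-in derivative features
X_combined = np.hstack([X_orig, X_derived])            # merged feature matrix

# Rank the combined features by importance and keep only the top ones
# for retraining the reconstructed model.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_combined, y)
top = np.argsort(forest.feature_importances_)[::-1][:4]  # 4 most important
X_important = X_combined[:, np.sort(top)]
```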
Here, variable importance is an index that measures how important a variable is. Screening the important features of each sample data from the combined features of each sample data by using the random forest variable importance method includes the following steps:
1) Build each decision tree in the random forest from the combined features of each sample data, and calculate the tree's error on its corresponding out-of-bag (OOB) data, denoted errOOB1;
2) Randomly add noise interference to feature X for all samples of the OOB data (that is, randomly permute the values of the samples at feature X), and calculate the out-of-bag error again, denoted errOOB2;
3) Assuming the random forest contains Ntree trees, an importance measure is constructed for feature X from these errors, namely Σ(errOOB2 − errOOB1)/Ntree: if the out-of-bag accuracy drops sharply after noise is randomly added to a feature, that feature has a large influence on the classification result of the samples, i.e., the feature's importance is high.
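The per-tree errOOB recipe of steps 1)–3) can be sketched as follows (the forest-level importance then averages this quantity over the Ntree trees). To keep the example self-contained, the "tree" is a hypothetical threshold rule on feature 0 rather than a fitted decision tree:

```python
import numpy as np

def permutation_importance_oob(predict, X_oob, y_oob, feature, rng):
    """One tree's importance for one feature, following the errOOB recipe:
    errOOB1 on intact OOB data, errOOB2 after permuting the feature's
    column (the 'noise' step); importance = errOOB2 - errOOB1."""
    err1 = np.mean(predict(X_oob) != y_oob)      # errOOB1
    X_noised = X_oob.copy()
    rng.shuffle(X_noised[:, feature])            # randomly permute feature X
    err2 = np.mean(predict(X_noised) != y_oob)   # errOOB2
    return err2 - err1                           # large value => important

# Toy check: labels equal the sign of feature 0, so permuting feature 0
# hurts accuracy while permuting the irrelevant feature 1 does not.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)
predict = lambda A: (A[:, 0] > 0).astype(int)    # a 'tree' using feature 0 only

imp0 = permutation_importance_oob(predict, X, y, feature=0, rng=rng)
imp1 = permutation_importance_oob(predict, X, y, feature=1, rng=rng)
```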
Target data is acquired.
In this embodiment, the target data is data acquired by the electronic device, such as the user's image data described above.
And inputting the target data into a trained feature extraction model to obtain the features of the target data.
The invention discloses a feature extraction method, which comprises the following steps: acquiring training data, wherein the training data comprises original features corresponding to each sample data; training an initial feature extraction model by using the training data, and obtaining parameter values of the initial feature extraction model; screening the parameter values of the initial feature extraction model to obtain screened parameter values; reconstructing the initial feature extraction model by using the screened parameter values to obtain a reconstructed feature extraction model; inputting the training data into the reconstructed feature extraction model to obtain derivative features of each sample data; and retraining the reconstructed feature extraction model according to the derivative features of each sample data and the original features corresponding to each sample data until iteration is terminated, obtaining a trained feature extraction model. The invention can better represent data features and improve the accuracy of feature extraction.
Alternatively, in other embodiments, the feature extraction program may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to implement the present invention. A module here refers to a series of computer program instruction segments capable of performing a specific function, used to describe the execution of the feature extraction program in the feature extraction device.
For example, referring to fig. 3, a schematic program module of a feature extraction program in an embodiment of the feature extraction apparatus of the present invention is shown, where the feature extraction program may be divided into an acquisition module 10, a training module 20, a screening module 30, and a reconstruction module 40, and the exemplary embodiment is as follows:
the acquisition module 10 acquires training data, wherein the training data comprises original characteristics corresponding to each sample data;
training module 20 trains the initial feature extraction model using the training data and obtains parameter values of the initial feature extraction model;
the screening module 30 screens the parameter values of the initial feature extraction model to obtain screened parameter values;
the reconstruction module 40 reconstructs the initial feature extraction model by using the screened parameter values to obtain a reconstructed feature extraction model;
the training module 20 inputs the training data into the reconstructed feature extraction model to obtain derivative features of each sample data;
the training module 20 retrains the reconstructed feature extraction model according to the derived feature of each sample data and the original feature corresponding to each sample data until iteration is terminated, and obtains a trained feature extraction model;
the acquisition module 10 acquires target data;
the training module 20 inputs the target data into a trained feature extraction model to obtain features of the target data.
The functions or operation steps implemented when the program modules such as the acquisition module 10, the training module 20, the screening module 30, and the reconstruction module 40 are executed are substantially the same as those of the foregoing embodiments, and will not be described herein.
In addition, an embodiment of the present invention also proposes a computer-readable storage medium having stored thereon a feature extraction program executable by one or more processors to implement the following operations:
acquiring training data, wherein the training data comprises original characteristics corresponding to each sample data;
training an initial feature extraction model by using the training data, and obtaining parameter values of the initial feature extraction model;
screening the parameter values of the initial feature extraction model to obtain screened parameter values;
reconstructing the initial feature extraction model by using the screened parameter values to obtain a reconstructed feature extraction model;
inputting the training data into the reconstructed feature extraction model to obtain derivative features of each sample data;
retraining the reconstructed feature extraction model according to the derived features of each sample data and the original features corresponding to each sample data until iteration is terminated, and obtaining a trained feature extraction model;
acquiring target data;
and inputting the target data into a trained feature extraction model to obtain the features of the target data.
The computer-readable storage medium of the present invention is substantially the same as the above-described embodiments of the feature extraction apparatus and method, and will not be described in detail herein.
It should be noted that, the foregoing reference numerals of the embodiments of the present invention are merely for describing the embodiments, and do not represent the advantages and disadvantages of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by means of software plus a necessary general hardware platform, or by hardware alone, though in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, or the part of it contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (5)

1. A method of feature extraction, the method comprising:
acquiring training data, wherein the training data comprises original characteristics corresponding to each sample data;
training an initial feature extraction model by using the training data, and obtaining parameter values of the initial feature extraction model;
screening the parameter values of the initial feature extraction model to obtain screened parameter values;
reconstructing the initial feature extraction model by using the screened parameter values to obtain a reconstructed feature extraction model;
inputting the training data into the reconstructed feature extraction model to obtain derivative features of each sample data;
retraining the reconstructed feature extraction model according to the derived features of each sample data and the original features corresponding to each sample data until iteration is terminated, and obtaining a trained feature extraction model;
acquiring target data;
inputting the target data into a trained feature extraction model to obtain the features of the target data;
the step of screening the parameter values of the initial feature extraction model, and the step of obtaining the screened parameter values comprises the following steps: calculating the sensitivity of the parameter value of the initial feature extraction model relative to the initial feature extraction model; sorting the parameter values of the initial feature extraction model according to the sensitivity, and selecting the parameter value with the preset number of digits before from the sorted parameter values as the parameter value after screening;
reconstructing the initial feature extraction model by using the screened parameter values, wherein the obtaining the reconstructed feature extraction model comprises the following steps: adding the weight of the screened parameter values in the initial feature extraction model to obtain a reconstructed feature extraction model;
retraining the reconstructed feature extraction model according to the derived features of each sample data and the original features corresponding to each sample data until iteration is terminated, and obtaining a trained feature extraction model comprises the following steps: combining the derived features of each sample data with the original features of each sample data to obtain combined features of each sample data; screening important features of each sample data from the combined features of each sample data by utilizing the importance of random forest variables; and retraining the reconstructed feature extraction model by utilizing the important features of each sample data in the training data until iteration is terminated, and obtaining a trained feature extraction model.
2. The feature extraction method of claim 1, wherein the acquiring training data comprises:
acquiring original sample data;
preprocessing the original sample data to obtain the training data, wherein the preprocessing comprises at least one of the following steps: normalization processing, missing value filling, noise data processing and data cleaning of inconsistent data.
3. The feature extraction method of claim 1, wherein the initial feature extraction model comprises a recurrent neural network model comprising: an input layer, a hidden layer and an output layer;
input layer: used for inputting different types of data in the feature data of an element;
hidden layer: used for performing nonlinear processing, via an excitation function, on the feature data of the element input by the input layer;
output layer: used for outputting the result of the hidden layer fitting, and outputting the data type corresponding to the features of the element;
memory unit: decides whether to write or delete information in the neuron's memory, and combines the feature data of the previously recorded element, the feature data of the currently memorized element, and the features of the currently input element to record long-term information.
4. A feature extraction apparatus for implementing the feature extraction method of any one of claims 1 to 3, the apparatus comprising a memory and a processor, the memory having stored thereon a feature extraction program executable on the processor, the feature extraction program implementing the following steps when executed by the processor:
acquiring training data, wherein the training data comprises original characteristics corresponding to each sample data;
training an initial feature extraction model by using the training data, and obtaining parameter values of the initial feature extraction model;
screening the parameter values of the initial feature extraction model to obtain screened parameter values;
reconstructing the initial feature extraction model by using the screened parameter values to obtain a reconstructed feature extraction model;
inputting the training data into the reconstructed feature extraction model to obtain derivative features of each sample data;
retraining the reconstructed feature extraction model according to the derived features of each sample data and the original features corresponding to each sample data until iteration is terminated, and obtaining a trained feature extraction model;
acquiring target data;
and inputting the target data into a trained feature extraction model to obtain the features of the target data.
5. A computer-readable storage medium, having stored thereon a feature extraction program executable by one or more processors to implement the feature extraction method of any one of claims 1 to 3.
CN201910401822.3A 2019-05-15 2019-05-15 Feature extraction method, device and computer readable storage medium Active CN110222087B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910401822.3A CN110222087B (en) 2019-05-15 2019-05-15 Feature extraction method, device and computer readable storage medium
PCT/CN2019/118011 WO2020228283A1 (en) 2019-05-15 2019-11-13 Feature extraction method and apparatus, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910401822.3A CN110222087B (en) 2019-05-15 2019-05-15 Feature extraction method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110222087A CN110222087A (en) 2019-09-10
CN110222087B true CN110222087B (en) 2023-10-17

Family

ID=67821146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910401822.3A Active CN110222087B (en) 2019-05-15 2019-05-15 Feature extraction method, device and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN110222087B (en)
WO (1) WO2020228283A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222087B (en) * 2019-05-15 2023-10-17 平安科技(深圳)有限公司 Feature extraction method, device and computer readable storage medium
CN112328909B (en) * 2020-11-17 2021-07-02 中国平安人寿保险股份有限公司 Information recommendation method and device, computer equipment and medium
CN112434323A (en) * 2020-12-01 2021-03-02 Oppo广东移动通信有限公司 Model parameter obtaining method and device, computer equipment and storage medium
CN113255933A (en) * 2021-06-01 2021-08-13 上海商汤科技开发有限公司 Feature engineering and graph network generation method and device and distributed system
CN113434295B (en) * 2021-06-30 2023-06-20 平安科技(深圳)有限公司 Farmland monitoring method, device, equipment and storage medium based on edge calculation
CN114070602A (en) * 2021-11-11 2022-02-18 北京天融信网络安全技术有限公司 HTTP tunnel detection method, device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108198625A (en) * 2016-12-08 2018-06-22 北京推想科技有限公司 A kind of deep learning method and apparatus for analyzing higher-dimension medical data
CN108205766A (en) * 2016-12-19 2018-06-26 阿里巴巴集团控股有限公司 Information-pushing method, apparatus and system
CN108960436A (en) * 2018-07-09 2018-12-07 上海应用技术大学 Feature selection approach
CN108984683A (en) * 2018-06-29 2018-12-11 北京百度网讯科技有限公司 Extracting method, system, equipment and the storage medium of structural data
CN109034201A (en) * 2018-06-26 2018-12-18 阿里巴巴集团控股有限公司 Model training and rule digging method and system
CN109213805A (en) * 2018-09-07 2019-01-15 东软集团股份有限公司 A kind of method and device of implementation model optimization
CN109635953A (en) * 2018-11-06 2019-04-16 阿里巴巴集团控股有限公司 A kind of feature deriving method, device and electronic equipment
CN109636421A (en) * 2018-12-13 2019-04-16 平安医疗健康管理股份有限公司 Medical data exception recognition methods, equipment and storage medium based on machine learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10318882B2 (en) * 2014-09-11 2019-06-11 Amazon Technologies, Inc. Optimized training of linear machine learning models
CN104679960B (en) * 2015-03-13 2018-04-03 上海集成电路研发中心有限公司 A kind of statistical modeling method of radio frequency variodenser
CN109447960A (en) * 2018-10-18 2019-03-08 神州数码医疗科技股份有限公司 A kind of object identifying method and device
CN109685051A (en) * 2018-11-14 2019-04-26 国网上海市电力公司 A kind of infrared image fault diagnosis system based on network system
CN110222087B (en) * 2019-05-15 2023-10-17 平安科技(深圳)有限公司 Feature extraction method, device and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Optimal Deep Learning LSTM Model for Electric Load Forecasting using Feature Selection and Genetic Algorithm: Comparison with Machine;Salah Bouktif et al;《energies》;第1-18页 *

Also Published As

Publication number Publication date
WO2020228283A1 (en) 2020-11-19
CN110222087A (en) 2019-09-10

Similar Documents

Publication Publication Date Title
CN110222087B (en) Feature extraction method, device and computer readable storage medium
CN108416198B (en) Device and method for establishing human-machine recognition model and computer readable storage medium
CN110889134B (en) Data desensitizing method and device and electronic equipment
CN107633254A (en) Establish device, method and the computer-readable recording medium of forecast model
CN106777177A (en) Search method and device
CN109284372B (en) User operation behavior analysis method, electronic device and computer readable storage medium
CN113159147B (en) Image recognition method and device based on neural network and electronic equipment
CN108520196A (en) Luxury goods discriminating conduct, electronic device and storage medium
CN107818491A (en) Electronic installation, Products Show method and storage medium based on user's Internet data
CN112581227A (en) Product recommendation method and device, electronic equipment and storage medium
US20090204703A1 (en) Automated document classifier tuning
CN106708729A (en) Code defect predicting method and device
JP2018195231A (en) Learning model creation device, learning model creation method, and learning model creation program
CN112508456A (en) Food safety risk assessment method, system, computer equipment and storage medium
CN113592605A (en) Product recommendation method, device, equipment and storage medium based on similar products
CN113807728A (en) Performance assessment method, device, equipment and storage medium based on neural network
CN114782060A (en) Interactive product detection method and system
CN104933096B (en) Abnormal key recognition methods, device and the data system of database
US9311518B2 (en) Systems and methods for efficient comparative non-spatial image data analysis
CN115186776B (en) Method, device and storage medium for classifying ruby producing areas
CN109491970A (en) Imperfect picture detection method, device and storage medium towards cloud storage
CN112069807A (en) Text data theme extraction method and device, computer equipment and storage medium
CN110162459A (en) Test cases generation method, device and computer readable storage medium
CN112131418A (en) Target labeling method, target labeling device and computer-readable storage medium
CN111989662A (en) Autonomous hybrid analysis modeling platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant