CN116383659A - Parameter optimization method and device for machine learning feature engineering - Google Patents

Parameter optimization method and device for machine learning feature engineering Download PDF

Info

Publication number
CN116383659A
Authority
CN
China
Prior art keywords
training
model
training samples
dimension
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310384601.6A
Other languages
Chinese (zh)
Inventor
郝伟
刘加瑞
陈勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Huayun'an Technology Co ltd
Original Assignee
Anhui Huayun'an Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Huayun'an Technology Co ltd filed Critical Anhui Huayun'an Technology Co ltd
Priority to CN202310384601.6A priority Critical patent/CN116383659A/en
Publication of CN116383659A publication Critical patent/CN116383659A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Feedback Control In General (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a parameter optimization method and apparatus for machine learning feature engineering. The method comprises: acquiring a current sample space and quantifying the importance of the dimension features of the first training samples in the current sample space; sorting the dimensions of the first training samples in descending order according to the quantization result; in the process of training a neural network model, for the i-th training run, selecting the first i dimension features in the ordering from the first training samples to form second training samples corresponding to the first training samples, and training the neural network model with the second training samples to generate a target model, where i is a natural number, i ≤ n, and n is the feature dimension of the first training samples; and verifying the target models and selecting a model that satisfies a preset condition as the final model. In this way, features can be evaluated automatically, which improves working efficiency and the accuracy of the generated model.

Description

Parameter optimization method and device for machine learning feature engineering
Technical Field
Embodiments of the present disclosure relate generally to the field of machine learning technology, and more particularly, to a parameter optimization method and apparatus for machine learning feature engineering.
Background
Feature engineering screens data features out of raw data to improve the training effect of a model. Generally, the first step of a machine learning process is to define the feature set of a sample, and then to select an appropriate sample set for training based on the defined features. This step usually involves a rather time-consuming tuning process: researchers must select and recombine the various possible feature combinations of the data to obtain a better training model that meets the requirements. From a mathematical point of view, it is easy to see that n candidate features give on the order of n! possible combinations. Moreover, because this analysis requires manual adjustment and combination, the n! possible combinations usually have to be screened by human experience in order to reduce the test space. Finding a suitable model is therefore often difficult: it not only requires considerable experience, but also consumes a great deal of time.
Disclosure of Invention
According to embodiments of the present disclosure, a parameter optimization scheme for machine learning feature engineering is provided, which evaluates features automatically and thereby improves working efficiency and the accuracy of the generated model.
In a first aspect of the present disclosure, there is provided a parameter optimization method for machine learning feature engineering, comprising:
acquiring a current sample space, and quantifying the importance of the dimension features of the first training samples in the current sample space;
sorting the dimensions of the first training samples in descending order according to the quantization result;
in the process of training a neural network model, for the i-th training run, selecting the first i dimension features in the ordering from the first training samples to form second training samples corresponding to the first training samples, and training the neural network model with the second training samples to generate a target model, where i is a natural number, i ≤ n, and n is the feature dimension of the first training samples;
and verifying the target models, and selecting a model that satisfies a preset condition as the final model.
In some embodiments, quantifying the importance of the dimension features of the first training samples in the current sample space includes:
quantifying the importance of the dimension features of the first training samples by a sample deviation value, wherein the deviation index w_i of the i-th dimension feature is calculated by:
[formula published as image: Figure BDA0004173418040000021]
where w_i is the deviation index and m is the number of first training samples in the current sample space.
In some embodiments, sorting the dimensions of the first training samples in descending order according to the quantization result includes:
sorting the dimensions of the first training samples in descending order of their sample deviation values, from high to low.
In some embodiments, after sorting the dimensions of the first training samples in descending order according to the quantization result, the method further comprises: dividing the first training samples in the current sample space into a training set and a verification set;
and, in the process of training the neural network model, selecting the first i dimension features in the ordering from the training set to form the second training samples corresponding to the first training samples, and training the neural network model.
In some embodiments, after the target model is generated, the method further comprises:
verifying the target model with the verification set.
In some embodiments, verifying the target model with the verification set comprises:
selecting accuracy, precision, recall, or F1 score as the evaluation index according to the requirements of the practical application, and verifying the target model with the verification set.
In some embodiments, verifying the target models and selecting a model that satisfies a preset condition as the final model includes:
for the multiple generated target models, selecting, according to the results of verifying them with the verification set, a target model whose recognition accuracy is greater than a preset threshold as the final target model.
In a second aspect of the present disclosure, there is provided a parameter optimization apparatus for machine learning feature engineering, comprising:
the sample space acquisition module is used for acquiring a current sample space and carrying out importance quantification on the dimension characteristics of a first training sample in the current sample space;
the dimension sorting module is used for sorting the dimensions in the first training sample in a descending order according to the quantization result;
the model training module is used for, in the process of training the neural network model, selecting for the i-th training run the first i dimension features in the ordering from the first training samples to form second training samples corresponding to the first training samples, and training the neural network model with the second training samples to generate a target model, wherein i is a natural number, i ≤ n, and n is the feature dimension of the first training samples;
and the model verification module is used for verifying the target model and selecting a model meeting preset conditions as a final model.
In a third aspect of the present disclosure, there is provided an electronic device comprising a memory having a computer program stored thereon and a processor that when executing the program implements the method as described above.
In a fourth aspect of the present disclosure, a computer readable storage medium is provided, on which a computer program is stored, which program, when being executed by a processor, implements a method as described above.
Through the parameter optimization method for the machine learning feature engineering, the features can be automatically evaluated, so that the working efficiency and the accuracy of the generated model are improved.
The matters described in this Summary are not intended to identify key or critical features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, wherein like or similar reference numerals denote like or similar elements, in which:
FIG. 1 illustrates a flow chart of a parameter optimization method for machine learning feature engineering in accordance with an embodiment of the present disclosure;
FIG. 2 shows a schematic structural diagram of a parameter optimization device for machine learning feature engineering in accordance with a second embodiment of the present disclosure;
fig. 3 shows a schematic block diagram of an electronic device used to implement an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the disclosure, are within the scope of the disclosure.
In addition, the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
The parameter optimization method for machine learning feature engineering of the present disclosure automates the feature evaluation process and, through this process, can effectively select suitable feature parameters as the feature engineering model of a system. The evaluation that was previously done manually is thus carried out by the computer, which greatly improves working efficiency while also effectively improving the accuracy of the model.
Specifically, as shown in fig. 1, a flowchart of a parameter optimization method for machine learning feature engineering according to an embodiment of the present disclosure is shown. As an optional embodiment of the disclosure, in this embodiment, the parameter optimization method for machine learning feature engineering may include the following steps:
s101: and acquiring a current sample space, and carrying out importance quantification on the dimension characteristics of a first training sample in the current sample space.
The parameter optimization method for machine learning feature engineering of the embodiments of the present disclosure can be applied to feature engineering, and in particular to the feature evaluation stage of feature engineering. Feature engineering screens data features out of raw data to improve the training effect of a model. Generally, the first step of a machine learning process is to define the feature set of a sample, and then to select an appropriate sample set for training based on the defined features. This step usually involves a rather time-consuming tuning process: researchers must select and recombine the various possible feature combinations of the data to obtain a better training model that meets the requirements. From a mathematical point of view, n candidate features give on the order of n! possible combinations, and because the analysis requires manual adjustment and combination, these n! possible combinations usually have to be screened by human experience in order to reduce the test space. Finding a suitable model is therefore often difficult, requiring both considerable experience and a great deal of time.
To this end, the present disclosure provides a parameter optimization method for machine learning feature engineering for improving work efficiency and model accuracy.
In the process of training a model by utilizing feature engineering, a sample space is often needed, and without losing generality, the technical scheme of the disclosure is described by taking a sample space as an example (namely a current sample space).
First, a current sample space is acquired. The current sample space includes a plurality of training samples, where each sample may be, for example, a vector with a plurality of dimensions; samples that are not represented in vector form may first be quantized into vector form.
After the current sample space is acquired, the importance of the training samples (denoted as the first training samples) in the current sample space is quantified using the method of this embodiment. Specifically, the dimension features of the first training samples may be importance-quantified by a sample deviation value, wherein the deviation index w_i of the i-th dimension feature is calculated by:
[formula published as image: Figure BDA0004173418040000061]
where w_i is the deviation index and m is the number of first training samples in the current sample space.
S102: and ordering the dimensions in the first training sample in a descending order according to the quantization result.
In this embodiment, after importance quantization is performed on the dimension features of the first training sample, the dimensions in the first training sample may be further sorted in descending order according to the quantization result.
Specifically, the dimensions of the first training samples may be sorted in descending order of their sample deviation values, from high to low. For example, consider a sample space A = {x_1, x_2, ..., x_m} of n-dimensional vectors x, i.e. |A| = m, so there are m samples in the space. Each sample x in A has n features and can be written as x = (x^(1), x^(2), ..., x^(n)). The importance of the current dimension is then quantified; for example, the deviation index w_i of the i-th dimension feature can be expressed with the following formula:
[formula published as image: Figure BDA0004173418040000071]
By applying this formula to the data of each dimension, the deviation index of every dimension can be computed efficiently, and the deviation index can then serve as the basis for evaluating feature importance.
After sorting by deviation index, the most important dimensions are placed at the front of the ordering.
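As a minimal sketch of steps S101 and S102 in Python (the per-dimension standard deviation here stands in for the deviation index, which is an assumption because the published formula is an image; the function names are illustrative, not taken from the patent):

import numpy as np

def deviation_indices(X: np.ndarray) -> np.ndarray:
    # S101 (assumed): quantify per-dimension importance of the m x n sample
    # matrix X as the sample standard deviation of each dimension.
    return X.std(axis=0)

def rank_dimensions(X: np.ndarray) -> np.ndarray:
    # S102: return dimension indices sorted by deviation index, highest first.
    return np.argsort(deviation_indices(X))[::-1]

# Example: m = 100 first training samples, each with n = 8 feature dimensions.
X = np.random.rand(100, 8)
order = rank_dimensions(X)  # most important dimension index comes first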
S103: in the training process of the neural network model, for the ith training, selecting dimension features of i before sequencing from the first training samples to form second training samples corresponding to the first training samples, and training the neural network model by using the second training samples to generate a target model, wherein i is a natural number, i is less than or equal to n, and n is the feature dimension of the first training samples.
For a sample with n features, n training runs were performed, each based on: and training the ith training, namely training the first i features which are sequenced according to the deviation values by using a training set to obtain n trained models.
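A minimal sketch of step S103 under the same assumptions (scikit-learn's MLPClassifier merely stands in for "the neural network model", whose architecture the patent does not specify; hyperparameter values are illustrative):

import numpy as np
from sklearn.neural_network import MLPClassifier

def train_candidate_models(X_train: np.ndarray, y_train: np.ndarray, order: np.ndarray):
    # For i = 1..n, build the second training samples from the first i ranked
    # dimension features and train one candidate target model on them.
    models = []
    for i in range(1, len(order) + 1):
        cols = order[:i]                    # first i dimension features in the ordering
        clf = MLPClassifier(max_iter=500, random_state=0)
        clf.fit(X_train[:, cols], y_train)  # the second training samples
        models.append((cols, clf))
    return models                           # n trained target models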
S104: and verifying the target model, and selecting a model meeting preset conditions as a final model.
After the target models are generated, they can be verified, and a model that satisfies a preset condition is selected as the final model. For example, accuracy, precision, recall, or F1 score may be chosen as the evaluation index according to the requirements of the practical application, and each target model is verified with the verification set.
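A sketch of step S104 under the same assumptions (binary classification is assumed so the scikit-learn metrics work with their defaults; the 0.9 threshold and the metric names are illustrative, not taken from the patent):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

METRICS = {
    "accuracy": accuracy_score,
    "precision": precision_score,
    "recall": recall_score,
    "f1": f1_score,
}

def select_final_model(models, X_val, y_val, metric="accuracy", threshold=0.9):
    # Score every candidate target model on the verification set (X_val, y_val)
    # and keep the best-scoring one whose score exceeds the preset threshold.
    best = None
    for cols, clf in models:
        score = METRICS[metric](y_val, clf.predict(X_val[:, cols]))
        if score > threshold and (best is None or score > best[0]):
            best = (score, cols, clf)
    return best  # None if no candidate satisfies the preset condition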
In this way, the parameter optimization method for machine learning feature engineering automates the feature evaluation process and can effectively select suitable feature parameters as the feature engineering model of the system, so that the evaluation previously done manually is carried out by the computer, greatly improving working efficiency while also effectively improving the accuracy of the model.
Furthermore, as an optional embodiment of the present disclosure, after the dimensions of the first training samples are sorted in descending order according to the quantization result, the method further includes: dividing the first training samples in the current sample space into a training set and a verification set;
and, in the process of training the neural network model, selecting the first i dimension features in the ordering from the training set to form the second training samples corresponding to the first training samples, and training the neural network model.
After the target model is generated, the method further comprises: verifying the target model with the verification set.
Verifying the target models and selecting a model that satisfies a preset condition as the final model includes:
for the multiple generated target models, selecting, according to the results of verifying them with the verification set, a target model whose recognition accuracy is greater than a preset threshold as the final target model.
By dividing the data into a training set and a verification set in this way, the accuracy of the model is further improved.
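Tying the sketches above together, a hedged end-to-end usage might look as follows (the 80/20 split ratio and random seed are assumptions, not specified by the patent):

from sklearn.model_selection import train_test_split

# X: m x n matrix of first training samples, y: their labels.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

order = rank_dimensions(X_train)                          # S101 + S102
models = train_candidate_models(X_train, y_train, order)  # S103
final = select_final_model(models, X_val, y_val,          # S104
                           metric="accuracy", threshold=0.9)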
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present disclosure is not limited by the order of actions described, as some steps may take other order or occur simultaneously in light of the present disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.
The foregoing is a description of embodiments of the method, and the following further describes embodiments of the present disclosure through examples of apparatus.
As shown in fig. 2, a parameter optimization apparatus for machine learning feature engineering according to a second embodiment of the present disclosure includes:
a sample space obtaining module 201, configured to obtain a current sample space, and perform importance quantization on dimension features of a first training sample in the current sample space;
a dimension sorting module 202, configured to sort dimensions in the first training sample in descending order according to a quantization result;
the model training module 203 is configured to, in a training process of the neural network model, select, for an ith training, dimension features of i before sorting from the first training samples to form a second training sample corresponding to the first training sample, and train the neural network model by using the second training sample to generate a target model, where i is a natural number, i is less than or equal to n, and n is a feature dimension of the first training sample;
the model verification module 204 is configured to verify the target model, and select a model that meets a preset condition as a final model.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the described modules may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
Fig. 3 shows a schematic block diagram of an electronic device 300 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
The electronic device 300 includes a computing unit 301 that can perform various appropriate actions and processes according to a computer program stored in the ROM 302 or a computer program loaded from the storage unit 308 into the RAM 303. In the RAM 303, various programs and data required for the operation of the electronic device 300 may also be stored. The computing unit 301, the ROM 302, and the RAM 303 are connected to each other by a bus 304. The I/O interface 305 is also connected to the bus 304.
Various components in the electronic device 300 are connected to the I/O interface 305, including: an input unit 306 such as a keyboard, a mouse, etc.; an output unit 307 such as various types of displays, speakers, and the like; a storage unit 308 such as a magnetic disk, an optical disk, or the like; and a communication unit 309 such as a network card, modem, wireless communication transceiver, etc. The communication unit 309 allows the electronic device 300 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 301 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 301 performs the various methods and processes described above, such as a parameter optimization method for machine learning feature engineering. For example, in some embodiments, the parameter optimization method for machine learning feature engineering may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 308. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 300 via the ROM302 and/or the communication unit 309. When the computer program is loaded into RAM303 and executed by computing unit 301, one or more of the steps of the parameter optimization method for machine learning feature engineering described above may be performed. Alternatively, in other embodiments, the computing unit 301 may be configured to perform the parameter optimization method for machine learning feature engineering in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: display means for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. A method for optimizing parameters for machine learning feature engineering, comprising:
acquiring a current sample space, and quantifying the importance of the dimension features of the first training samples in the current sample space;
sorting the dimensions of the first training samples in descending order according to the quantization result;
in the process of training a neural network model, for the i-th training run, selecting the first i dimension features in the ordering from the first training samples to form second training samples corresponding to the first training samples, and training the neural network model with the second training samples to generate a target model, where i is a natural number, i ≤ n, and n is the feature dimension of the first training samples;
and verifying the target models, and selecting a model that satisfies a preset condition as the final model.
2. The method of claim 1, wherein quantifying the importance of the dimension features of the first training samples in the current sample space comprises:
quantifying the importance of the dimension features of the first training samples by a sample deviation value, wherein the deviation index w_i of the i-th dimension feature is calculated by:
[formula published as image: Figure FDA0004173418030000011]
where w_i is the deviation index and m is the number of first training samples in the current sample space.
3. The method of claim 1, wherein sorting the dimensions of the first training samples in descending order according to the quantization result comprises:
sorting the dimensions of the first training samples in descending order of their sample deviation values, from high to low.
4. The method of claim 3, further comprising, after sorting the dimensions of the first training samples in descending order according to the quantization result: dividing the first training samples in the current sample space into a training set and a verification set;
and, in the process of training the neural network model, selecting the first i dimension features in the ordering from the training set to form the second training samples corresponding to the first training samples, and training the neural network model.
5. The method of claim 4, wherein, after the target model is generated, the method further comprises:
verifying the target model with the verification set.
6. The method of claim 5, wherein verifying the target model with the verification set comprises:
selecting accuracy, precision, recall, or F1 score as the evaluation index according to the requirements of the practical application, and verifying the target model with the verification set.
7. The method of claim 6, wherein verifying the target models and selecting a model that satisfies a preset condition as the final model comprises:
for the multiple generated target models, selecting, according to the results of verifying them with the verification set, a target model whose recognition accuracy is greater than a preset threshold as the final target model.
8. Parameter optimization apparatus for machine learning feature engineering, characterized by comprising:
the sample space acquisition module is used for acquiring a current sample space and carrying out importance quantification on the dimension characteristics of a first training sample in the current sample space;
the dimension sorting module is used for sorting the dimensions in the first training sample in a descending order according to the quantization result;
the model training module is used for, in the process of training the neural network model, selecting for the i-th training run the first i dimension features in the ordering from the first training samples to form second training samples corresponding to the first training samples, and training the neural network model with the second training samples to generate a target model, wherein i is a natural number, i ≤ n, and n is the feature dimension of the first training samples;
and the model verification module is used for verifying the target model and selecting a model meeting preset conditions as a final model.
9. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, characterized in that the processor, when executing the program, implements the method of any of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1-7.
CN202310384601.6A 2023-04-06 2023-04-06 Parameter optimization method and device for machine learning feature engineering Pending CN116383659A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310384601.6A CN116383659A (en) 2023-04-06 2023-04-06 Parameter optimization method and device for machine learning feature engineering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310384601.6A CN116383659A (en) 2023-04-06 2023-04-06 Parameter optimization method and device for machine learning feature engineering

Publications (1)

Publication Number Publication Date
CN116383659A true CN116383659A (en) 2023-07-04

Family

ID=86969131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310384601.6A Pending CN116383659A (en) 2023-04-06 2023-04-06 Parameter optimization method and device for machine learning feature engineering

Country Status (1)

Country Link
CN (1) CN116383659A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination