CN111563067B - Feature processing method and device - Google Patents


Info

Publication number
CN111563067B
CN111563067B (application CN202010372184.XA)
Authority
CN
China
Prior art keywords
iteration
log file
feature
model
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010372184.XA
Other languages
Chinese (zh)
Other versions
CN111563067A (en)
Inventor
吴作鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN202010372184.XA
Publication of CN111563067A
Application granted
Publication of CN111563067B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/18: File system types
    • G06F16/1805: Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815: Journaling file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466: Performance evaluation by tracing or monitoring
    • G06F11/3476: Data logging
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a feature processing method and a feature processing device. An iteration model identifier uniquely corresponding to a feature combination is generated from the feature combination of all features to be evaluated in the current feature iteration, and the identifier is used as the log file name of the current feature iteration. When a target log file whose name matches the iteration model identifier is found, the target log file is parsed and the current model evaluation score of the current feature iteration is read from it. Because the iteration model identifier and the model evaluation score of each single model training are both recorded in a log file named after the identifier, when the feature iteration process is terminated by an uncertain factor, the corresponding model evaluation score can be recovered from that log file simply by recomputing the identifier at the point of termination, which reduces the time wasted on repeated model training and improves feature processing efficiency.

Description

Feature processing method and device
Technical Field
The invention relates to the technical field of computers, in particular to a feature processing method and device.
Background
When machine learning techniques are applied to production problems, a large amount of model training is usually required to obtain the best-performing model. During training, the effect of a large number of features must be evaluated, especially features constructed through feature derivation: some features have a positive effect on the model, while others disturb it. At present, these features are usually evaluated by iteratively training the model while gradually adding or gradually removing features, and the features with good effect are screened out according to the final evaluation score of each feature.
At present, in the feature screening process, feature quality is evaluated through iterative training. Once model training is unexpectedly terminated by some uncertain factor, the whole iterative training process may have to be restarted. Restarting wastes all the time already spent on model training before the program crashed, so the overall training process takes a long time.
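For concreteness, the following Python sketch illustrates the conventional forward-selection style loop described above. The helper names and the toy scoring function are illustrative assumptions, not part of the disclosure; the point is that a crash partway through loses every score computed so far.

```python
# Illustrative sketch of the conventional iterative feature screening loop.
# train_and_evaluate() is a toy stand-in for an expensive model training run.

def train_and_evaluate(features):
    """Pretend to train a model on `features` and return an evaluation score."""
    useful = {"age", "family_dep"}                     # hypothetical helpful features
    return len(useful & set(features)) - 0.1 * len(features)

def forward_selection(candidate_features):
    """Gradually add features, keeping each one only if it raises the score."""
    selected, best_score = [], float("-inf")
    for feature in candidate_features:
        trial = selected + [feature]
        score = train_and_evaluate(trial)              # expensive step, repeated many times
        if score > best_score:
            selected, best_score = trial, score
        # A crash here loses every score computed so far; the whole loop restarts.
    return selected, best_score

print(forward_selection(["age", "family_dep", "deployed_time"]))
```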
Disclosure of Invention
In view of this, the present invention discloses a feature processing method and apparatus. When the feature iteration process is terminated by an uncertain factor, the iteration model identifier at the point of termination can be recomputed and the corresponding model evaluation score can be obtained from the log file named after that identifier, which reduces the time wasted on repeated model training and improves feature processing efficiency.
A method of feature processing, comprising:
generating an iteration model identifier uniquely corresponding to the feature combination based on the feature combination of all features to be evaluated of the current feature iteration, and taking the iteration model identifier as the name of a log file of the current feature iteration;
judging whether a log file with the same log file name as the iterative model identification exists or not, and recording the log file as a target log file, wherein the log file records the iterative model identification obtained by calculation in single model training and a model evaluation score obtained by training;
and if so, analyzing the target log file, and acquiring the current model evaluation score of the current characteristic iteration from the analyzed target log file.
Optionally, when the iterative model identifier is an MD5 value, the generating, based on the feature combination of all the features to be evaluated of the current feature iteration, an iterative model identifier uniquely corresponding to the feature combination, and using the iterative model identifier as the name of the log file of the current feature iteration specifically includes:
generating an MD5 value for the feature combination by using the MD5 message-digest algorithm, and taking the MD5 value as the log file name of the current feature iteration.
Optionally, the method further includes:
if not, performing model training on all the features to be evaluated to obtain the current model evaluation score of the current feature iteration, and storing the iteration model identification and the current model evaluation score into a log file taking the iteration model identification as the name of the log file in a corresponding relation mode.
Optionally, after obtaining the current model evaluation score, the method further includes:
judging whether the current characteristic iteration is the last characteristic iteration of all the characteristic iterations;
if so, selecting, from all the model evaluation scores generated in the iteration processes, the feature combination with the highest model evaluation score as the best feature combination obtained by screening.
A feature processing apparatus comprising:
the identification generation unit is used for generating an iteration model identification which is uniquely corresponding to the characteristic combination based on the characteristic combination of all the characteristics to be evaluated of the current characteristic iteration, and taking the iteration model identification as the name of the log file of the current characteristic iteration;
the first judging unit is used for judging whether a log file with the same log file name as the iterative model identification exists and recording the log file as a target log file, wherein the log file records the iterative model identification obtained by calculation during single model training and a model evaluation score obtained by training;
and the analysis unit is used for analyzing the target log file and acquiring the current model evaluation score of the current feature iteration from the analyzed target log file when the first judging unit determines that such a log file exists.
Optionally, the identifier generating unit is specifically configured to:
generating an MD5 value for the feature combination by using the MD5 message-digest algorithm, and taking the MD5 value as the log file name of the current feature iteration.
Optionally, the method further includes:
and the training unit is used for performing model training on all the features to be evaluated when the first judging unit determines that no such log file exists, obtaining the current model evaluation score of the current feature iteration, and storing the iteration model identifier and the current model evaluation score, as a corresponding pair, into a log file named after the iteration model identifier.
Optionally, the method further includes:
a second judging unit, configured to judge whether the current feature iteration is the last feature iteration of all feature iterations after the analyzing unit or the training unit obtains the current model evaluation score;
and the searching unit is used for selecting, when the second judging unit determines that the current feature iteration is the last one, the feature combination with the highest model evaluation score from all the model evaluation scores generated in all the iteration processes as the best feature combination obtained by screening.
According to the above technical scheme, the invention discloses a feature processing method and a feature processing device. An iteration model identifier uniquely corresponding to the feature combination is generated from the feature combination of all features to be evaluated in the current feature iteration, and the identifier is used as the log file name of the current feature iteration. When a log file whose name matches the iteration model identifier, namely a target log file, is found, the target log file is parsed and the current model evaluation score of the current feature iteration is read from it. Because the iteration model identifier calculated during a single model training and the model evaluation score obtained from that training are recorded in a log file named after the identifier, when the feature iteration process is terminated by an uncertain factor, the corresponding model evaluation score can be recovered from that log file by recomputing the identifier at the point of termination, which reduces the time wasted on repeated model training and improves feature processing efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below are only embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the disclosed drawings without creative effort.
FIG. 1 is a flow chart of a feature processing method disclosed in an embodiment of the present invention;
FIG. 2 is a flow chart of another feature processing method disclosed in the embodiments of the present invention;
FIG. 3 is a flow chart of another feature processing method disclosed in the embodiments of the present invention;
FIG. 4 is a schematic structural diagram of a feature processing apparatus according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of another feature processing apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of another feature processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a feature processing method and a feature processing device. An iteration model identifier uniquely corresponding to the feature combination is generated from the feature combination of all features to be evaluated in the current feature iteration, and the identifier is used as the log file name of the current feature iteration. When a log file whose name matches the iteration model identifier, namely a target log file, is found, the target log file is parsed and the current model evaluation score of the current feature iteration is read from it. Because the iteration model identifier calculated during a single model training and the model evaluation score obtained from that training are both recorded in a log file named after the identifier, when the feature iteration process is terminated by an uncertain factor, the corresponding model evaluation score can be recovered from that log file by recomputing the identifier at the point of termination, which reduces the time wasted on repeated model training and improves feature processing efficiency.
Referring to fig. 1, a flowchart of a feature processing method according to an embodiment of the present invention includes:
step S101, generating an iteration model identification uniquely corresponding to the feature combination based on the feature combination of all features to be evaluated of the current feature iteration, and taking the iteration model identification as the name of a log file of the current feature iteration;
the iteration model identifications are used for distinguishing different iteration steps, and the iteration model identifications generated by the same iteration step are the same.
Optionally, the iteration model identifier may be an MD5 value obtained with the MD5 message-digest algorithm.
The implementation of step S101 may specifically include: generating an MD5 value for the feature combination of all features to be evaluated of the current feature iteration by using the MD5 message-digest algorithm, and taking that MD5 value as the log file name of the current feature iteration.
The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value and is commonly used to check the integrity of transmitted messages.
Of course, in practical applications, other methods, such as a hash algorithm, may also be used to generate the iterative model identifier, which is determined according to practical situations, and the present invention is not limited herein.
The feature combination includes: model features, the evaluation algorithm, and the model parameters.
The invention determines an iteration model identifier for the training task. The identifier is an MD5 value generated from the feature combination of all the features to be evaluated and serves as the unique identifier of the current feature iteration; as long as all the features to be evaluated remain unchanged, the iteration model identifier is guaranteed to stay consistent.
Specifically, the names of all the features to be evaluated are combined and spliced into one character string, an MD5 value is generated for that string with the MD5 message-digest algorithm, and the MD5 value is used as the log file name, so that the corresponding log record can easily be found from the MD5 value as long as the features to be evaluated have not changed.
For example, assume that there are three features to be evaluated: age, family_dep and deployed_time. The three feature names are combined and spliced into one character string, and applying the MD5 message-digest algorithm to that string yields the MD5 value bccabac92B7a7138F8146EF08606a67EB. In this way, no matter how long the spliced string of features to be evaluated is, it is always converted into a fixed 32-character string, i.e., the MD5 value.
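As a concrete illustration, the following Python sketch derives such an identifier with the standard hashlib library. The splicing convention (sorting and joining with "|") and the inclusion of the evaluation algorithm and model parameters are assumptions made for this example, not requirements of the disclosure.

```python
import hashlib

def iteration_model_id(features, algorithm, params):
    """Derive a deterministic 32-character MD5 identifier for one feature iteration.

    The same feature combination (feature names, evaluation algorithm, model
    parameters) always yields the same identifier, so the log file name is
    reproducible as long as the features to be evaluated do not change.
    """
    parts = sorted(features) + [algorithm] + [f"{k}={v}" for k, v in sorted(params.items())]
    spliced = "|".join(parts)                      # splicing convention assumed for this sketch
    return hashlib.md5(spliced.encode("utf-8")).hexdigest()

# Hypothetical feature combination matching the example above.
model_id = iteration_model_id(["age", "family_dep", "deployed_time"], "auc", {"max_depth": 4})
log_file_name = f"{model_id}.log"                  # iteration model identifier as log file name
```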
Step S102, judging whether a log file with the same log file name as the iterative model identification exists or not, recording the log file as a target log file, and if so, executing step S103;
after the single model training is finished, the iterative model identification obtained by calculation in the single model training and the model evaluation score obtained by training are stored in the log file with the iterative model identification as the name of the log file.
Therefore, an iterative model identifier obtained by calculation during single model training and a model evaluation score obtained by training are recorded in the log file, and the iterative model identifier is obtained based on a feature combination of features to be evaluated during single model training.
It can be understood that a plurality of log files may already have been generated before the present feature iteration training. When a log file whose name is identical to the iteration model identifier generated in the present feature iteration is found, that log file is recorded as the target log file.
It should be noted that, in the feature screening process, multiple feature iteration trainings need to be performed on all the features to be evaluated, and a log file is generated after each feature iteration training is completed. To distinguish the log files, the log file generated in the first feature iteration training can be named with the iteration model identifier plus an "01" suffix, the log file generated in the second feature iteration training with the iteration model identifier plus an "02" suffix, and so on.
And step S103, analyzing the target log file, and acquiring the current model evaluation score of the current characteristic iteration from the analyzed target log file.
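A minimal sketch of steps S102 and S103 is shown below, assuming the key:value record format described later (identifier and score separated by a colon) and a hypothetical logs directory; the function and path names are illustrative.

```python
import os

def cached_score(model_id, log_dir="logs"):
    """Return the model evaluation score recorded for `model_id`, or None.

    Assumes each completed training run appended a line of the form
    "<iteration model identifier>:<model evaluation score>" to a log file
    named after the identifier.
    """
    target = os.path.join(log_dir, f"{model_id}.log")   # candidate target log file
    if not os.path.exists(target):                      # step S102: no matching log file
        return None
    with open(target, encoding="utf-8") as fh:          # step S103: parse the target log file
        for line in fh:
            key, _, value = line.strip().partition(":")
            if key == model_id:
                return float(value)                     # cached current model evaluation score
    return None
```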
In summary, the feature processing method disclosed in the present invention generates an iteration model identifier uniquely corresponding to the feature combination of all features to be evaluated in the current feature iteration, uses the identifier as the log file name of the current feature iteration, and, when a log file whose name matches the identifier (the target log file) is found, parses the target log file and reads the current model evaluation score of the current feature iteration from it. Because the iteration model identifier calculated during a single model training and the model evaluation score obtained from that training are both recorded in a log file named after the identifier, when the feature iteration process is terminated by an uncertain factor, the corresponding model evaluation score can be recovered from that log file by recomputing the identifier at the point of termination, which reduces the time wasted on repeated model training and improves feature processing efficiency.
To further optimize the above embodiment, referring to FIG. 2, which is a flowchart of a feature processing method disclosed in another embodiment of the present invention, the method may further include the following step when the determination in step S102 is negative:
and S104, performing model training on all the features to be evaluated to obtain the current model evaluation score of the current feature iteration, and storing the iteration model identification and the current model evaluation score into a log file with the iteration model identification as the name of the log file in a corresponding relationship mode.
In practical application, feature iteration information can be generated according to features to be evaluated, an evaluation algorithm and model parameters, a feature iteration process is executed based on the feature iteration information, and model training is performed on all the features to be evaluated.
The MD5 value obtained in the feature iteration and the current model evaluation score obtained from training may be recorded in the log file as a key:value pair.
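A sketch of this step, reusing the helpers from the earlier sketches (iteration_model_id and train_and_evaluate, both illustrative assumptions), might look as follows:

```python
import os

def train_and_log(features, algorithm, params, log_dir="logs"):
    """Train on all features to be evaluated and persist the score keyed by the identifier."""
    model_id = iteration_model_id(features, algorithm, params)
    score = train_and_evaluate(features)                        # single model training
    os.makedirs(log_dir, exist_ok=True)
    with open(os.path.join(log_dir, f"{model_id}.log"), "a", encoding="utf-8") as fh:
        fh.write(f"{model_id}:{score}\n")                       # key:value record in the log file
    return score
```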
To further optimize the foregoing embodiment, referring to FIG. 3, which is a flowchart of a feature processing method disclosed in yet another embodiment of the present invention, on the basis of the embodiment shown in FIG. 2 the method may further include the following steps after the current model evaluation score of the current feature iteration is obtained, that is, after step S103 or step S104:
step S105, judging whether the current feature iteration is the last feature iteration of all the feature iterations, if not, returning to execute step S101, and if so, executing step S106;
when feature screening is performed, multiple feature iteration processes are usually required to be performed, and only after all feature iteration processes are finished, an optimal feature combination can be screened out.
Therefore, after each feature iteration is finished, whether the feature iteration is the last feature iteration of all the feature iterations needs to be judged, if not, the next feature iteration is continuously executed, and if so, the subsequent feature screening operation is continuously executed.
Step S106, selecting, from all the model evaluation scores generated in the iteration processes, the feature combination with the highest model evaluation score as the best feature combination obtained by screening.
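Putting steps S101 through S106 together, the overall loop could be sketched as follows. It reuses the illustrative helpers defined above (iteration_model_id, cached_score, train_and_log), and the list of planned feature combinations is an assumed input rather than something prescribed by the disclosure.

```python
def best_feature_combination(planned_iterations, algorithm, params):
    """Run every planned feature iteration, reusing cached scores, and pick the best.

    `planned_iterations` is a list of feature combinations, one per feature
    iteration; after the last iteration the combination with the highest model
    evaluation score is returned.
    """
    results = []
    for features in planned_iterations:
        model_id = iteration_model_id(features, algorithm, params)   # step S101
        score = cached_score(model_id)                               # steps S102-S103
        if score is None:
            score = train_and_log(features, algorithm, params)       # step S104
        results.append((score, tuple(features)))
    best_score, best_features = max(results)                         # steps S105-S106
    return list(best_features), best_score
```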
In summary, the feature processing method disclosed in the present invention generates an iteration model identifier uniquely corresponding to the feature combination of all features to be evaluated in the current feature iteration, uses the identifier as the log file name of the current feature iteration, and, when the target log file whose name matches the identifier is found, parses the target log file and reads the current model evaluation score of the current feature iteration from it. When the target log file is not found, model training is performed on all the features to be evaluated to obtain the current model evaluation score of the feature iteration, and the iteration model identifier and the current model evaluation score are stored as a corresponding pair in a log file named after the identifier for later use. Because the identifier calculated during a single model training and the score obtained from that training are both recorded in a log file named after the identifier, when the feature iteration process is terminated by an uncertain factor, the corresponding model evaluation score can be recovered from that log file by recomputing the identifier at the point of termination, which reduces the time wasted on repeated model training and improves feature processing efficiency. Moreover, no additional post-processing is needed: once all the feature iterations are finished, the best feature combination can be screened out directly.
Corresponding to the embodiment of the method, the invention also discloses a characteristic processing device.
Referring to fig. 4, a schematic structural diagram of a feature processing apparatus according to an embodiment of the present invention includes:
an identifier generating unit 201, configured to generate an iterative model identifier uniquely corresponding to a feature combination based on the feature combination of all features to be evaluated of a current feature iteration, and use the iterative model identifier as a log file name of the current feature iteration;
the iterative model identifiers are used for distinguishing different iterative steps, and the iterative model identifiers generated by the same iterative step are the same.
The characteristic combination comprises: model features, evaluation algorithms, and model parameters.
Optionally, the iteration model identifier may be an MD5 value obtained with the MD5 message-digest algorithm.
Therefore, the identifier generating unit 201 may specifically be configured to:
generating an MD5 value for the feature combination by using the MD5 message-digest algorithm, and taking the MD5 value as the log file name of the current feature iteration.
Specifically, the names of all the features to be evaluated are combined and spliced into one character string, an MD5 value is generated for that string with the MD5 message-digest algorithm, and the MD5 value is used as the log file name, so that the corresponding log record can easily be found from the MD5 value as long as the features to be evaluated have not changed.
A first judging unit 202, configured to judge whether a log file with a log file name that is the same as the iterative model identifier exists, and record the log file as a target log file, where an iterative model identifier obtained through calculation during single model training and a model evaluation score obtained through training are recorded in the log file;
after the single model training is finished, the iterative model identification obtained by calculation in the single model training and the model evaluation score obtained by training are stored in the log file with the iterative model identification as the name of the log file.
Therefore, an iterative model identifier obtained by calculation during single model training and a model evaluation score obtained by training are recorded in the log file, and the iterative model identifier is obtained based on a feature combination of features to be evaluated during single model training.
It can be understood that a plurality of log files may already have been generated before the present feature iteration training. When a log file whose name is identical to the iteration model identifier generated in the present feature iteration is found, that log file is recorded as the target log file.
It should be noted that, in the feature screening process, feature iteration training needs to be performed on all the features to be evaluated multiple times, and a log file is generated after each feature iteration training is completed. To distinguish the log files, the log file generated in the first feature iteration training can be named with the iteration model identifier plus an "01" suffix, the log file generated in the second feature iteration training with the iteration model identifier plus an "02" suffix, and so on.
An analyzing unit 203, configured to, when the first determining unit 202 determines that such a log file exists, analyze the target log file and obtain the current model evaluation score of the current feature iteration from the analyzed target log file.
In summary, the feature processing apparatus disclosed in the present invention generates an iteration model identifier uniquely corresponding to the feature combination of all features to be evaluated in the current feature iteration, uses the identifier as the log file name of the current feature iteration, and, when a log file whose name matches the identifier (the target log file) is found, parses the target log file and reads the current model evaluation score of the current feature iteration from it. Because the iteration model identifier calculated during a single model training and the model evaluation score obtained from that training are both recorded in a log file named after the identifier, when the feature iteration process is terminated by an uncertain factor, the corresponding model evaluation score can be recovered from that log file by recomputing the identifier at the point of termination, which reduces the time wasted on repeated model training and improves feature processing efficiency.
To further optimize the above embodiment, referring to FIG. 5, which is a schematic structural diagram of a feature processing apparatus disclosed in another embodiment of the present invention, the apparatus may further include, on the basis of the embodiment shown in FIG. 4:
a training unit 204, configured to perform model training on all the features to be evaluated under the condition that the first determining unit 202 determines that the features are not evaluated, obtain a current model evaluation score of the current feature iteration, and store the iteration model identifier and the current model evaluation score in a log file with the iteration model identifier as a log file name in a form of a corresponding relationship.
In practical application, feature iteration information can be generated according to features to be evaluated, an evaluation algorithm and model parameters, a feature iteration process is executed based on the feature iteration information, and model training is performed on all the features to be evaluated.
The MD5 value obtained in the feature iteration and the current model evaluation score obtained from training may be recorded in the log file as a key:value pair.
To further optimize the above embodiment, referring to FIG. 6, which is a schematic structural diagram of a feature processing apparatus disclosed in yet another embodiment of the present invention, the apparatus may further include, on the basis of the embodiment shown in FIG. 5:
a second determining unit 205, configured to determine whether the current feature iteration is the last feature iteration of all feature iterations after the analyzing unit 203 or the training unit 204 obtains the current model evaluation score;
when feature screening is performed, multiple feature iteration processes are usually required to be performed, and only after all the feature iteration processes are finished, the optimal feature combination can be screened out.
Therefore, after each feature iteration is finished, whether the feature iteration is the last feature iteration of all the feature iterations needs to be judged, if not, the next feature iteration is continuously executed, and if so, the subsequent feature screening operation is continuously executed.
A searching unit 206, configured to, when the second determining unit 205 determines that the current feature iteration is the last one, select the feature combination with the highest model evaluation score from all the model evaluation scores generated in all the iteration processes as the best feature combination obtained by screening.
If the second determining unit 205 determines that it is not the last feature iteration, the process returns to the identifier generating unit 201.
In summary, the feature processing apparatus disclosed in the present invention generates an iteration model identifier uniquely corresponding to the feature combination of all features to be evaluated in the current feature iteration, uses the identifier as the log file name of the current feature iteration, and, when the target log file whose name matches the identifier is found, parses the target log file and reads the current model evaluation score of the current feature iteration from it. When the target log file is not found, model training is performed on all the features to be evaluated to obtain the current model evaluation score of the feature iteration, and the iteration model identifier and the current model evaluation score are stored as a corresponding pair in a log file named after the identifier for later use. Because the identifier calculated during a single model training and the score obtained from that training are both recorded in a log file named after the identifier, when the feature iteration process is terminated by an uncertain factor, the corresponding model evaluation score can be recovered from that log file by recomputing the identifier at the point of termination, which reduces the time wasted on repeated model training and improves feature processing efficiency. Moreover, no additional post-processing is needed: once all the feature iterations are finished, the best feature combination can be screened out directly.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A feature processing method, comprising:
generating an iteration model identifier uniquely corresponding to the feature combination based on the feature combination of all features to be evaluated of the current feature iteration, and taking the iteration model identifier as the name of a log file of the current feature iteration;
judging whether a log file with the same log file name as the iterative model identification exists, wherein the log file records the iterative model identification obtained by calculation in single model training and the model evaluation score obtained by training;
if so, recording the log file with the same log file name as the iterative model identification as a target log file, analyzing the target log file, and acquiring the current model evaluation score of the current characteristic iteration from the analyzed target log file.
2. The feature processing method according to claim 1, wherein when the iterative model identifier is an MD5 value, the generating of an iterative model identifier uniquely corresponding to the feature combination based on the feature combination of all features to be evaluated of the current feature iteration and taking the iterative model identifier as a log file name of the current feature iteration specifically includes:
generating an MD5 value for the feature combination by using the MD5 message-digest algorithm, and taking the MD5 value as the log file name of the current feature iteration.
3. The feature processing method according to claim 1, further comprising:
if not, performing model training on all the features to be evaluated to obtain the current model evaluation score of the current feature iteration, and storing the iteration model identification and the current model evaluation score into a log file taking the iteration model identification as the name of the log file in a corresponding relation mode.
4. The feature processing method according to claim 3, further comprising, after obtaining the current model evaluation score:
judging whether the current characteristic iteration is the last characteristic iteration of all the characteristic iterations;
if so, finding the feature combination with the highest model evaluation score from all the model evaluation scores generated in the iterative process as the best feature combination obtained by screening.
5. A feature processing apparatus, characterized by comprising:
the identification generation unit is used for generating an iteration model identification uniquely corresponding to the characteristic combination based on the characteristic combination of all the characteristics to be evaluated of the current characteristic iteration, and taking the iteration model identification as the name of the log file of the current characteristic iteration;
the first judging unit is used for judging whether a log file with the same log file name as the iterative model identification exists, wherein the log file records the iterative model identification obtained by calculation in single model training and the model evaluation score obtained by training;
and the analysis unit is used for recording the log file with the same log file name as the iterative model identifier as a target log file, analyzing the target log file and acquiring the current model evaluation score of the current characteristic iteration from the analyzed target log file under the condition that the first judgment unit judges that the log file name is the same as the iterative model identifier.
6. The feature processing apparatus according to claim 5, wherein the identifier generating unit is specifically configured to:
generating an MD5 value for the feature combination by using the MD5 message-digest algorithm, and taking the MD5 value as the log file name of the current feature iteration.
7. The feature processing apparatus according to claim 5, characterized by further comprising:
and the training unit is used for performing model training on all the features to be evaluated when the first judging unit determines that no such log file exists, obtaining the current model evaluation score of the current feature iteration, and storing the iteration model identifier and the current model evaluation score, as a corresponding pair, into a log file named after the iteration model identifier.
8. The feature processing apparatus according to claim 7, characterized by further comprising:
a second judging unit, configured to judge whether the current feature iteration is the last feature iteration of all feature iterations after the analyzing unit or the training unit obtains the current model evaluation score;
and the searching unit is used for selecting, when the second judging unit determines that the current feature iteration is the last one, the feature combination with the highest model evaluation score from all the model evaluation scores generated in all the iteration processes as the best feature combination obtained by screening.
CN202010372184.XA 2020-05-06 2020-05-06 Feature processing method and device Active CN111563067B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010372184.XA CN111563067B (en) 2020-05-06 2020-05-06 Feature processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010372184.XA CN111563067B (en) 2020-05-06 2020-05-06 Feature processing method and device

Publications (2)

Publication Number Publication Date
CN111563067A (en) 2020-08-21
CN111563067B (en) 2023-04-14

Family

ID=72070811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010372184.XA Active CN111563067B (en) 2020-05-06 2020-05-06 Feature processing method and device

Country Status (1)

Country Link
CN (1) CN111563067B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975604A (en) * 2016-05-12 2016-09-28 清华大学 Distribution iterative data processing program abnormity detection and diagnosis method
CN108537289A (en) * 2018-04-24 2018-09-14 百度在线网络技术(北京)有限公司 Training method, device and the storage medium of data identification model
CN108881283A (en) * 2018-07-13 2018-11-23 杭州安恒信息技术股份有限公司 Assess model training method, device and the storage medium of network attack
CN109711555A (en) * 2018-12-21 2019-05-03 北京瀚海星云科技有限公司 A kind of method and system of predetermined depth learning model single-wheel iteration time
CN110298379A (en) * 2019-05-23 2019-10-01 中国平安人寿保险股份有限公司 Assessment models selection method, device, computer equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2672394C1 (en) * 2017-07-26 2018-11-14 Общество С Ограниченной Ответственностью "Яндекс" Methods and systems for evaluation of training objects through a machine training algorithm
EP3544236B1 (en) * 2018-03-21 2022-03-09 Telefonica, S.A. Method and system for training and validating machine learning algorithms in data network environments

Also Published As

Publication number Publication date
CN111563067A (en) 2020-08-21

Similar Documents

Publication Publication Date Title
CN111340242B (en) Model joint training method and device for protecting privacy
JP5150266B2 (en) Automatic identification of repeated material in audio signals
JP5460887B2 (en) Classification rule generation device and classification rule generation program
JP2011516989A (en) Search result ranking using edit distance and document information
CN106445643B (en) It clones, the method and apparatus of upgrading virtual machine
US20210263979A1 (en) Method, system and device for identifying crawler data
CN109063433B (en) False user identification method and device and readable storage medium
WO2020063524A1 (en) Method and system for determining legal instrument
CN112328499A (en) Test data generation method, device, equipment and medium
CN115982053A (en) Method, device and application for detecting software source code defects
CN106354587A (en) Mirror image server and method for exporting mirror image files of virtual machine
JP2017027495A (en) Verification device, classification system, verification method, classification method, and computer program
CN111563067B (en) Feature processing method and device
CN112905370A (en) Topological graph generation method, anomaly detection method, device, equipment and storage medium
US20180309854A1 (en) Protocol model generator and modeling method thereof
Brunelle et al. Archiving deferred representations using a two-tiered crawling approach
CN113255742A (en) Policy matching degree calculation method and system, computer equipment and storage medium
CN112698861A (en) Source code clone identification method and system
CN112437022A (en) Network flow identification method, equipment and computer storage medium
CN116401229A (en) Database data verification method, device and equipment
JP2020525949A (en) Media search method and device
CN111198818B (en) Information acquisition method and device
CN113885789A (en) Method, system, device and medium for verifying data consistency after metadata repair
CN112052245B (en) Method and device for judging attack behavior in network security training
JP7456289B2 (en) Judgment program, judgment method, and information processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant