CN112052915B - Data training method, device, equipment and storage medium


Info

Publication number
CN112052915B
Authority
CN
China
Prior art keywords
sample data, data, negative sample, positive sample, positive
Legal status
Active
Application number
CN202011055438.1A
Other languages
Chinese (zh)
Other versions
CN112052915A
Inventor
万明霞
Current Assignee
Bank of China Ltd
Original Assignee
Bank of China Ltd
Application filed by Bank of China Ltd
Priority to CN202011055438.1A
Publication of CN112052915A
Application granted
Publication of CN112052915B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06N 20/20: Ensemble learning


Abstract

The application provides a data training method, apparatus, device, and storage medium. Sample data are acquired from an original training data set and preprocessed to obtain positive sample data and negative sample data. For the positive and negative sample data respectively, all column features contained therein are traversed; each column feature is randomly shuffled and the shuffled columns are recombined to obtain new positive and negative sample data, which are added to the original training data set to form a new training data set used for model training. By randomly shuffling and recombining the features within each sample, the N features become mutually independent while each still follows its normal distribution. On this basis, data enhancement can be applied to non-image, non-speech data, effectively enlarging the data set; when such data are used for training, model overfitting is effectively alleviated and the accuracy of model prediction is improved.

Description

Data training method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data training method, apparatus, device, and storage medium.
Background
At present, when a model is trained with training sample data of small size, overfitting easily occurs: the model becomes excessively dependent on the training sample data, which adversely affects the accuracy of the model's prediction results.
For image data and voice data, data enhancement means such as flipping, rotation, and Gaussian noise are generally adopted to amplify the scale of the training samples, thereby alleviating overfitting during model training and improving the accuracy of the prediction results. For non-image and non-speech data, however, the sample data size cannot be amplified by these data enhancement means, so model training with such data remains prone to overfitting and inaccurate prediction results.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a data training method, apparatus, device, and storage medium, so that the sample data size can be amplified by data enhancement when training a model with non-image, non-speech data, thereby alleviating model overfitting and improving the accuracy of model prediction.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
in one aspect, an embodiment of the present invention provides a data training method, where the method includes:
acquiring sample data in an original training data set;
preprocessing the sample data to obtain positive sample data and negative sample data;
traversing, for the positive sample data and the negative sample data respectively, all column features contained therein;
randomly shuffling each of the column features contained in the positive sample data and the negative sample data respectively, and recombining the shuffled columns to obtain new positive sample data and new negative sample data;
adding the new positive sample data and the new negative sample data to the original training data set to obtain a new training data set;
and performing model training by using the new training data set.
Optionally, traversing, for the positive sample data and the negative sample data respectively, all column features contained therein includes:
traversing, for a first preset proportion of the positive sample data and a second preset proportion of the negative sample data respectively, all column features contained therein;
wherein the first preset proportion indicates the ratio of the number of positive sample data used for traversal to the total number of positive sample data, and the second preset proportion indicates the ratio of the number of negative sample data used for traversal to the total number of negative sample data.
Optionally, traversing, for the positive sample data and the negative sample data respectively, all column features contained therein includes:
traversing, for positive sample data and negative sample data whose numbers satisfy a third preset proportion, all column features contained therein;
wherein the third preset proportion indicates the ratio between the number of positive sample data used for traversal and the number of negative sample data used for traversal.
Optionally, traversing, for the positive sample data and the negative sample data respectively, all column features contained therein includes:
traversing, for all positive sample data and all negative sample data respectively, all column features contained therein.
In another aspect, an embodiment of the present invention provides a data training apparatus, including:
the acquisition module is used for acquiring sample data in the original training data set;
the preprocessing module is used for preprocessing the sample data to obtain positive sample data and negative sample data;
a traversal feature module, configured to traverse, for the positive sample data and the negative sample data respectively, all column features contained therein;
a processing module, configured to randomly shuffle each of the column features contained in the positive sample data and the negative sample data respectively, and to recombine the shuffled columns to obtain new positive sample data and new negative sample data;
an adding module, configured to add the new positive sample data and the new negative sample data to the original training data set to obtain a new training data set;
and the training module is used for carrying out model training by utilizing the new training data set.
Optionally, the traversal feature module is specifically configured to traverse, for a first preset proportion of the positive sample data and a second preset proportion of the negative sample data respectively, all column features contained therein;
wherein the first preset proportion indicates the ratio of the number of positive sample data used for traversal to the total number of positive sample data, and the second preset proportion indicates the ratio of the number of negative sample data used for traversal to the total number of negative sample data.
Optionally, the traversal feature module is specifically configured to traverse, for positive sample data and negative sample data whose numbers satisfy a third preset proportion, all column features contained therein;
wherein the third preset proportion indicates the ratio between the number of positive sample data used for traversal and the number of negative sample data used for traversal.
Optionally, the traversal feature module is specifically configured to traverse, for all positive sample data and all negative sample data respectively, all column features contained therein.
In another aspect, an embodiment of the present invention provides a data training device, including a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to implement the method when invoking and executing the computer program stored in the memory.
In another aspect, embodiments of the present invention provide a storage medium having stored therein computer-executable instructions that, when loaded and executed by a processor, implement the method.
Based on the data training method, apparatus, device, and storage medium provided by the embodiments of the present invention, sample data are acquired from an original training data set; the sample data are preprocessed to obtain positive sample data and negative sample data; for the positive and negative sample data respectively, all column features contained therein are traversed; each of the column features is randomly shuffled and the shuffled columns are recombined to obtain new positive sample data and new negative sample data; the new positive and negative sample data are added to the original training data set to obtain a new training data set; and model training is performed using the new training data set. In the scheme provided by the embodiments of the present invention, randomly shuffling and recombining the features within each sample makes the N features mutually independent while each still follows its normal distribution. On this basis, data enhancement can be applied to non-image, non-speech data, effectively enlarging the data set; when such data are used for training, model overfitting is effectively alleviated and the accuracy of model prediction is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present invention, and that a person of ordinary skill in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a data training method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a data training apparatus according to an embodiment of the present invention;
FIG. 3 is a block diagram of a data training device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without inventive effort fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
As described in the Background, during model training with non-image and non-speech data the sample data size cannot be amplified by data enhancement means, so model training with such data is prone to overfitting and inaccurate prediction results.
Therefore, embodiments of the present invention provide a data training method, apparatus, device, and storage medium, so that in the process of model training with non-image and non-speech data, the sample data scale can be amplified by data enhancement, model overfitting can be alleviated, and the accuracy of model prediction can be improved.
Referring to fig. 1, a flow chart of a data training method according to an embodiment of the present invention is shown. The method comprises the following steps:
s101: sample data in the original training data set is obtained.
When implementing S101, all sample data in the original training data set may be acquired, or only a portion of the sample data in the original training data set may be acquired.
S102: and preprocessing the sample data to obtain positive sample data and negative sample data.
In the process of implementing S102, the following preprocessing may be performed based on the sample data obtained by executing S101:
first, the sample data obtained in S101 is subjected to screening processing, and abnormal data in the sample data is removed.
Secondly, carrying out standardization processing on the sample data, scaling the attribute of the sample data to a certain appointed range, converting the sample data into data with zero mean and one variance, and enabling each feature in the sample data to be subjected to Gaussian normal distribution.
And finally, carrying out feature coding processing on the sample data, converting the numerical attribute in the sample data into the attribute of the Boolean value, and setting a threshold value as a separation point for dividing the attribute value into 0 and 1. Alternatively, in the implementation process, sample data with an attribute value of 1 may be referred to as positive sample data, and sample data with an attribute value of 0 may be referred to as negative sample data.
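To make the preprocessing concrete, here is a minimal Python sketch (Python is named later in this description as the implementation language for the shuffling step). The column layout, the 3-sigma screening rule, the use of scikit-learn's StandardScaler, and the 0.5 threshold are assumptions made for illustration, not details fixed by this embodiment.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame, label_col: str = "label", threshold: float = 0.5):
    """Screen, standardize, and binarize sample data (illustrative assumptions)."""
    features = df.drop(columns=[label_col])

    # 1. Screening: remove abnormal rows, here assumed to be rows where any
    #    feature lies more than 3 standard deviations from its column mean.
    z = (features - features.mean()) / features.std(ddof=0)
    df = df.loc[(z.abs() <= 3).all(axis=1)]

    # 2. Standardization: scale every feature to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(df.drop(columns=[label_col]))
    X = pd.DataFrame(scaled, columns=features.columns, index=df.index)

    # 3. Feature coding: threshold the label into Boolean 0/1 and split the
    #    rows into positive (1) and negative (0) sample data.
    y = (df[label_col] > threshold).astype(int)
    return X[y == 1], X[y == 0]
```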
S103: all column features contained in the positive and negative sample data are traversed for the positive and negative sample data, respectively.
In the process of implementing S103 in particular, there are a variety of implementations.
Optionally, the first scheme is: traversing all column features contained in the positive sample data of the first preset proportion and the negative sample data of the second preset proportion respectively for the positive sample data of the first preset proportion and the negative sample data of the second preset proportion.
The first preset proportion indicates the proportion of the number of positive sample data used for traversing to the number of all positive sample data, and the second preset proportion indicates the proportion of the number of negative sample data used for traversing to the number of all negative sample data.
It should be noted that the first preset proportion and the second preset proportion may take the same value or may take different values. Of course, the first preset ratio may be a value greater than the second preset ratio, or may be a value less than the second preset ratio, which is not limited herein.
The second scheme is as follows: and traversing all column features contained in the positive sample data and the negative sample data which meet the third preset proportional relation according to the positive sample data and the negative sample data which meet the third preset proportional relation respectively.
Wherein the third preset proportion indicates a proportion between the number of positive sample data for traversal and the number of negative sample data for traversal.
The third preset proportion may be a proportion value obtained by dividing the number of positive sample data used for traversal by the number of negative sample data used for traversal, or may be a proportion value obtained by dividing the number of negative sample data used for traversal by the number of positive sample data used for traversal.
The third scheme is as follows: all positive and negative sample data are traversed for all column features contained therein, respectively.
It should be further noted that, in the above three schemes, a specific scheme may be selected according to the actual scene application requirement, and in the implementation, the preset proportion may also be set according to the actual scene application requirement, for example, when the total positive sample data is smaller than the total negative sample data, the third preset proportion obtained by dividing the number of positive sample data used for traversal by the number of negative sample data used for traversal may be set to a larger value, which is, of course, only introduced by way of example.
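As an illustration of the first scheme above, the subset selection might look as follows in Python; the use of pandas.DataFrame.sample, the example ratio values, and the fixed random seed are assumptions of this sketch.

```python
import pandas as pd

def select_for_traversal(positive: pd.DataFrame, negative: pd.DataFrame,
                         first_ratio: float = 0.8, second_ratio: float = 0.6,
                         seed: int = 42):
    """Draw the preset proportions of positive and negative samples whose
    column features will be traversed and shuffled (scheme one)."""
    pos_subset = positive.sample(frac=first_ratio, random_state=seed)
    neg_subset = negative.sample(frac=second_ratio, random_state=seed)
    return pos_subset, neg_subset
```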
S104: and randomly scrambling each column of features in all columns of features for all columns of features contained in the positive sample data and the negative sample data respectively, and recombining to obtain new positive sample data and new negative sample data.
In the specific implementation S104, each column of features of all columns of features included in the positive sample data is randomly scrambled and recombined to obtain new positive sample data, and each column of features of all columns of features included in the negative sample data is randomly scrambled and recombined to obtain new negative sample data.
In a specific implementation, the random shuffling may be performed with Python's shuffle function (for example, random.shuffle or numpy.random.shuffle), although other approaches may also be used.
It should be noted that during the random shuffling, the features of the current column being shuffled are permuted among themselves, and the shuffled features remain in the current column.
To facilitate understanding of the random shuffling described above, an example follows; it is intended to be illustrative only.
For example, suppose the positive sample data contains 3 columns of features. The first, second, and third columns are each randomly shuffled, either at the same time or one after another (for example, in the order first column, second column, third column). The shuffled features of the first column remain in the first column, those of the second column remain in the second column, and those of the third column remain in the third column.
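A minimal sketch of this column-wise shuffling, assuming the sample data are held in a pandas DataFrame and using numpy's permutation in place of the shuffle function mentioned above (both choices are assumptions of the sketch; the embodiment only requires that each column be permuted within itself):

```python
import numpy as np
import pandas as pd

def shuffle_columns(samples: pd.DataFrame, seed=None) -> pd.DataFrame:
    """Build new samples by independently permuting each column.

    Each column's values are reordered among themselves and stay in that
    column, so the recombined rows mix features across original samples.
    """
    rng = np.random.default_rng(seed)
    shuffled = samples.copy()
    for col in samples.columns:
        # Permute this column only; the permuted values remain in this column.
        shuffled[col] = rng.permutation(samples[col].to_numpy())
    return shuffled.reset_index(drop=True)

# The 3-column example above:
pos = pd.DataFrame({"f1": [1, 2, 3], "f2": [4, 5, 6], "f3": [7, 8, 9]})
new_pos = shuffle_columns(pos, seed=0)  # each f-column permuted independently
```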
It should be noted that randomly shuffling each column feature contained in the sample data makes the features contained in the shuffled sample data mutually independent, which facilitates the subsequent amplification of the sample data scale by data enhancement means.
S105: and adding the new positive sample data and the new negative sample data to the original training data set to obtain a new training data set.
In the process of implementing S102, new positive sample data and new negative sample data may be added to the original training data set separately, or may be added to the original training data set after being mixed.
S106: model training is performed using the new training dataset.
In the scheme provided by the embodiments of the present invention, randomly shuffling and recombining the features within each sample makes the N features mutually independent while each still follows its normal distribution. On this basis, data enhancement can be applied to non-image, non-speech data, effectively enlarging the data set; when such data are used for training, model overfitting is effectively alleviated and the accuracy of model prediction is improved.
Based on the data training method disclosed in the embodiments of the present invention, an embodiment of the present invention correspondingly discloses a data training apparatus. Referring to fig. 2, a block diagram of a data training apparatus according to an embodiment of the present invention is shown.
The data training apparatus comprises: an acquisition module 201, a preprocessing module 202, a traversal feature module 203, a processing module 204, an adding module 205, and a training module 206.
The acquisition module 201 is configured to: sample data in the original training data set is obtained.
The preprocessing module 202 is configured to: and preprocessing the sample data to obtain positive sample data and negative sample data.
The traversal feature module 203 is configured to: traverse, for the positive sample data and the negative sample data respectively, all column features contained therein.
The processing module 204 is configured to: randomly shuffle each of the column features contained in the positive sample data and the negative sample data respectively, and recombine the shuffled columns to obtain new positive sample data and new negative sample data.
The adding module 205 is configured to: and adding the new positive sample data and the new negative sample data to the original training data set to obtain a new training data set.
The training module 206 is configured to: model training is performed using the new training dataset.
Optionally, the traversal feature module 203 is specifically configured to: traverse, for a first preset proportion of the positive sample data and a second preset proportion of the negative sample data respectively, all column features contained therein.
The first preset proportion indicates the ratio of the number of positive sample data used for traversal to the total number of positive sample data, and the second preset proportion indicates the ratio of the number of negative sample data used for traversal to the total number of negative sample data.
Alternatively, the traversal feature module 203 is specifically configured to: traverse, for positive sample data and negative sample data whose numbers satisfy a third preset proportion, all column features contained therein.
The third preset proportion indicates the ratio between the number of positive sample data used for traversal and the number of negative sample data used for traversal.
Alternatively, the traversal feature module 203 is specifically configured to: traverse, for all positive sample data and all negative sample data respectively, all column features contained therein.
For the specific implementation principle of each module in the data training apparatus disclosed in the above embodiment, refer to the corresponding content of the data training method disclosed in the above embodiment of the present invention; it is not described here again.
With the data training apparatus provided by the embodiment of the present invention, the acquisition module acquires sample data from an original training data set; the preprocessing module preprocesses the sample data to obtain positive sample data and negative sample data; the traversal feature module traverses, for the positive and negative sample data respectively, all column features contained therein; the processing module randomly shuffles each of the column features and recombines the shuffled columns to obtain new positive sample data and new negative sample data; the adding module adds the new positive and negative sample data to the original training data set to obtain a new training data set; and the training module performs model training using the new training data set. In the scheme provided by the embodiments of the present invention, randomly shuffling and recombining the features within each sample makes the N features mutually independent while each still follows its normal distribution. On this basis, data enhancement can be applied to non-image, non-speech data, effectively enlarging the data set; when such data are used for training, model overfitting is effectively alleviated and the accuracy of model prediction is improved.
Based on the data training method and apparatus disclosed in the embodiments of the present invention, an embodiment of the present invention further discloses a data training device. Referring to fig. 3, a block diagram of a data training device according to an embodiment of the present invention is shown.
The data training apparatus includes: a processor 301 and a memory 302.
A memory 302 for storing a computer program.
The processor 301 is configured to implement any of the data training methods disclosed above according to the embodiments of the present invention when invoking and executing the computer program stored in the memory 302.
Based on the data training method, the data training device and the data training equipment disclosed by the embodiment of the invention, the embodiment of the invention also discloses a storage medium.
The storage medium has stored therein computer executable instructions. When loaded and executed by a processor, the computer-executable instructions implement any of the data training methods disclosed above in accordance with embodiments of the present invention.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be referred to one another. Since the apparatus and device embodiments are substantially similar to the method embodiment, their description is relatively brief; refer to the description of the method embodiment for the relevant parts. The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. A person of ordinary skill in the art can understand and implement this without inventive effort.
Those skilled in the art will further appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of data training, the method comprising:
acquiring sample data in an original training data set;
preprocessing the sample data to obtain positive sample data and negative sample data;
traversing, for the positive sample data and the negative sample data respectively, all column features contained therein;
randomly shuffling each of the column features contained in the positive sample data and the negative sample data respectively, so that the features contained in the shuffled sample data are mutually independent, and recombining the shuffled columns to obtain new positive sample data and new negative sample data, wherein during the random shuffling, the features of the current column being shuffled are permuted among themselves and the shuffled features remain in the current column;
adding the new positive sample data and the new negative sample data to the original training data set to obtain a new training data set;
and performing model training by using the new training data set.
2. The method of claim 1, wherein traversing, for the positive sample data and the negative sample data respectively, all column features contained therein comprises:
traversing, for a first preset proportion of the positive sample data and a second preset proportion of the negative sample data respectively, all column features contained therein;
wherein the first preset proportion indicates the ratio of the number of positive sample data used for traversal to the total number of positive sample data, and the second preset proportion indicates the ratio of the number of negative sample data used for traversal to the total number of negative sample data.
3. The method of claim 1, wherein traversing, for the positive sample data and the negative sample data respectively, all column features contained therein comprises:
traversing, for positive sample data and negative sample data whose numbers satisfy a third preset proportion, all column features contained therein;
wherein the third preset proportion indicates the ratio between the number of positive sample data used for traversal and the number of negative sample data used for traversal.
4. The method of claim 1, wherein traversing, for the positive sample data and the negative sample data respectively, all column features contained therein comprises:
traversing, for all positive sample data and all negative sample data respectively, all column features contained therein.
5. A data training apparatus, the apparatus comprising:
the acquisition module is used for acquiring sample data in the original training data set;
the preprocessing module is used for preprocessing the sample data to obtain positive sample data and negative sample data;
a traversal feature module, configured to traverse, for the positive sample data and the negative sample data respectively, all column features contained therein;
a processing module, configured to randomly shuffle each of the column features contained in the positive sample data and the negative sample data respectively, so that the features contained in the shuffled sample data are mutually independent, and to recombine the shuffled columns to obtain new positive sample data and new negative sample data, wherein during the random shuffling, the features of the current column being shuffled are permuted among themselves and remain in the current column;
an adding module, configured to add the new positive sample data and the new negative sample data to the original training data set to obtain a new training data set;
and the training module is used for carrying out model training by utilizing the new training data set.
6. The apparatus of claim 5, wherein
the traversal feature module is specifically configured to traverse, for a first preset proportion of the positive sample data and a second preset proportion of the negative sample data respectively, all column features contained therein;
the first preset proportion indicates the ratio of the number of positive sample data used for traversal to the total number of positive sample data, and the second preset proportion indicates the ratio of the number of negative sample data used for traversal to the total number of negative sample data.
7. The apparatus of claim 5, wherein
the traversal feature module is specifically configured to traverse, for positive sample data and negative sample data whose numbers satisfy a third preset proportion, all column features contained therein;
wherein the third preset proportion indicates the ratio between the number of positive sample data used for traversal and the number of negative sample data used for traversal.
8. The apparatus of claim 5, wherein
the traversal feature module is specifically configured to traverse, for all positive sample data and all negative sample data respectively, all column features contained therein.
9. A data training device comprising a processor and a memory;
the memory is used for storing a computer program;
the processor being adapted to implement the method of any of claims 1 to 4 when invoking and executing a computer program stored in the memory.
10. A storage medium having stored therein computer executable instructions which, when loaded and executed by a processor, implement the method of any one of claims 1 to 4.
CN202011055438.1A; priority date 2020-09-29; filing date 2020-09-29; granted as CN112052915B (Active); title: Data training method, device, equipment and storage medium

Priority Applications (1)

Application: CN202011055438.1A; Publication: CN112052915B; Title: Data training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application: CN202011055438.1A; Publication: CN112052915B; Title: Data training method, device, equipment and storage medium

Publications (2)

Publication number | Publication date
CN112052915A | 2020-12-08
CN112052915B | 2024-02-13

Family

ID=73605073

Family Applications (1)

Application: CN202011055438.1A; Status: Active; Publication: CN112052915B; Title: Data training method, device, equipment and storage medium

Country Status (1)

Country: CN; Publication: CN112052915B

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2022252079A1 * | 2021-05-31 | 2022-12-08 | 京东方科技集团股份有限公司 | Data processing method and apparatus
CN113762423A | 2021-11-09 | 2021-12-07 | 北京世纪好未来教育科技有限公司 | Data processing and model training method and device, electronic equipment and storage medium

Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
CN109887541A * | 2019-02-15 | 2019-06-14 | 张海平 | Target protein prediction method and system combined with small molecules
CN111275491A * | 2020-01-21 | 2020-06-12 | 深圳前海微众银行股份有限公司 | Data processing method and device

Family Cites Families (1)

Publication number | Priority date | Publication date | Assignee | Title
CN107798390B * | 2017-11-22 | 2023-03-21 | 创新先进技术有限公司 | Training method and device of machine learning model and electronic equipment


Also Published As

Publication number | Publication date
CN112052915A | 2020-12-08


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant