CN114254698A - Unbalanced data and image processing method and system and computer equipment - Google Patents

Unbalanced data and image processing method and system and computer equipment

Info

Publication number
CN114254698A
CN114254698A CN202111485510.9A CN202111485510A CN114254698A CN 114254698 A CN114254698 A CN 114254698A CN 202111485510 A CN202111485510 A CN 202111485510A CN 114254698 A CN114254698 A CN 114254698A
Authority
CN
China
Prior art keywords
data
data set
samples
unbalanced
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111485510.9A
Other languages
Chinese (zh)
Other versions
CN114254698B (en)
Inventor
戴亚康
钱旭升
周志勇
胡冀苏
姜宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Guoke Medical Technology Development Group Co ltd
Suzhou Institute of Biomedical Engineering and Technology of CAS
Original Assignee
Suzhou Guoke Medical Technology Development Group Co ltd
Suzhou Institute of Biomedical Engineering and Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Guoke Medical Technology Development Group Co ltd and Suzhou Institute of Biomedical Engineering and Technology of CAS
Priority to CN202111485510.9A
Publication of CN114254698A
Application granted
Publication of CN114254698B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method, a system and computer equipment for processing unbalanced data and images. The method comprises the following steps: 1) preprocessing the unbalanced data set O; 2) determining the parameters of an RBF neural network data generation model by using a maximum distribution algorithm based on the Hausdorff distance; 3) constructing the RBF neural network data generation model; 4) generating a sample set S by combining the constructed RBF neural network data generation model with the mvnrnd function; 5) filling the generated sample set S into the original unbalanced data set O to obtain a processed balanced data set $O_s$, where $O_s = O \cup S$. The method can handle missing values and attributes of different types, adaptively learns the intra-class and inter-class distribution of the original unbalanced data, and automatically generates data class by class to expand the minority classes in the original data, thereby effectively reducing the data imbalance and improving the accuracy of data analysis.

Description

Unbalanced data and image processing method and system and computer equipment
Technical Field
The invention relates to the field of data analysis and processing, in particular to an unbalanced data and image processing method, system and computer equipment.
Background
In a data set, when one or several classes contain few samples (the positive or minority classes) while the remaining class or classes contain relatively many samples (the negative or majority classes), and the two groups differ greatly in size, the data set is called an unbalanced data set. In an unbalanced data set, the minority classes contain too few samples to give the classifier sufficient information during training, whereas the majority classes provide ample information. As a result, the classifier identifies the majority classes more easily, and its recognition rate for the minority classes is low.
Many real-world fields require modeling and analysis under data imbalance, for example medical information assisted diagnosis, bulk spam advertisement handling, multimedia information retrieval, credit card fraud detection, and text classification. In many of these fields, identifying and classifying the minority classes is what matters most: correctly identifying a minority class sample is far more significant for the overall classification task than correctly identifying a majority class sample. For example, in medical information assisted diagnosis, a diagnosis falls into one of four cases: a healthy person is correctly diagnosed as healthy, a patient is correctly diagnosed as diseased, a healthy person is misdiagnosed as diseased, or a patient is misdiagnosed as healthy. If a healthy person is misdiagnosed as a patient, that person bears serious psychological and financial pressure. However, if a patient is misdiagnosed as healthy by the assisted diagnosis system, the patient is very likely to miss timely treatment. Of the four cases, misdiagnosing a patient as healthy is the rarest in practice and can be regarded as the minority class, while the other three cases occur frequently and can be regarded as majority classes. Most existing classification methods achieve a high recognition rate for the majority classes but a low recognition rate for the minority classes, so the classifier does not fulfil its intended function.
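The gap between overall accuracy and minority-class recognition described above can be illustrated with a small numerical sketch. The 95/5 class split, the always-healthy classifier and the helper name `confusion_counts` are assumptions chosen only for illustration; they do not come from the patent.

```python
import numpy as np

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) for a binary problem; `positive` marks the minority class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))
    return tp, fn, fp, tn

# Hypothetical data: 95 healthy persons (0) and 5 patients (1).
y_true = np.array([0] * 95 + [1] * 5)
# A degenerate classifier that always predicts "healthy".
y_pred = np.zeros(100, dtype=int)

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # dominated by the majority class
minority_recall = tp / (tp + fn)     # share of patients actually detected
print(accuracy, minority_recall)
```

In this sketch the accuracy looks high even though no patient is detected, which is why recognition of the minority class has to be assessed separately.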
Unbalanced data are mainly handled by resampling: the samples are undersampled or oversampled to adjust the degree of imbalance of the sample set. Common methods that adjust the imbalance from the minority class side include random oversampling, SMOTE, and borderline-SMOTE. These methods do not take the distribution characteristics of the actual data set sufficiently into account and involve a degree of randomness and blindness, which impairs the classification result.
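For orientation, the two simplest baselines named above can be sketched as follows. This is a simplified illustration, not a reference implementation of SMOTE: the function names, the neighbor count `k` and the use of NumPy are assumptions, and borderline-SMOTE is not covered.

```python
import numpy as np

def random_oversample(X_min, n_new, rng):
    """Random oversampling: duplicate minority samples drawn with replacement."""
    idx = rng.integers(0, len(X_min), size=n_new)
    return X_min[idx]

def smote_like(X_min, n_new, k, rng):
    """SMOTE-style interpolation between a minority sample and one of its
    k nearest minority neighbors (requires k < len(X_min))."""
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)               # a sample is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]  # indices of the k nearest neighbors
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])

rng = np.random.default_rng(0)
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
duplicates = random_oversample(X_min, 6, rng)
synthetic = smote_like(X_min, 6, k=2, rng=rng)
```

Both baselines pick their source samples at random and only look at the minority class, which is the randomness and blindness the text refers to.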
Therefore, there is a need to provide a more reliable solution.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide an unbalanced data and image processing method, system and computer device for overcoming the above-mentioned shortcomings in the prior art.
In order to solve the technical problems, the invention adopts the technical scheme that: an unbalanced data and image processing method is provided, which comprises the following steps:
1) preprocessing the unbalanced data set O;
2) processing the preprocessed unbalanced data set O by using a maximum distribution algorithm based on the Hausdorff distance, and determining parameters of an RBF neural network data generation model to be constructed; the parameters comprise hidden layer neurons of an RBF neural network data generation model, a category, an output weight and a diagonal distribution matrix corresponding to each hidden layer neuron, and a connection weight between each hidden layer neuron and a corresponding output neuron;
3) constructing an RBF neural network data generation model based on the result of the step 2);
4) generating data by combining the constructed RBF neural network data generation model with the mvnrnd function to obtain a generated sample set S;
5) filling the generated sample set S into the original unbalanced data set O to obtain a processed balanced data set $O_s$, where $O_s = O \cup S$.
Preferably, the step 1) is specifically:
filling each missing value of a numerical attribute in the unbalanced data set O with the mean of that attribute over the samples of the same class; filling each missing value of an ordinal or nominal attribute with the most frequent value of that attribute among the samples of the same class;
after the missing values are filled, applying ordinal (sequential) encoding to the ordinal attributes and the nominal attributes;
converting the image data in the unbalanced data set O into numerical data with a PyRadiomics-based toolkit, adding the numerical data to the data set O, and standardizing all attributes with the z-score method to obtain the preprocessed data set D;
storing the mean and the standard deviation of each attribute in the vectors $L_{mean}$ and $L_{std}$ respectively, and storing the ordinal encoding scheme of the ordinal attributes and the nominal attributes.
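The tabular part of this preprocessing can be sketched as below. It is a minimal sketch under several assumptions: pandas is used, the function name `preprocess`, the column names and the explicit category orders are hypothetical, every class has at least one observed value per attribute, no attribute is constant, and the PyRadiomics feature extraction for image data is left out.

```python
import pandas as pd

def preprocess(df, label_col, numeric_cols, categorical_orders):
    """Class-conditional imputation, ordinal encoding and z-score standardization.

    categorical_orders maps each ordinal or nominal column to its list of
    categories in encoding order (this mapping is the stored encoding scheme).
    """
    out = df.copy()
    # Numerical attributes: fill with the mean of the same class.
    for col in numeric_cols:
        class_mean = out.groupby(label_col)[col].transform("mean")
        out[col] = out[col].fillna(class_mean)
    # Ordinal and nominal attributes: fill with the most frequent value of the same class.
    for col in categorical_orders:
        class_mode = out.groupby(label_col)[col].transform(lambda s: s.mode().iloc[0])
        out[col] = out[col].fillna(class_mode)
    # Ordinal encoding: replace each category by its position in the stored order.
    for col, order in categorical_orders.items():
        codes = {value: code for code, value in enumerate(order)}
        out[col] = out[col].map(codes).astype(float)
    # z-score standardization; L_mean and L_std are kept for later de-standardization.
    feature_cols = list(numeric_cols) + list(categorical_orders)
    L_mean = out[feature_cols].mean()
    L_std = out[feature_cols].std(ddof=0)
    out[feature_cols] = (out[feature_cols] - L_mean) / L_std
    return out, L_mean, L_std

# Hypothetical toy data set O with one numerical and one ordinal attribute.
O = pd.DataFrame({
    "age": [30.0, None, 50.0, 60.0],
    "grade": ["low", None, "high", "high"],
    "label": [0, 0, 1, 1],
})
D, L_mean, L_std = preprocess(O, "label", ["age"], {"grade": ["low", "high"]})
```

Passing the category order explicitly keeps the encoding reversible, which the later conversion of generated values back to ordinal and nominal attributes relies on.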
Preferably, the step 2) specifically includes:
2-1) assume that the data set D contains N input samples $\{x_n\}$, $n = 1, 2, \ldots, N$, each sample has M attributes and belongs to one of C classes, and the number of samples in class c is $N_c$, $c = 1, 2, \ldots, C$;
2-2) divide the samples in the data set by class to obtain the data subsets $D_c$, $c = 1, 2, \ldots, C$, each consisting of the samples belonging to class c; initialize the current class index $c = 0$ and the current number of hidden layer neurons $P = 0$;
2-3) let $c = c + 1$;
2-4) let $P = P + 1$; calculate the Hausdorff distance $h_P$ between $D_c$ and the other samples, and take the sample corresponding to $h_P$ as the newly added hidden layer neuron center $k_P$ of class c; calculate the distance from every sample in $D_c$ to $k_P$, record the subset $d_c$ formed by all samples whose distance is less than $h_P$, and delete $d_c$ from $D_c$; take the number of samples in $d_c$ as the connection weight $w_P$ between $k_P$ and the output neuron of the corresponding class, the connection weights between $k_P$ and the other output neurons being 0; calculate the variance $v_m$ of each attribute dimension in $d_c$ to form the diagonal distribution matrix corresponding to $k_P$:
Figure BDA0003396351760000031
2-5) if the number of samples remaining in $D_c$ is not 0, return to step 2-4); otherwise, check whether c equals C: if $c < C$, return to step 2-3); if $c = C$, the algorithm terminates.
Preferably, the step 3) specifically includes:
3-1) determining that an input layer of the RBF neural network data generation model has M input neurons according to M attributes of each sample in the data set D, wherein each neuron corresponds to one attribute;
3-2) determining that an output layer of the RBF neural network data generation model has C output neurons according to C categories of the data set D, wherein each neuron corresponds to one category;
3-3) according to the result of step 2), obtain the P hidden layer neurons $\{k_1, k_2, \ldots, k_{P-1}, k_P\}$, their corresponding classes and output weights $\{w_1, w_2, \ldots, w_{P-1}, w_P\}$, and the corresponding P diagonal distribution matrices $\{V_1, V_2, \ldots, V_{P-1}, V_P\}$; determine the parameters of the P hidden layer neurons $\{(k_1, V_1), (k_2, V_2), \ldots, (k_{P-1}, V_{P-1}), (k_P, V_P)\}$ and the connection weights $\{w_1, w_2, \ldots, w_{P-1}, w_P\}$ between each hidden layer neuron and the corresponding output neuron.
Preferably, the step 4) specifically includes:
4-1) set the number of samples to be generated for each class, $S_c$, $c = 1, 2, \ldots, C$; initialize the current hidden layer neuron center index $p = 0$ and the generated sample set
Figure BDA0003396351760000045
representing the empty set;
4-2) let $p = p + 1$; assuming that the current hidden layer neuron center $k_p$ belongs to class c, the number of samples to be generated for $k_p$ is
Figure BDA0003396351760000041
4-3) the generated sample matrix is
Figure BDA0003396351760000042
where each sample belongs to class c; the matrix
Figure BDA0003396351760000043
is merged into the generated sample set S,
Figure BDA0003396351760000044
check whether p equals P: if $p < P$, return to step 4-2); if $p = P$, the complete generated sample set S is obtained and the next step is executed;
4-4) de-standardize S using the mean vector $L_{mean}$ and the standard deviation vector $L_{std}$ of all attributes saved during preprocessing; then convert the corresponding numerical values in S back to the original values of the ordinal and nominal attributes according to the saved ordinal encoding scheme.
The present invention also provides an unbalanced data and image processing system, which uses the method as described above to process unbalanced data, the system comprising:
the data preprocessing module is used for preprocessing the unbalanced data set O according to the method in the step 1) to obtain a data set D;
the maximum distribution algorithm module is used for determining parameters of the RBF neural network data generation model to be constructed according to the method in the step 2);
the network model building module is used for building an RBF neural network data generation model according to the method in the step 3);
a module configured to combine the RBF neural network data generation model with the mvnrnd function and to adaptively generate a new data set S according to the distribution of the original unbalanced data set, following the method in the step 4);
and a data post-processing module for filling the generated sample set S into the original unbalanced data set O to obtain a processed balanced data set $O_s$.
The invention also provides a storage medium having stored thereon a computer program which, when executed, is adapted to carry out the method as described above.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the computer program.
The invention has the following beneficial effects: the unbalanced data and image processing method provided by the invention can handle missing values and attributes of different types, adaptively learns the intra-class and inter-class distribution of the original unbalanced data, and automatically generates data class by class to expand the minority classes in the original data, thereby effectively reducing the data imbalance and improving the accuracy of data analysis.
Drawings
FIG. 1 is a flow chart of an unbalanced data and image processing method of the present invention;
FIG. 2 is a schematic diagram of the structure of the RBF neural network data generation model of the present invention.
Detailed Description
The present invention is further described in detail below with reference to examples so that those skilled in the art can practice the invention with reference to the description.
It will be understood that terms such as "having," "including," and "comprising," as used herein, do not preclude the presence or addition of one or more other elements or groups thereof.
Example 1
Referring to fig. 1, the unbalanced data and image processing method of the present embodiment includes the following steps:
S1, preprocessing the unbalanced data set O:
filling each missing value of a numerical attribute in the unbalanced data set O with the mean of that attribute over the samples of the same class; filling each missing value of an ordinal or nominal attribute with the most frequent value of that attribute among the samples of the same class;
after the missing values are filled, applying ordinal (sequential) encoding to the ordinal attributes and the nominal attributes;
converting the image data in the unbalanced data set O into numerical data with a PyRadiomics-based toolkit, adding the numerical data to the data set O, and standardizing all attributes with the z-score method to obtain the preprocessed data set D; the data types in the unbalanced data set O include numerical data, image data and the like;
storing the mean and the standard deviation of each attribute in the vectors $L_{mean}$ and $L_{std}$ respectively, and storing the ordinal encoding scheme of the ordinal attributes and the nominal attributes.
S2, processing the preprocessed unbalanced data set O by using a maximum distribution algorithm based on the Hausdorff distance, and determining parameters of an RBF neural network data generation model to be constructed; the parameters comprise hidden layer neurons of an RBF neural network data generation model, a category, an output weight and a diagonal distribution matrix corresponding to each hidden layer neuron, and a connection weight between each hidden layer neuron and the corresponding output neuron; the method specifically comprises the following steps:
S2-1) assume that the data set D contains N input samples $\{x_n\}$, $n = 1, 2, \ldots, N$, each sample has M attributes and belongs to one of C classes, and the number of samples in class c is $N_c$, $c = 1, 2, \ldots, C$;
S2-2) divide the samples in the data set by class to obtain the data subsets $D_c$, $c = 1, 2, \ldots, C$, each consisting of the samples belonging to class c; initialize the current class index $c = 0$ and the current number of hidden layer neurons $P = 0$;
S2-3) let $c = c + 1$;
S2-4) let $P = P + 1$; calculate the Hausdorff distance $h_P$ between $D_c$ and the other samples, and take the sample corresponding to $h_P$ as the newly added hidden layer neuron center $k_P$ of class c; calculate the distance from every sample in $D_c$ to $k_P$, record the subset $d_c$ formed by all samples whose distance is less than $h_P$, and delete $d_c$ from $D_c$; take the number of samples in $d_c$ as the connection weight $w_P$ between $k_P$ and the output neuron of the corresponding class, the connection weights between $k_P$ and the other output neurons being 0; calculate the variance $v_m$ of each attribute dimension in $d_c$ to form the diagonal distribution matrix corresponding to $k_P$:
Figure BDA0003396351760000061
S2-5) if the number of samples remaining in $D_c$ is not 0, return to step S2-4); otherwise, check whether c equals C: if $c < C$, return to step S2-3); if $c = C$, the algorithm terminates.
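One possible reading of steps S2-1) to S2-5) is sketched below. Several points are assumptions rather than statements of the patent: the Hausdorff distance is taken as the directed form (the largest nearest-neighbor distance from the remaining class samples to the samples of all other classes), the sample attaining that distance is used as the center, distances are Euclidean, the diagonal distribution matrix is taken to be $\mathrm{diag}(v_1, \ldots, v_M)$, and the center is always kept in its own subset so that the loop makes progress. The function name `max_distribution` is hypothetical.

```python
import numpy as np

def max_distribution(X, y):
    """Sketch of the Hausdorff-distance-based maximum distribution algorithm.

    X: (N, M) standardized samples, y: (N,) integer class labels (at least two classes).
    Returns one dict per hidden layer neuron: center k_P, class, weight w_P and
    the per-attribute variances that form the diagonal of V_P.
    """
    neurons = []
    for c in np.unique(y):
        D_c = X[y == c]        # remaining samples of class c
        others = X[y != c]     # samples of all other classes
        while len(D_c) > 0:
            # Distance from each remaining sample to its nearest other-class sample.
            nearest = np.linalg.norm(
                D_c[:, None, :] - others[None, :, :], axis=2
            ).min(axis=1)
            i = int(np.argmax(nearest))
            h = nearest[i]     # directed Hausdorff distance h_P (assumed form)
            center = D_c[i]    # new hidden layer neuron center k_P
            covered = np.linalg.norm(D_c - center, axis=1) < h
            covered[i] = True  # assumption: the center always belongs to d_c
            d_c = D_c[covered]
            neurons.append({
                "center": center.copy(),
                "cls": int(c),
                "weight": int(covered.sum()),  # w_P = number of samples in d_c
                "var": d_c.var(axis=0),        # v_m for each attribute dimension
            })
            D_c = D_c[~covered]                # delete d_c from D_c
    return neurons

# Hypothetical toy data: a minority cluster near (10, 10), a majority cluster near (0, 0).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0], [11.0, 10.0]])
y = np.array([0, 0, 0, 1, 1])
neurons = max_distribution(X, y)
```

Under this reading each center covers the largest ball around it that contains no sample of another class, so regions of a class that lie far from other classes are summarized by few neurons with large weights. A subset containing a single sample has zero variance, which a practical implementation would need to handle.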
S3, constructing an RBF neural network data generation model based on the result of the step S2), specifically comprising the following steps:
S3-1) determining that an input layer of the RBF neural network data generation model has M input neurons according to M attributes of each sample in the data set D, wherein each neuron corresponds to one attribute;
S3-2) determining that an output layer of the RBF neural network data generation model has C output neurons according to C categories of the data set D, wherein each neuron corresponds to one category;
S3-3) according to the result of step S2), obtain the P hidden layer neurons $\{k_1, k_2, \ldots, k_{P-1}, k_P\}$, their corresponding classes and output weights $\{w_1, w_2, \ldots, w_{P-1}, w_P\}$, and the corresponding P diagonal distribution matrices $\{V_1, V_2, \ldots, V_{P-1}, V_P\}$; determine the parameters of the P hidden layer neurons $\{(k_1, V_1), (k_2, V_2), \ldots, (k_{P-1}, V_{P-1}), (k_P, V_P)\}$ and the connection weights $\{w_1, w_2, \ldots, w_{P-1}, w_P\}$ between each hidden layer neuron and the corresponding output neuron.
Here it is assumed that the 1st and 2nd hidden layer neurons belong to class 1 and that the (P-1)-th and P-th hidden layer neurons belong to class C.
The principle structure of the constructed RBF neural network data generation model is shown in FIG. 2.
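The three-layer structure described in steps S3-1) to S3-3) can be held in a small container such as the one below. The class name `RBFGenerator`, the field layout and the zero-based class indices are illustrative assumptions; the per-neuron input format (center, class, weight, variances) is likewise hypothetical.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class RBFGenerator:
    centers: np.ndarray    # (P, M): hidden layer neuron centers k_p, one column per input attribute
    variances: np.ndarray  # (P, M): diagonals of the distribution matrices V_p
    classes: np.ndarray    # (P,):   class index of each hidden layer neuron
    W: np.ndarray          # (P, C): connection weights to the C output neurons

def build_model(neurons, n_classes):
    """Assemble the model from per-neuron parameters (classes indexed 0..C-1)."""
    P = len(neurons)
    centers = np.stack([n["center"] for n in neurons])
    variances = np.stack([n["var"] for n in neurons])
    classes = np.array([n["cls"] for n in neurons])
    W = np.zeros((P, n_classes))
    # Each hidden neuron connects only to the output neuron of its own class.
    W[np.arange(P), classes] = [n["weight"] for n in neurons]
    return RBFGenerator(centers, variances, classes, W)

# Hypothetical parameters for P = 3 hidden neurons, M = 2 attributes, C = 2 classes.
example_neurons = [
    {"center": np.array([0.0, 0.0]), "cls": 0, "weight": 3, "var": np.array([0.2, 0.2])},
    {"center": np.array([2.0, 0.0]), "cls": 0, "weight": 1, "var": np.array([0.0, 0.0])},
    {"center": np.array([9.0, 9.0]), "cls": 1, "weight": 2, "var": np.array([0.1, 0.3])},
]
model = build_model(example_neurons, n_classes=2)
```

Storing the weights as a P by C matrix with a single non-zero entry per row mirrors the statement that the connection weight between a hidden layer neuron and the output neurons of other classes is 0.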
S4, generating data by combining the constructed RBF neural network data generation model with the mvnrnd function to obtain a generated sample set S, which specifically comprises the following steps:
S4-1) set the number of samples to be generated for each class, $S_c$, $c = 1, 2, \ldots, C$; initialize the current hidden layer neuron center index $p = 0$ and the generated sample set
Figure BDA0003396351760000075
representing the empty set;
S4-2) let $p = p + 1$; assuming that the current hidden layer neuron center $k_p$ belongs to class c, the number of samples to be generated for $k_p$ is
Figure BDA0003396351760000071
S4-3) the generated sample matrix is
Figure BDA0003396351760000072
where each sample belongs to class c; the matrix
Figure BDA0003396351760000073
is merged into the generated sample set S,
Figure BDA0003396351760000074
check whether p equals P: if $p < P$, return to step S4-2); if $p = P$, the complete generated sample set S is obtained and the next step is executed;
S4-4) de-standardize S using the mean vector $L_{mean}$ and the standard deviation vector $L_{std}$ of all attributes saved during preprocessing; then convert the corresponding numerical values in S back to the original values of the ordinal and nominal attributes according to the saved ordinal encoding scheme.
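The generation loop of step S4 can be sketched as follows. The formula images for the per-neuron sample count and the sample matrix are not reproduced in this text, so the sketch makes two explicit assumptions: each neuron receives a share of $S_c$ proportional to its connection weight within its class, and samples are drawn from a multivariate normal distribution with mean $k_p$ and diagonal covariance $V_p$. NumPy's `Generator.multivariate_normal` is used here as a stand-in for the MATLAB function mvnrnd named in the text; the function name `generate` is hypothetical, and the conversion of encoded values back to ordinal and nominal categories is omitted.

```python
import numpy as np

def generate(centers, variances, classes, weights, n_per_class, L_mean, L_std, rng):
    """Draw samples around every hidden layer neuron center and de-standardize them.

    n_per_class maps a class index c to S_c. The proportional split by weight
    is an assumption; rounding means the class totals may differ slightly from S_c.
    """
    blocks, labels = [], []
    for k_p, v_p, c, w_p in zip(centers, variances, classes, weights):
        class_total = weights[classes == c].sum()
        s_p = int(round(n_per_class[int(c)] * w_p / class_total))
        if s_p == 0:
            continue
        # Multivariate normal draw with mean k_p and diagonal covariance V_p.
        blocks.append(rng.multivariate_normal(k_p, np.diag(v_p), size=s_p))
        labels.append(np.full(s_p, int(c)))
    S = np.vstack(blocks)
    # Inverse of the z-score standardization: x = z * L_std + L_mean.
    return S * L_std + L_mean, np.concatenate(labels)

# Hypothetical model parameters: two neurons of class 0 and one of class 1.
centers = np.array([[0.0, 0.0], [2.0, 0.0], [9.0, 9.0]])
variances = np.array([[0.2, 0.2], [0.1, 0.1], [0.1, 0.3]])
classes = np.array([0, 0, 1])
weights = np.array([3, 1, 2])
rng = np.random.default_rng(0)
S, S_labels = generate(centers, variances, classes, weights,
                       {0: 8, 1: 4}, np.zeros(2), np.ones(2), rng)
```

Because each neuron samples from its own center and variances, the generated data follow the local distribution of the subset that the neuron summarizes rather than a single class-wide distribution.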
S5, filling the generated sample set S into the original unbalanced data set O to obtain a processed balanced data set $O_s$, where $O_s = O \cup S$.
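The final union can be sketched in a few lines. The text leaves the choice of $S_c$ to the user; the helper `samples_needed`, which raises every class to the size of the largest class, is one possible choice and an assumption of this sketch, as are both function names.

```python
import numpy as np

def samples_needed(y):
    """One possible choice of S_c: bring every class up to the size of the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    return {int(c): int(counts.max() - n) for c, n in zip(classes, counts)}

def merge(X_orig, y_orig, X_gen, y_gen):
    """O_s = O ∪ S: append the generated samples to the original data set."""
    return np.vstack([X_orig, X_gen]), np.concatenate([y_orig, y_gen])

# Hypothetical original data: five samples of class 0, two of class 1.
y_orig = np.array([0, 0, 0, 0, 0, 1, 1])
X_orig = np.arange(14, dtype=float).reshape(7, 2)
S_c = samples_needed(y_orig)

# Placeholder generated samples standing in for the output of step S4.
X_gen = np.full((S_c[1], 2), 12.0)
y_gen = np.full(S_c[1], 1)
X_bal, y_bal = merge(X_orig, y_orig, X_gen, y_gen)
```

After the merge both classes in this toy example contain five samples.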
Example 2
The present embodiment provides an unbalanced data and image processing system, which performs unbalanced data processing by using the method of embodiment 1, and the system includes:
the data preprocessing module is used for preprocessing the unbalanced data set O according to the method in the step 1) to obtain a data set D;
the maximum distribution algorithm module is used for determining parameters of the RBF neural network data generation model to be constructed according to the method in the step 2);
the network model building module is used for building an RBF neural network data generation model according to the method in the step 3);
a module configured to combine the RBF neural network data generation model with the mvnrnd function and to adaptively generate a new data set S according to the distribution of the original unbalanced data set, following the method in the step 4);
and a data post-processing module for filling the generated sample set S into the original unbalanced data set O to obtain a processed balanced data set $O_s$.
The present embodiment also provides a storage medium having stored thereon a computer program for implementing the method of embodiment 1 when executed.
The present embodiment also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of embodiment 1 when executing the computer program.
While embodiments of the invention have been disclosed above, the invention is not limited to the applications listed in the description and the embodiments; it is fully applicable to all fields suited to it, and further modifications may readily be made by those skilled in the art. The invention is therefore not limited to the specific details described, provided the general concept defined by the claims and their scope of equivalents is not departed from.

Claims (8)

1. An unbalanced data and image processing method, comprising the steps of:
1) preprocessing the unbalanced data set O;
2) processing the preprocessed unbalanced data set O by using a maximum distribution algorithm based on the Hausdorff distance, and determining parameters of an RBF neural network data generation model to be constructed; the parameters comprise hidden layer neurons of an RBF neural network data generation model, a category, an output weight and a diagonal distribution matrix corresponding to each hidden layer neuron, and a connection weight between each hidden layer neuron and a corresponding output neuron;
3) constructing an RBF neural network data generation model based on the result of the step 2);
4) generating data by combining the constructed RBF neural network data generation model with the mvnrnd function to obtain a generated sample set S;
5) filling the generated sample set S into the original unbalanced data set O to obtain a processed balanced data set $O_s$, where $O_s = O \cup S$.
2. The unbalanced data and image processing method according to claim 1, wherein the step 1) is specifically:
filling each missing value of a numerical attribute in the unbalanced data set O with the mean of that attribute over the samples of the same class; filling each missing value of an ordinal or nominal attribute with the most frequent value of that attribute among the samples of the same class;
after the missing values are filled, applying ordinal (sequential) encoding to the ordinal attributes and the nominal attributes;
converting the image data in the unbalanced data set O into numerical data with a PyRadiomics-based toolkit, adding the numerical data to the data set O, and standardizing all attributes with the z-score method to obtain the preprocessed data set D;
storing the mean and the standard deviation of each attribute in the vectors $L_{mean}$ and $L_{std}$ respectively, and storing the ordinal encoding scheme of the ordinal attributes and the nominal attributes.
3. The unbalanced data and image processing method of claim 2, wherein the step 2) specifically comprises:
2-1) assume that the data set D contains N input samples $\{x_n\}$, $n = 1, 2, \ldots, N$, each sample has M attributes and belongs to one of C classes, and the number of samples in class c is $N_c$, $c = 1, 2, \ldots, C$;
2-2) divide the samples in the data set by class to obtain the data subsets $D_c$, $c = 1, 2, \ldots, C$, each consisting of the samples belonging to class c; initialize the current class index $c = 0$ and the current number of hidden layer neurons $P = 0$;
2-3) let $c = c + 1$;
2-4) let $P = P + 1$; calculate the Hausdorff distance $h_P$ between $D_c$ and the other samples, and take the sample corresponding to $h_P$ as the newly added hidden layer neuron center $k_P$ of class c; calculate the distance from every sample in $D_c$ to $k_P$, record the subset $d_c$ formed by all samples whose distance is less than $h_P$, and delete $d_c$ from $D_c$; take the number of samples in $d_c$ as the connection weight $w_P$ between $k_P$ and the output neuron of the corresponding class, the connection weights between $k_P$ and the other output neurons being 0; calculate the variance $v_m$ of each attribute dimension in $d_c$ to form the diagonal distribution matrix corresponding to $k_P$:
Figure FDA0003396351750000021
2-5) if the number of samples remaining in $D_c$ is not 0, return to step 2-4); otherwise, check whether c equals C: if $c < C$, return to step 2-3); if $c = C$, the algorithm terminates.
4. The unbalanced data and image processing method of claim 3, wherein the step 3) specifically comprises:
3-1) determining that an input layer of the RBF neural network data generation model has M input neurons according to M attributes of each sample in the data set D, wherein each neuron corresponds to one attribute;
3-2) determining that an output layer of the RBF neural network data generation model has C output neurons according to C categories of the data set D, wherein each neuron corresponds to one category;
3-3) according to the result of step 2), obtain the P hidden layer neurons $\{k_1, k_2, \ldots, k_{P-1}, k_P\}$, their corresponding classes and output weights $\{w_1, w_2, \ldots, w_{P-1}, w_P\}$, and the corresponding P diagonal distribution matrices $\{V_1, V_2, \ldots, V_{P-1}, V_P\}$; determine the parameters of the P hidden layer neurons $\{(k_1, V_1), (k_2, V_2), \ldots, (k_{P-1}, V_{P-1}), (k_P, V_P)\}$ and the connection weights $\{w_1, w_2, \ldots, w_{P-1}, w_P\}$ between each hidden layer neuron and the corresponding output neuron.
5. The unbalanced data and image processing method of claim 4, wherein the step 4) specifically comprises:
4-1) set the number of samples to be generated for each class, $S_c$, $c = 1, 2, \ldots, C$; initialize the current hidden layer neuron center index $p = 0$ and the generated sample set
Figure FDA0003396351750000022
Figure FDA0003396351750000023
representing the empty set;
4-2) let $p = p + 1$; assuming that the current hidden layer neuron center $k_p$ belongs to class c, the number of samples to be generated for $k_p$ is
Figure FDA0003396351750000031
4-3) the generated sample matrix is
Figure FDA0003396351750000032
where each sample belongs to class c; the matrix
Figure FDA0003396351750000033
is merged into the generated sample set S,
Figure FDA0003396351750000034
check whether p equals P: if $p < P$, return to step 4-2); if $p = P$, the complete generated sample set S is obtained and the next step is executed;
4-4) de-standardize S using the mean vector $L_{mean}$ and the standard deviation vector $L_{std}$ of all attributes saved during preprocessing; then convert the corresponding numerical values in S back to the original values of the ordinal and nominal attributes according to the saved ordinal encoding scheme.
6. An unbalanced data and image processing system for processing unbalanced data using a method as claimed in any one of claims 1 to 5, the system comprising:
the data preprocessing module is used for preprocessing the unbalanced data set O according to the method in the step 1) to obtain a data set D;
the maximum distribution algorithm module is used for determining parameters of the RBF neural network data generation model to be constructed according to the method in the step 2);
the network model building module is used for building an RBF neural network data generation model according to the method in the step 3);
a module configured to combine the RBF neural network data generation model with the mvnrnd function and to adaptively generate a new data set S according to the distribution of the original unbalanced data set, following the method in the step 4);
and a data post-processing module for filling the generated sample set S into the original unbalanced data set O to obtain a processed balanced data set $O_s$.
7. A storage medium on which a computer program is stored, characterized in that the program is adapted to carry out the method of any one of claims 1-5 when executed.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-5 when executing the computer program.
CN202111485510.9A 2021-12-07 Unbalanced data and image processing method, system and computer equipment Active CN114254698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111485510.9A CN114254698B (en) 2021-12-07 Unbalanced data and image processing method, system and computer equipment

Publications (2)

Publication Number Publication Date
CN114254698A (en) 2022-03-29
CN114254698B (en) 2024-10-22

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019041629A1 (en) * 2017-08-30 2019-03-07 哈尔滨工业大学深圳研究生院 Method for classifying high-dimensional imbalanced data based on svm
CN109993229A (en) * 2019-04-02 2019-07-09 广东石油化工学院 A kind of serious unbalanced data classification method
KR20200027834A (en) * 2018-09-05 2020-03-13 성균관대학교산학협력단 Methods and apparatuses for processing data based on representation model for unbalanced data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
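Li Jinxin (李金鑫): "Web page classification method based on multi-instance multi-label radial basis function neural networks" (基于多示例多标签径向基神经网络的网页分类方法), China Master's Theses Full-text Database, Information Science and Technology (monthly), no. 07, 15 July 2018 (2018-07-15), pages 1 - 74 *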

Similar Documents

Publication Publication Date Title
CN113159147B (en) Image recognition method and device based on neural network and electronic equipment
CN111260462B (en) Transaction fraud detection method based on heterogeneous relation network attention mechanism
CN111352965B (en) Training method of sequence mining model, and processing method and equipment of sequence data
Zhang et al. Interpreting neural network judgments via minimal, stable, and symbolic corrections
CN112613552B (en) Convolutional neural network emotion image classification method combined with emotion type attention loss
CN110532880B (en) Sample screening and expression recognition method, neural network, device and storage medium
CN110210625A (en) Modeling method, device, computer equipment and storage medium based on transfer learning
Pengfei et al. A new sampling approach for classification of imbalanced data sets with high density
US20230401466A1 (en) Method for temporal knowledge graph reasoning based on distributed attention
Zhang Deep generative model for multi-class imbalanced learning
CN111415167B (en) Network fraud transaction detection method and device, computer storage medium and terminal
CN116452333A (en) Construction method of abnormal transaction detection model, abnormal transaction detection method and device
CN115330435A (en) Method, device, equipment and medium for establishing carbon emission right price index system
CN108647714A (en) Acquisition methods, terminal device and the medium of negative label weight
Gavval et al. CUDA-Self-Organizing feature map based visual sentiment analysis of bank customer complaints for Analytical CRM
Sahbi A particular Gaussian mixture model for clustering and its application to image retrieval
CN112541530B (en) Data preprocessing method and device for clustering model
CN114254698B (en) Unbalanced data and image processing method, system and computer equipment
CN114254698A (en) Unbalanced data and image processing method and system and computer equipment
Benchaji et al. Novel learning strategy based on genetic programming for credit card fraud detection in Big Data
CN116188174A (en) Insurance fraud detection method and system based on modularity and mutual information
CN115688923A (en) Data processing method and system for coping with internet financial security
CN111291838B (en) Method and device for interpreting entity object classification result
CN113240425A (en) Financial anti-money laundering transaction method, device and storage medium based on deep learning
CN110033862A (en) A kind of Chinese medicine Quantitative Diagnosis system and storage medium based on weighted digraph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant