CN109033976B - Abnormal muscle detection method and system - Google Patents

Publication number
CN109033976B
Authority
CN
China
Prior art keywords
sample
minority
samples
new
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810682299.1A
Other languages
Chinese (zh)
Other versions
CN109033976A (en
Inventor
王念
崔莉
赵泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Tianhe Technology Co ltd
Original Assignee
Beijing Zhongke Tianhe Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Tianhe Technology Co ltd filed Critical Beijing Zhongke Tianhe Technology Co ltd
Priority to CN201810682299.1A priority Critical patent/CN109033976B/en
Publication of CN109033976A publication Critical patent/CN109033976A/en
Application granted granted Critical
Publication of CN109033976B publication Critical patent/CN109033976B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The application discloses an oversampling processing method. The method includes: acquiring a minority class sample set; dividing the minority class sample set into a minority class boundary region and a minority class positive region according to a neighborhood rough set algorithm; and interpolating between the minority class boundary region and the minority class positive region to generate a synthesized sample. The method partitions the minority class sample set based on a neighborhood rough set and randomly selects minority class boundary region samples for directional interpolation toward the minority class positive region, generating synthesized samples that carry the positive region pattern, which increases the number of minority class samples and balances the data set. Because the method performs no correctness verification of oversampled points, it does not suffer from insufficient oversampling. The application thereby solves the problem that existing oversampling methods cannot adequately oversample synthesized samples having the positive region pattern when synthesizing oversampled data.

Description

Abnormal muscle detection method and system
Technical Field
The application belongs to the field of machine learning and data preprocessing, and relates to an oversampling data balance processing method in abnormal muscle detection.
Background
When the electromyographic signals are used for detecting abnormal muscles, a large number of electromyographic signals with labels (namely, abnormal muscles and normal muscles) are required to be collected, and a detection model of the abnormal muscles is trained according to the data set so as to achieve the purpose of identifying the abnormal muscles. The inventor hopes that the abnormal muscle identification model can achieve the best generalization effect on a future real data set, but the generalization capability of the model is closely related to the data set, and due to the unbalanced distribution of patients and normal persons in the real world, the acquired abnormal muscle electromyography data set often has a skewness problem, namely the number of different types of samples in the acquired data set has a large difference, and the condition is called data imbalance. The inventor proposes the scheme to solve the problem of reduced generalization capability of the abnormal muscle detection classification model caused by data imbalance.
Traditional machine learning classification and prediction algorithms assume that the numbers of samples of the various classes in a sample set do not differ significantly. When such a method is applied to an unbalanced data set, the classifier, in order to maximize overall accuracy, usually pays more attention to the majority class samples and neglects the minority class samples, so that the classification space of the majority class expands and the classification boundary is biased toward the majority class. As a result, minority class samples become difficult to identify and the performance of the classifier suffers.
To solve the imbalance problem from the data set level, an oversampling method has been proposed. The oversampling method balances the amount of data in the minority sample set and the majority sample set by processing the data in the minority sample set, and the common oversampling method balances the data set by copying the minority sample or generating a new minority sample (synthesized sample). In the oversampling method, the simplest method is to directly copy the minority class samples, but this method only causes the change of the number of the minority class samples, does not cause qualitative change, and cannot really improve the classification property of the minority class samples. In the oversampling method in the prior art, the oversampled synthesized samples have randomness, their distribution is too dispersed, the typical pattern of the samples is lacking, which may affect the generalization capability of the subsequent classifier, and the problem of insufficient sampling rate may occur.
The inventor proposes a solution to the problems of lack of typical samples and insufficient sampling rate in the prior art over-sampling method.
Summary of the Application
The main objective of the present application is to provide an abnormal muscle detection method, so as to solve the problem that the existing oversampling method cannot adequately oversample a synthesized sample having a positive domain mode in the process of synthesizing oversampled data.
In order to achieve the above object, according to one aspect of the present application, there is provided an abnormal muscle detecting method including: acquiring a first minority sample set; dividing the first minority sample set into a minority boundary region and a minority positive region according to a neighborhood rough set algorithm; and interpolating between the minority class boundary region and the minority class positive region to generate a synthesized sample.
Further, the abnormal muscle detection method further includes: generating a second minority sample set according to the first minority sample set and the synthesized sample; determining whether the number of samples of the second minority sample set corresponds to the number of samples of a majority sample set; if not, continuously interpolating between the minority class boundary region and the minority class positive region to generate a synthetic sample, and adding the synthetic sample into the second minority class sample set until the number of samples in the second minority class sample set corresponds to the number of samples in the majority class sample set.
Further, the interpolating between the minority class boundary region and the minority class positive region to generate a synthesized sample includes: randomly selecting at least one first sample in the minority class boundary region; randomly selecting at least one second sample in the minority class positive region; and randomly interpolating in a region formed by the first sample and the second sample to generate at least one synthesized sample.
Further, when the number of samples of the second minority sample set corresponds to the number of samples of the majority sample set: generating a balanced data set according to the majority type sample set and the second minority type sample set; and outputting the balance data set.
Further, the interpolating between the minority class boundary region and the minority class positive region to generate a synthesized sample includes: randomly selecting a first sample in the minority class boundary region; randomly selecting a second sample in the minority class positive region; and randomly interpolating on the line connecting the first sample and the second sample to generate at least one synthesized sample.
Further, the interpolating between the minority class boundary region and the minority class positive region to generate a synthesized sample includes: randomly selecting a first sample in the minority class boundary region; randomly selecting two second samples in the minority class positive region; and randomly interpolating in the triangular region formed by the first sample and the two second samples to generate at least one synthesized sample.
Further, the determining whether the number of samples of the second minority sample set corresponds to the number of samples of the majority sample set includes: and judging whether the number of samples of the second minority sample set is equal to that of the majority sample set.
In order to achieve the above object, according to another aspect of the present application, there is provided an abnormal muscle detection system including: a sample set acquisition unit configured to acquire a first minority class sample set; a sample set partitioning unit configured to partition the first minority sample set into a minority class boundary region and a minority class positive region according to a neighborhood rough set algorithm; an oversampling unit configured to interpolate between the minority class boundary region and the minority class positive region, generating a synthesized sample.
Further, the abnormal muscle detection system further comprises: a sample synthesis unit configured to generate a second minority sample set from the first minority sample set and the synthesis sample; a balance verification unit configured to determine whether a number of samples of the second minority class sample set corresponds to a number of samples of a majority class sample set.
Further, the oversampling unit includes: a boundary region sampling unit configured to randomly select at least one first sample in the minority class boundary region; a positive region sampling unit configured to randomly select at least one second sample in the minority class positive region; an oversampling interpolation unit configured to randomly interpolate within a region formed by the first sample and the second sample, generating at least one synthesized sample.
In the embodiments of the application, a neighborhood rough set is used to divide the minority class sample set into a minority class positive region and a minority class boundary region, and minority class boundary region samples are randomly selected for directional interpolation toward the minority class positive region, so that synthesized samples carrying more of the positive region pattern are generated, the number of minority class samples is increased, and the data set is balanced. This solves the problems that existing oversampling methods cannot adequately oversample synthesized samples having the positive region pattern when synthesizing oversampled data and that their sampling rate is insufficient.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, are included to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:
FIG. 1 is a schematic flow chart of a first embodiment abnormal muscle detection method;
FIG. 2 is a schematic flow chart of a synthetic sample generation method according to a first embodiment;
FIG. 3 is a schematic diagram of an oversampling region and a synthesized sample for three-sample random interpolation according to an embodiment;
FIG. 4 is a flowchart illustrating an abnormal muscle detection method according to a second embodiment;
FIG. 5 is a flowchart illustrating a method for detecting abnormal muscles according to a third embodiment;
FIG. 6 is a schematic diagram of the structure of an abnormal muscle detection system of an embodiment;
FIG. 7 is a schematic structural diagram of an oversampling unit of an embodiment;
FIG. 8 is an example unbalanced myoelectricity data table.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the accompanying drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
As shown in fig. 1, the abnormal muscle detecting method includes steps S101 to S107.
Step S101, a first minority class sample set is obtained. In this step, a data acquisition device first collects human electromyographic signals, and an original unbalanced data set is obtained through manual labeling. The data set contains two classes of electromyographic samples, a small number of abnormal samples and a large number of normal samples, which form the minority class samples and the majority class samples of the original unbalanced data set. The minority class sample set corresponding to the abnormal samples is then obtained from the original unbalanced data set.
Step S102, dividing the first minority class sample set into a minority class boundary region and a minority class positive region according to a neighborhood rough set algorithm. In an optional embodiment of the present application, the obtained original unbalanced data set is partitioned based on a neighborhood rough set: the minority class samples in the original unbalanced data set are divided into a minority class positive region and a minority class boundary region, and the majority class samples are divided into a majority class positive region and a majority class boundary region, so that the original unbalanced data set is divided into four parts.
Step S103, interpolating between the minority class boundary region and the minority class positive region to generate a synthesized sample. In this step, a plurality of synthesized samples are generated by interpolating from samples of the minority class boundary region toward the minority class positive region. Each synthesized sample is synthesized from the pattern of a boundary region sample in a definite direction, toward the minority class positive region, and therefore carries the patterns of both the boundary region and the positive region. Its position is random and may lie in the boundary region or in the positive region, which reduces the influence of the synthesized data on the classification interface of a classifier built on the data set.
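The text does not spell out the neighborhood rough set computation. The following is a minimal sketch under a common reading of that model: a sample belongs to the positive region of its class when every sample within a radius `delta` of it shares its label, and to the boundary region otherwise. The function name, the Euclidean distance, the value of `delta`, and the toy data are all assumptions made for illustration and are not taken from the patent.

```python
import math

def partition_by_neighborhood(samples, labels, delta):
    """Split sample indices into positive and boundary regions per class.

    Assumed rule: a sample is in the positive region of its class when all
    samples within distance `delta` of it (its neighborhood, itself included)
    carry the same label; otherwise it is in the boundary region of its class.
    """
    positive = {}   # label -> indices in that class's positive region
    boundary = {}   # label -> indices in that class's boundary region
    for i, (x, y) in enumerate(zip(samples, labels)):
        neighbors = [j for j, z in enumerate(samples) if math.dist(x, z) <= delta]
        pure = all(labels[j] == y for j in neighbors)
        (positive if pure else boundary).setdefault(y, []).append(i)
    return positive, boundary

# Hypothetical toy data (not the values of FIG. 8); label 1 is the minority class.
samples = [(0.1, 0.1), (0.2, 0.1), (0.5, 0.5), (0.55, 0.5), (0.9, 0.9)]
labels = [0, 0, 0, 1, 1]
positive, boundary = partition_by_neighborhood(samples, labels, delta=0.2)
print(positive)  # {0: [0, 1], 1: [4]}
print(boundary)  # {0: [2], 1: [3]}
```

Applied to the whole unbalanced data set, this yields the four parts described above: a positive region and a boundary region for each of the two classes.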
Step S104, generating a second minority sample set according to the first minority sample set and the synthesized sample. In this step, the generated synthetic sample is added to the first minority class sample set, and a second minority class sample set including the synthetic sample is generated.
In the present application, when the problem of data imbalance is solved by the oversampling method, it is important to ensure that the number of synthesized samples generated by oversampling reaches a certain number, so that the number of the minority class samples and the number of the majority class samples reach a balanced state. The application also includes a step of verifying whether the number of synthesized samples is sufficient in consideration of ensuring the number of synthesized samples.
As shown in fig. 1, the present application further includes a step S105 of verifying whether the number of synthesized samples is sufficient. Step S105, determining whether the number of samples in the second minority class sample set corresponds to the number of samples in the majority class sample set. In this step, it is determined whether the sum of the numbers of minority class samples and synthesized samples corresponds to the number of majority class samples, so that the sample numbers are balanced. Here, "corresponds" may mean that the sum of the numbers of minority class samples and synthesized samples equals the number of majority class samples, or that the absolute value of the difference between that sum and the number of majority class samples is within a certain range. In addition, in practical applications, the samples can be considered approximately balanced when the ratio of the number of minority class samples to the number of majority class samples reaches more than 80 percent.
In a preferred embodiment, step S105 determines whether the sum of the numbers of minority class samples and synthesized samples equals the number of majority class samples. If the sum has not reached the number of majority class samples, the method returns to step S103 and continues to generate new synthesized samples by interpolating between the minority class boundary region and the minority class positive region until the sum equals the number of majority class samples.
Step S106, generating a balanced data set according to the majority class sample set and the second minority class sample set. When step S105 determines that the sum of the numbers of minority class samples and synthesized samples equals the number of majority class samples, the sample set composed of the minority class samples, the synthesized samples, and the majority class samples is balanced, that is, it is a balanced data set.
Step S107, outputting the balanced data set. In this step, the balanced data set obtained in step S106 is output for further processing. In the application, the abnormal muscle detection model can be trained by machine learning on the balanced data set; because the numbers of normal samples and abnormal samples (namely, majority class and minority class samples) in the balanced data set are consistent, the trained abnormal muscle detection model has good generalization capability.
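The three readings of "corresponds" in step S105 can be sketched as one check. The function name, the `mode` switch, and the default values are illustrative assumptions; in particular, the ratio mode reflects one possible interpretation of the 80 percent remark, namely that the minority count (including synthesized samples) is compared against 80 percent of the majority count.

```python
def is_balanced(n_minority, n_synthetic, n_majority,
                mode="equal", tolerance=0, ratio=0.8):
    """Return True when the second minority set 'corresponds' to the majority set."""
    total = n_minority + n_synthetic          # size of the second minority sample set
    if mode == "equal":                       # preferred embodiment: exact equality
        return total == n_majority
    if mode == "tolerance":                   # absolute difference within a range
        return abs(total - n_majority) <= tolerance
    if mode == "ratio":                       # assumed reading of the 80 percent rule
        return total >= ratio * n_majority
    raise ValueError(f"unknown mode: {mode}")

print(is_balanced(22, 6, 28))                               # True
print(is_balanced(22, 4, 28))                               # False
print(is_balanced(22, 4, 28, mode="tolerance", tolerance=2))  # True
print(is_balanced(22, 1, 28, mode="ratio"))                 # True
```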
As shown in fig. 2, step S103, interpolating between the minority class boundary region and the minority class positive region to generate a synthesized sample, specifically includes steps S201 to S203.
Step S201, randomly selecting a first sample in the minority class boundary region. In this step, at least one minority sample is randomly selected from the minority boundary region during interpolation.
Step S202, randomly selecting at least one second sample in the minority class positive region. In this step, at least one minority class sample is randomly selected from the minority class positive region.
Step S203, randomly interpolating in the region formed by the first sample and the second sample to generate at least one synthesized sample. In the embodiment of the present application, the first and second samples selected in steps S201 and S202 are connected end to end to form a closed region, and points are randomly selected within the closed region to generate synthesized samples. Since at least one first sample and at least one second sample are selected, the shape of the closed region is not unique. For example, when one first sample and one second sample are selected, the two samples are connected by a straight line segment, and the synthesized sample is randomly selected on that segment; when one first sample and two second samples are selected, the three samples are connected in sequence to form a triangular region, and the synthesized sample is randomly selected within the triangular region.
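One way to cover both the line segment case and the triangle case with a single routine is to draw a random convex combination of the selected samples. This is only a sketch: the text does not state how the random point is drawn, the helper name `interpolate_in_region` is hypothetical, and for four or more samples a convex combination covers the convex hull, which may differ from a non-convex polygon formed by connecting the samples end to end.

```python
import random

def interpolate_in_region(first, seconds, rng=random):
    """Draw one synthesized sample as a random convex combination.

    `first` is a boundary region sample, `seconds` a list of positive region
    samples. Normalized exponential draws give weights that are uniform over
    the simplex, so the point is uniform on a segment (one second sample) or
    in a triangle (two second samples).
    """
    vertices = [first] + list(seconds)
    raw = [rng.expovariate(1.0) for _ in vertices]
    total = sum(raw)
    weights = [w / total for w in raw]
    return tuple(sum(w * v[d] for w, v in zip(weights, vertices))
                 for d in range(len(first)))

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
on_segment = interpolate_in_region((0.0, 0.0), [(1.0, 1.0)], rng)
in_triangle = interpolate_in_region((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0)], rng)
```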
Fig. 4 is a schematic flowchart of an abnormal muscle detection method according to an alternative embodiment of the present application, and as shown in fig. 4, the abnormal muscle detection method according to this embodiment includes steps S301 to S307.
Step S301, acquiring a minority class sample set $X_{Min}$. In this step, an unbalanced abnormal muscle detection data set $X$ is collected with a data acquisition device, manually labeled, and input, where $X = \{x_1, x_2, \ldots, x_m\}$ or $X = \{X_{Maj}, X_{Min}\}$, in which $X_{Maj}$ is the majority class sample set and $X_{Min}$ is the minority class sample set.
Step S302, performing neighborhood rough set division on the minority class sample set $X_{Min}$, dividing it into a minority class positive region $Pos_{Min}$ and a minority class boundary region $Bound_{Min}$. In another alternative, neighborhood rough set division may be performed on the unbalanced data set $X = \{X_{Maj}, X_{Min}\}$, dividing it into a majority class positive region $Pos_{Maj}$, a minority class positive region $Pos_{Min}$, a majority class boundary region $Bound_{Maj}$, and a minority class boundary region $Bound_{Min}$, so as to obtain $Pos_{Min}$ and $Bound_{Min}$.
Step S303, selecting a random sample in the minority class boundary region $Bound_{Min}$, denoted $x_i$, and selecting a random sample in the minority class positive region $Pos_{Min}$, denoted $x_j$.
Step S304, generating a random number $\xi \in (0, 1)$ and inserting a synthesized sample at position $\xi$ between $x_i$ and $x_j$ according to the formula $x_{new} = x_i + \xi \times (x_j - x_i)$; $x_{new}$ is put into the synthesized sample data set $X_{new}$.
Step S305, adding the generated synthesized sample set $X_{new}$ to the minority class sample set $X_{Min}$ to generate the set $\{X_{Min}, X_{new}\}$.
Step S306, determining whether the number of samples in the set $\{X_{Min}, X_{new}\}$ is equal to the number of samples in the majority class sample set $X_{Maj}$; if not, returning to step S304 and continuing to generate synthesized samples until the numbers are equal.
Step S307, when the number of samples in the set $\{X_{Min}, X_{new}\}$ is equal to the number of samples in the majority class sample set $X_{Maj}$, outputting the balanced data set $\{X_{Maj}, X_{Min}, X_{new}\}$.
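Steps S303 to S306 can be sketched as a short loop. The function and variable names are illustrative, and the sample values are hypothetical rather than taken from FIG. 8. The sketch reselects $x_i$ and $x_j$ on every pass, as the worked example for this embodiment does, and it stops with a less-than comparison so that it terminates even if the minority set is already larger than the majority set.

```python
import random

def oversample_on_lines(bound_min, pos_min, n_minority, n_majority, rng=random):
    """Generate synthesized samples until the minority count matches the majority count."""
    x_new_set = []
    # Step S306: repeat while {X_Min, X_new} holds fewer samples than X_Maj.
    while n_minority + len(x_new_set) < n_majority:
        x_i = rng.choice(bound_min)   # step S303: boundary region sample
        x_j = rng.choice(pos_min)     # step S303: positive region sample
        xi = rng.random()             # step S304: random number in [0, 1)
        while xi == 0.0:              # redraw to stay inside the open interval (0, 1)
            xi = rng.random()
        # x_new = x_i + xi * (x_j - x_i), applied per feature dimension
        x_new = tuple(a + xi * (b - a) for a, b in zip(x_i, x_j))
        x_new_set.append(x_new)
    return x_new_set

# Hypothetical two-dimensional samples, not the values of FIG. 8.
bound_min = [(0.0, 0.0)]
pos_min = [(1.0, 2.0)]
x_new_set = oversample_on_lines(bound_min, pos_min, n_minority=2, n_majority=5,
                                rng=random.Random(1))
print(len(x_new_set))  # 3
```

Every generated point lies on the segment from the boundary sample to the positive region sample, so here each point has the form $(\xi, 2\xi)$ with $0 < \xi < 1$.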
The abnormal muscle detection method of the embodiment of fig. 4 is further described below with reference to the data in the unbalanced myoelectric data table of fig. 8. Fig. 8 is a data set composed of electromyographic data acquired by the data acquisition device, which includes 50 samples in total: $X = \{x_1, x_2, \ldots, x_{50}\}$. Column D1 is the normalized myoelectric RMS value feature, which is the first dimension feature; column D2 is the normalized myoelectric RMS value feature, which is the second dimension feature. The label column indicates whether the current sample is an abnormal muscle: 0 means it is not an abnormal muscle, and 1 means it is. The data set is unbalanced, with 22 abnormal muscle samples and 28 normal muscle samples; the normal muscle class is the majority class and the abnormal muscle class is the minority class. This embodiment aims, by the method of the present application, to insert synthesized samples at random positions on the line connecting a sample in the minority class boundary region and a sample in the minority class positive region, so that the synthesized samples have the characteristics of typical samples of this class, thereby adding minority class samples to balance the data set.
Step S301, collecting and labeling, with a data acquisition device, an electromyographic data set $X = \{x_1, x_2, \ldots, x_{50}\}$ formed by patient and normal person samples, as shown in fig. 8, in which the majority class sample subset is $X_{Maj} = \{x_1, x_2, \ldots, x_{28}\}$ and the minority class sample subset is $X_{Min} = \{x_{29}, x_{30}, \ldots, x_{50}\}$.
Step S302, establishing a neighborhood rough set on the data set to obtain the majority class positive region sample set $Pos_{Maj} = \{x_1, x_2, \ldots, x_{14}\}$, the minority class positive region sample set $Pos_{Min} = \{x_{40}, x_{41}, x_{42}, x_{43}, x_{45}, x_{46}, \ldots, x_{49}\}$, the majority class boundary region sample set $Bound_{Maj} = \{x_{15}, x_{16}, \ldots, x_{28}\}$, and the minority class boundary region sample set $Bound_{Min} = \{x_{29}, x_{30}, \ldots, x_{39}, x_{44}, x_{50}\}$.
Step S303, selecting a random sample $x_{31}$ in the minority class boundary region and a random sample $x_{43}$ in the minority class positive region.
Step S304, generating a random value $\xi = 0.5$, interpolating a synthesized sample $x_{new}$ = (0.5, 0.81) between $x_{31}$ and $x_{43}$, and putting it into the new sample data set $X_{new}$ = {(0.5, 0.81)}.
Step S305, generating the set $\{X_{Min}, X_{new}\}$.
Step S306, determining that the set $\{X_{Min}, X_{new}\}$ and the majority class sample subset $X_{Maj}$ are not yet balanced, so oversampling continues.
A random sample $x_{39}$ is selected in the minority class boundary region and a random sample $x_{42}$ in the minority class positive region; a random value $\xi = 0.55$ is generated, a synthesized sample $x_{new}$ = (0.57, 0.42) is inserted between $x_{39}$ and $x_{42}$, and it is put into the new sample data set $X_{new}$ = {(0.5, 0.81), (0.57, 0.42)}. Judgment: the set $\{X_{Min}, X_{new}\}$ and the majority class sample subset $X_{Maj}$ are not yet balanced, so oversampling continues.
A random sample $x_{39}$ is selected in the minority class boundary region and a random sample $x_{43}$ in the minority class positive region; a random value $\xi = 0.3$ is generated, a synthesized sample $x_{new}$ = (0.57, 0.58) is inserted between $x_{39}$ and $x_{43}$, and it is put into the new sample data set $X_{new}$ = {(0.5, 0.81), (0.57, 0.42), (0.57, 0.58)}. Judgment: the set $\{X_{Min}, X_{new}\}$ and the majority class sample subset $X_{Maj}$ are not yet balanced, so oversampling continues.
A random sample $x_{38}$ is selected in the minority class boundary region and a random sample $x_{42}$ in the minority class positive region; a random value $\xi = 0.6$ is generated, a synthesized sample $x_{new}$ = (0.62, 0.27) is inserted between $x_{38}$ and $x_{42}$, and it is put into the new sample data set $X_{new}$ = {(0.5, 0.81), (0.57, 0.42), (0.57, 0.58), (0.62, 0.27)}. Judgment: the set $\{X_{Min}, X_{new}\}$ and the majority class sample subset $X_{Maj}$ are not yet balanced, so oversampling continues until the samples are balanced.
Step S307, outputting the balanced data set with the added synthesized samples, $\{X_{Maj}, X_{Min}, X_{new}\}$.
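The bookkeeping of this worked example can be checked from the index sets listed in steps S301 and S302 alone. The sketch below works only with sample indices (the feature values of FIG. 8 are not reproduced here) and confirms that the two minority regions partition $X_{Min}$, that the samples picked in the example come from the stated regions, and how many synthesized samples exact balance requires.

```python
# Index sets as listed in steps S301 and S302 (indices only, no feature values).
majority = set(range(1, 29))                        # x_1 ... x_28
minority = set(range(29, 51))                       # x_29 ... x_50
pos_min = {40, 41, 42, 43} | set(range(45, 50))     # x_40..x_43 and x_45..x_49
bound_min = set(range(29, 40)) | {44, 50}           # x_29..x_39, x_44, x_50

# The positive and boundary regions should partition the minority class.
assert pos_min | bound_min == minority
assert not (pos_min & bound_min)

# Samples picked in the worked example come from the stated regions.
assert {31, 38, 39} <= bound_min
assert {42, 43} <= pos_min

needed = len(majority) - len(minority)              # synthesized samples for equality
print(len(pos_min), len(bound_min), needed)         # 9 13 6
```

With 28 majority samples and 22 minority samples, six synthesized samples are needed, so the four iterations shown above are followed by two more before step S307 is reached.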
Fig. 5 is a schematic flowchart of an abnormal muscle detection method according to another alternative embodiment of the present application, and as shown in fig. 5, the abnormal muscle detection method according to this embodiment includes steps S401 to S407.
Step S401, acquiring a minority class sample set $X_{Min}$. In this step, an unbalanced abnormal muscle detection data set $X$ is collected with a data acquisition device, manually labeled, and input, where $X = \{x_1, x_2, \ldots, x_m\}$ or $X = \{X_{Maj}, X_{Min}\}$, in which $X_{Maj}$ is the majority class sample set and $X_{Min}$ is the minority class sample set.
Step S402, performing neighborhood rough set division on the minority class sample set $X_{Min}$, dividing it into a minority class positive region $Pos_{Min}$ and a minority class boundary region $Bound_{Min}$. In another alternative, neighborhood rough set division may be performed on the unbalanced data set $X = \{X_{Maj}, X_{Min}\}$, dividing it into a majority class positive region $Pos_{Maj}$, a minority class positive region $Pos_{Min}$, a majority class boundary region $Bound_{Maj}$, and a minority class boundary region $Bound_{Min}$, so as to obtain $Pos_{Min}$ and $Bound_{Min}$.
Step S403, selecting a random sample in the minority class boundary region $Bound_{Min}$, denoted $x_i$, and selecting two random samples in the minority class positive region $Pos_{Min}$, denoted $x_j$ and $x_k$.
Step S404, inserting a synthesized sample $x_{new}$ at a random position in the triangular region formed by $x_i$, $x_j$, and $x_k$, and putting $x_{new}$ into the synthesized sample data set $X_{new}$.
Step S405, adding the generated synthesized sample set $X_{new}$ to the minority class sample set $X_{Min}$ to generate the set $\{X_{Min}, X_{new}\}$.
Step S406, determining whether the number of samples in the set $\{X_{Min}, X_{new}\}$ is equal to the number of samples in the majority class sample set $X_{Maj}$; if not, returning to step S404 and continuing to generate synthesized samples until the numbers are equal.
Step S407, when the number of samples in the set $\{X_{Min}, X_{new}\}$ is equal to the number of samples in the majority class sample set $X_{Maj}$, outputting the balanced data set $\{X_{Maj}, X_{Min}, X_{new}\}$.
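Step S404 only requires a random position inside the triangle. One common way to obtain a point that is uniformly distributed over the triangle is the square-root barycentric construction sketched below. The choice of a uniform distribution, the function name, and the toy triangle are assumptions for illustration, not details given in the text.

```python
import math
import random

def sample_in_triangle(x_i, x_j, x_k, rng=random):
    """Return a point drawn uniformly from the triangle with vertices x_i, x_j, x_k."""
    r1, r2 = rng.random(), rng.random()
    s = math.sqrt(r1)
    # Barycentric weights; they are non-negative and sum to 1.
    w_i, w_j, w_k = 1.0 - s, s * (1.0 - r2), s * r2
    return tuple(w_i * a + w_j * b + w_k * c for a, b, c in zip(x_i, x_j, x_k))

# Hypothetical triangle: x_i from the boundary region, x_j and x_k from the positive region.
x_new = sample_in_triangle((0.0, 0.0), (1.0, 0.0), (0.0, 1.0), rng=random.Random(7))
```

Because the weights are non-negative and sum to 1, the synthesized sample can never fall outside the triangle, whatever the random draws are.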
The abnormal muscle detection method of the embodiment of fig. 5 is further described below with reference to the data in the unbalanced myoelectric data table of fig. 8. As in the embodiment of fig. 4, the table of fig. 8 is a data set composed of electromyographic data acquired by the data acquisition device and includes 50 samples: $X = \{x_1, x_2, \ldots, x_{50}\}$. Column D1 is the normalized myoelectric RMS value, which is the first dimension feature; column D2 is the normalized myoelectric RMS value, which is the second dimension feature. The label column is the data label column and indicates whether the sample is an abnormal muscle: 0 means the current sample is not an abnormal muscle, and 1 means it is. The data set is unbalanced, with 22 abnormal muscle samples and 28 normal muscle samples; the normal muscle class is the majority class and the abnormal muscle class is the minority class. This embodiment aims, by the method of the present application, to add abnormal muscle samples by generating synthesized samples at random positions within a triangle formed by one random boundary region sample and two positive region samples, thereby balancing the data set and giving the synthesized samples the characteristics of typical samples of this class.
Step S401: collect and label an electromyographic data set X = {x1, x2, …, x50} formed from patient and healthy-subject samples (see fig. 8), in which the majority class sample subset is X_Maj = {x1, x2, …, x28} and the minority class sample subset is X_Min = {x29, x30, …, x50}.
Step S402: perform neighborhood rough set partitioning on the minority class sample set X_Min, dividing it into a minority class positive domain Pos_Min and a minority class boundary region Bound_Min. Alternatively, neighborhood rough set partitioning may be performed on the unbalanced data set X = {X_Maj, X_Min}, dividing it into a majority class positive domain Pos_Maj, a minority class positive domain Pos_Min, a majority class boundary region Bound_Maj and a minority class boundary region Bound_Min, so as to obtain the minority class positive domain Pos_Min and the minority class boundary region Bound_Min.
Step S403: select a random sample x31 in the minority class boundary region, and select random samples x43 and x45 in the positive domain.
Step S404: insert a synthetic sample x_new = (0.64, 0.59) in the triangle formed by x31, x43 and x45, and put it into the new sample data set X_new = {(0.64, 0.59)}.
Step S405: add the generated synthetic sample set X_new to the minority class sample set X_Min to generate the set {X_Min, X_new}.
Step S406: determine whether the set {X_Min, X_new} and the majority class sample subset X_Maj are balanced. The samples are not yet balanced, so oversampling continues.
Select a random sample x39 in the minority class boundary region and random samples x41 and x42 in the positive domain, insert a synthetic sample x_new = (0.63 ) in the triangle formed by x39, x41 and x42, and put it into the new sample data set X_new = {(0.64, 0.59), (0.63 )}. Determination: the set {X_Min, X_new} and the majority class sample subset X_Maj are not yet balanced, so oversampling continues.
Select a random sample x39 in the minority class boundary region and random samples x43 and x49 in the positive domain, insert a synthetic sample x_new = (0.82, 0.29) in the triangle formed by x39, x49 and x43, and put it into the new sample data set X_new = {(0.64, 0.59), (0.63 ), (0.82, 0.29)}. Determination: the set {X_Min, X_new} and the majority class sample subset X_Maj are not yet balanced, so oversampling continues.
Select a random sample x38 in the minority class boundary region and random samples x42 and x46 in the positive domain, insert a synthetic sample x_new = (0.76, 0.19) in the triangle formed by x38, x46 and x42, and put it into the new sample data set X_new = {(0.64, 0.59), (0.63 ), (0.82, 0.29), (0.76, 0.19)}. Determination: the set {X_Min, X_new} and the majority class sample subset X_Maj are not yet balanced, so oversampling continues until the samples are balanced.
Step S407: output the balanced data set with the synthetic samples added, {X_Maj, X_Min, X_new}.
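The walk-through of steps S403 to S407 above can be sketched in code. The following Python sketch is illustrative only and is not the implementation of the application: the function name `oversample_until_balanced` and its parameters are hypothetical, and drawing the point uniformly inside the triangle is an assumption, since the description only requires a random position within the triangle.

```python
import random


def oversample_until_balanced(bound, pos, n_minority, n_majority, seed=None):
    """Generate synthetic minority samples until the class sizes match.

    bound: list of boundary region samples (tuples of floats), hypothetical input
    pos:   list of positive domain samples (tuples of floats), hypothetical input
    Returns the list X_new of synthetic samples.
    """
    if not bound or len(pos) < 2:
        raise ValueError("need at least one boundary sample and two positive domain samples")
    rng = random.Random(seed)
    x_new = []
    # S406: keep oversampling while the minority side is still smaller.
    while n_minority + len(x_new) < n_majority:
        # S403: one random boundary region sample, two random positive domain samples.
        a = rng.choice(bound)
        b, c = rng.sample(pos, 2)
        # S404: random point in triangle (a, b, c) via barycentric weights.
        # Assumption: uniform sampling; weights outside the triangle are reflected back.
        r1, r2 = rng.random(), rng.random()
        if r1 + r2 > 1.0:
            r1, r2 = 1.0 - r1, 1.0 - r2
        w = 1.0 - r1 - r2
        # S405: add the synthetic sample to X_new.
        x_new.append(tuple(w * ai + r1 * bi + r2 * ci for ai, bi, ci in zip(a, b, c)))
    return x_new
```

The loop condition uses `<` rather than a strict inequality test, so the sketch also terminates when the minority class is already at least as large as the majority class.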
Fig. 3 is a schematic diagram of the oversampling region and the synthetic samples for the three-sample random interpolation used to generate synthetic samples in the embodiment of fig. 5. As shown in fig. 3, because random interpolation is performed within the triangular region formed by three samples, a generated synthetic sample may lie in the minority class boundary region or in the minority class positive domain. This does not shift the classification boundary of the subsequent machine learning model, so the synthetic samples are easier to classify and also help determine the classification boundary of the subsequent model.
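The statement that a synthetic sample stays within the triangular region can be checked numerically. The sketch below is an illustrative assumption rather than part of the application: `barycentric_weights` and `inside_triangle` are hypothetical helper names, and the samples are treated as 2-D feature vectors (D1, D2). A point lies inside the triangle exactly when all three of its barycentric weights are non-negative.

```python
def barycentric_weights(p, a, b, c):
    """Return (wa, wb, wc) such that p = wa*a + wb*b + wc*c and wa + wb + wc = 1."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("degenerate triangle: the three samples are collinear")
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


def inside_triangle(p, a, b, c, tol=1e-12):
    """True when p lies inside or on the edge of triangle (a, b, c)."""
    return all(w >= -tol for w in barycentric_weights(p, a, b, c))
```

Applied to a synthetic sample and the three samples it was generated from, `inside_triangle` should return True; a False result would indicate that the interpolation left the region formed by the three samples.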
In this application, the embodiments of fig. 4 and fig. 5 are two possible embodiments. In a practical system, the oversampling method between boundary region samples and positive domain samples is not limited to the two forms described in these embodiments; other methods may be used, as long as the synthetic samples are generated by interpolating from the boundary region toward the positive domain.
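One simple instance of interpolating from the boundary region toward the positive domain is random interpolation on the line connecting one boundary region sample and one positive domain sample. The following is a minimal sketch under stated assumptions: the function name `interpolate_on_segment` is hypothetical, and the interpolation factor is assumed to be drawn uniformly from [0, 1).

```python
import random


def interpolate_on_segment(boundary_sample, positive_sample, rng=random):
    """Return a random point on the segment from boundary_sample to positive_sample."""
    lam = rng.random()  # assumed uniform; 0 gives the boundary sample itself
    return tuple(b + lam * (p - b) for b, p in zip(boundary_sample, positive_sample))
```

Because the factor never exceeds 1, the synthetic sample cannot overshoot the positive domain sample, which keeps the interpolation directed from the boundary region toward the positive domain.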
From the above description, it can be seen that the present application achieves at least the following technical effects:
1. A neighborhood rough set is used to partition the minority class sample set into a minority class positive domain and a minority class boundary region, and randomly selected minority class boundary region samples are directionally interpolated toward the minority class positive domain, so that the synthetic samples generated by oversampling carry the pattern of the positive domain.
2. Because randomly selected minority class boundary region samples are directionally interpolated toward the minority class positive domain, the generated synthetic samples may lie in the boundary region or in the positive domain. This does not shift the classification boundary of the subsequent machine learning model, makes the synthetic samples easier to classify, and helps determine the classification boundary of the subsequent model.
3. The oversampling method of the present application does not verify the correctness of oversampled points, so the result of every oversampling operation is retained, and the problems of an insufficient oversampling rate and insufficient oversampling do not arise.
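The partition into positive domain and boundary region named in effect 1 can be illustrated with a short sketch. It assumes a commonly used neighborhood rough set criterion, which this passage does not spell out: a minority class sample belongs to the positive domain when every sample within a radius `delta` of it (Euclidean distance, computed over the whole labeled data set) is also a minority class sample, and to the boundary region otherwise. The function name `partition_minority`, the `delta` parameter and the choice of distance metric are assumptions for illustration.

```python
import math


def partition_minority(samples, labels, delta, minority_label=1):
    """Split minority class sample indices into (positive_domain, boundary_region)."""
    pos, bound = [], []
    for i, (xi, yi) in enumerate(zip(samples, labels)):
        if yi != minority_label:
            continue
        # Labels of all samples in the delta-neighborhood of xi (xi itself included).
        neighbour_labels = [labels[j] for j, xj in enumerate(samples)
                            if math.dist(xi, xj) <= delta]
        if all(lab == minority_label for lab in neighbour_labels):
            pos.append(i)    # neighborhood is pure: positive domain
        else:
            bound.append(i)  # neighborhood mixes classes: boundary region
    return pos, bound
```

The radius `delta` controls the split: a larger radius makes neighborhoods less likely to be pure and therefore moves more minority class samples into the boundary region.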
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that presented herein.
According to an embodiment of the present application, there is also provided an oversampling system for implementing the oversampling method described above, as shown in fig. 6, the system including: a sample set acquisition unit 1, a sample set partitioning unit 2, an oversampling unit 3, a sample synthesis unit 4, a balance verification unit 5, and a balance set output unit 6, wherein:
a sample set acquisition unit 1, configured to acquire a minority class sample set from an original unbalanced data set obtained by manual labeling;
a sample set partitioning unit 2, configured to divide the minority class sample set into a minority class boundary region and a minority class positive domain according to a neighborhood rough set algorithm;
an oversampling unit 3, configured to perform interpolation between the minority class boundary region and the minority class positive region, and generate a synthesized sample;
a sample synthesis unit 4, configured to add the synthesized sample to the minority sample set, and generate a second minority sample set including the synthesized sample;
a balance verifying unit 5, configured to determine whether the number of samples in the second minority sample set is equal to the number of samples in the majority sample set;
a balanced set output unit 6, configured to generate and output the balanced data set.
As shown in fig. 7, the oversampling unit 3 further includes: a boundary region sampling unit 301, a positive region sampling unit 302, and an oversampling interpolation unit 303, wherein:
a boundary region sampling unit 301, configured to randomly select at least one first sample in the minority class boundary region;
a positive domain sampling unit 302, configured to randomly select at least one second sample in the minority positive domain;
an oversampling interpolation unit 303, configured to randomly interpolate within a region formed by the first sample and the second sample, and generate at least one synthesized sample.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (7)

1. An abnormal muscle detection method, comprising:
acquiring a first minority class electromyography sample set X_Min, wherein in this step an unbalanced abnormal muscle detection data set X, collected with a data acquisition device and manually labeled, is input, X = {x1, x2, …, xm} or X = {X_Maj, X_Min}, where X_Maj is the majority class sample set and X_Min is the minority class sample set;
dividing the first minority class electromyography sample set X_Min into a minority class boundary region Bound_Min and a minority class positive domain Pos_Min according to a neighborhood rough set algorithm;
interpolating between the minority class boundary region Bound_Min and the minority class positive domain Pos_Min to generate a synthetic electromyography sample set X_new;
generating a second minority class electromyography sample set {X_Min, X_new} from the first minority class electromyography sample set X_Min and the synthetic electromyography sample set X_new;
determining whether the number of electromyography samples in the second minority class electromyography sample set {X_Min, X_new} corresponds to the number of electromyography samples in the majority class electromyography sample set X_Maj;
if not, continuing to interpolate between the minority class boundary region Bound_Min and the minority class positive domain Pos_Min to generate a synthetic electromyography sample set X_new, and adding the synthetic electromyography sample set X_new to the second minority class electromyography sample set {X_Min, X_new}, until the number of electromyography samples in the second minority class electromyography sample set {X_Min, X_new} corresponds to the number of electromyography samples in the majority class electromyography sample set X_Maj;
if so, generating a balanced data set {X_Maj, X_Min, X_new} from the majority class electromyography sample set X_Maj and the second minority class electromyography sample set {X_Min, X_new};
outputting the balanced data set {X_Maj, X_Min, X_new}, and training an abnormal muscle detection model by machine learning according to the balanced data set {X_Maj, X_Min, X_new}.
2. The abnormal muscle detection method according to claim 1, wherein the interpolating between the minority class boundary region Bound_Min and the minority class positive domain Pos_Min to generate a synthetic electromyography sample set X_new comprises:
randomly selecting at least one first electromyography sample in the minority class boundary region Bound_Min;
randomly selecting at least one second electromyography sample in the minority class positive domain Pos_Min;
randomly interpolating within a region formed by the first electromyography sample and the second electromyography sample to generate at least one synthetic electromyography sample set X_new.
3. The abnormal muscle detection method according to claim 1, wherein the interpolating between the minority class boundary region Bound_Min and the minority class positive domain Pos_Min to generate a synthetic electromyography sample set X_new comprises:
randomly selecting a first electromyography sample in the minority class boundary region Bound_Min;
randomly selecting a second electromyography sample in the minority class positive domain Pos_Min;
randomly interpolating on the connecting line between the first electromyography sample and the second electromyography sample to generate at least one synthetic electromyography sample set X_new.
4. The abnormal muscle detection method according to claim 1, wherein the interpolating between the minority class boundary region Bound_Min and the minority class positive domain Pos_Min to generate a synthetic electromyography sample set X_new comprises:
randomly selecting a first electromyography sample in the minority class boundary region Bound_Min;
randomly selecting two second electromyography samples in the minority class positive domain Pos_Min;
randomly interpolating within a triangular region formed by the first electromyography sample and the two second electromyography samples to generate at least one synthetic electromyography sample set X_new.
5. The abnormal muscle detection method according to claim 1, wherein the determining whether the number of electromyography samples in the second minority class electromyography sample set {X_Min, X_new} corresponds to the number of electromyography samples in the majority class electromyography sample set X_Maj comprises:
determining whether the number of electromyography samples in the second minority class electromyography sample set {X_Min, X_new} is equal to the number of electromyography samples in the majority class electromyography sample set X_Maj.
6. An abnormal muscle detection system, comprising:
an electromyography sample set acquisition unit configured to acquire a first minority class electromyography sample set X_Min;
an electromyography sample set partitioning unit configured to divide the first minority class electromyography sample set X_Min into a minority class boundary region Bound_Min and a minority class positive domain Pos_Min according to a neighborhood rough set algorithm;
an oversampling unit configured to interpolate between the minority class boundary region Bound_Min and the minority class positive domain Pos_Min to generate a synthetic electromyography sample set X_new;
an electromyography sample synthesis unit configured to generate a second minority class electromyography sample set {X_Min, X_new} from the first minority class electromyography sample set X_Min and the synthetic electromyography sample set X_new;
a balance verification unit configured to determine whether the number of electromyography samples in the second minority class electromyography sample set {X_Min, X_new} corresponds to the number of electromyography samples in the majority class electromyography sample set X_Maj, comprising:
if so, generating a balanced data set {X_Maj, X_Min, X_new} from the majority class electromyography sample set X_Maj and the second minority class electromyography sample set {X_Min, X_new};
outputting the balanced data set {X_Maj, X_Min, X_new}, and training an abnormal muscle detection model by machine learning according to the balanced data set {X_Maj, X_Min, X_new}.
7. The abnormal muscle detection system according to claim 6, wherein the oversampling unit includes:
a boundary region sampling unit configured to randomly select at least one first electromyography sample in the minority class boundary region Bound_Min;
a positive domain sampling unit configured to randomly select at least one second electromyography sample in the minority class positive domain Pos_Min;
an oversampling interpolation unit configured to randomly interpolate within a region formed by the first electromyography sample and the second electromyography sample to generate at least one synthetic electromyography sample set X_new.
CN201810682299.1A 2018-06-27 2018-06-27 Abnormal muscle detection method and system Active CN109033976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810682299.1A CN109033976B (en) 2018-06-27 2018-06-27 Abnormal muscle detection method and system


Publications (2)

Publication Number Publication Date
CN109033976A (en) 2018-12-18
CN109033976B (en) 2022-05-20

Family

ID=65522009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810682299.1A Active CN109033976B (en) 2018-06-27 2018-06-27 Abnormal muscle detection method and system

Country Status (1)

Country Link
CN (1) CN109033976B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461855B (en) * 2019-01-18 2023-07-28 同济大学 Credit card fraud detection method and system based on undersampling, medium and equipment
CN112598118B (en) * 2021-03-03 2021-06-25 成都晓多科技有限公司 Method, device, storage medium and equipment for processing abnormal labeling in supervised learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617411A (en) * 2013-10-17 2014-03-05 杭州电子科技大学 Myoelectricity signal identification method based on complexity, fractal dimension and fractal length
CN104766098A (en) * 2015-04-30 2015-07-08 哈尔滨工业大学 Construction method for classifier
CN107273798A (en) * 2017-05-11 2017-10-20 华南理工大学 A kind of gesture identification method based on surface electromyogram signal

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100763233B1 (en) * 2003-08-11 2007-10-04 삼성전자주식회사 Ppg signal detecting appratus of removed motion artifact and method thereof, and stress test appratus using thereof
US20110166436A1 (en) * 2010-01-04 2011-07-07 Edelman Robert R System and Method For Non-Contrast MR Angiography Using Steady-State Image Acquisition


Also Published As

Publication number Publication date
CN109033976A (en) 2018-12-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant