CN104657743A - Semi-supervised minimum and maximum modular pattern classification method - Google Patents
Semi-supervised minimum and maximum modular pattern classification method
- Publication number
- CN104657743A CN104657743A CN201510035805.4A CN201510035805A CN104657743A CN 104657743 A CN104657743 A CN 104657743A CN 201510035805 A CN201510035805 A CN 201510035805A CN 104657743 A CN104657743 A CN 104657743A
- Authority
- CN
- China
- Prior art keywords
- sample
- samples
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a semi-supervised min-max modular pattern classification method, belonging to the technical field of data mining. In the method, a portion of unlabeled samples is added to each labeled sample subset obtained in the task decomposition stage of a min-max modular network (M3 network), and, following the idea of the generative semi-supervised learning algorithm (fSSL), features containing unlabeled-sample information are generated and used as new features of the labeled samples, thereby realizing a semi-supervised M3 network. The method alleviates the problems that labeling large-scale samples consumes a great deal of manpower and material resources and that unsupervised learning is unstable, and it enhances the learning performance of the original M3 network.
Description
Technical Field
The invention relates to a semi-supervised minimum and maximum modular pattern classification method, belonging to the technical field of data mining.
Background
In real life, the data volume of various industries is growing exponentially. According to the statistics of the Internet Data Center (IDC), the global data volume grows by 40-60% every year and is predicted to reach 35 ZB (1 ZB = 10^21 bytes) by 2020. The value of such data is self-evident, and how to use it effectively has attracted the attention of many researchers.
In 1999, Professor Bao-Liang Lu proposed the Min-Max Modular Network (M3 network for short) in "Task decomposition and module combination based on class relations: a modular network for pattern classification", a modular network aimed at the difficult problem of classifying large-scale, complex data. Its core is the divide-and-conquer idea: large-scale data are decomposed into many small and simple modules so as to reduce the complexity of the original problem. The sub-modules run independently of one another, no communication needs to be set up between modules, and the approach therefore facilitates parallelization of the actual task. Finally, the prediction results of the modules are combined through the Min-Max rule to obtain the solution of the original problem.
The M3 network is currently a purely supervised learning approach. Supervised learning means that the class to which each sample belongs is known; the goal is to learn, from a given training sample set, the mapping between samples X_l and labels Y_l, and to test the quality of this mapping on new test samples. Supervised learning requires that the classes of all training samples be known, and it needs a large number of labeled samples to achieve good generalization performance. In practical problems, however, labeled and unlabeled samples usually coexist: obtaining labeled samples takes considerable labor and may even require expert knowledge of a particular field, whereas unlabeled samples are easy to obtain. Unsupervised learning generally builds learning models from the internal relationships among unlabeled samples; its essential difference from supervised learning is that the class of each sample is unknown, so the mapping between samples X_l and labels Y_l cannot be obtained directly. In view of the deficiencies of both, researchers have proposed semi-supervised learning methods.
Generative semi-supervised learning (fSSL) is one type of semi-supervised learning. It can be expressed as follows: for a training sample set S' = {X_1', X_2', ..., X_L'}, each sample is described by its original features together with features derived from hidden variables, where D denotes the number of features of the original labeled sample and K' denotes the number of hidden variables. Clearly, the number of features used to describe each sample increases, while the number of training samples remains unchanged. The present invention solves the problems described above well.
Disclosure of Invention
The invention aims to solve the problems that labeling large-scale samples costs a great deal of manpower and material resources, that unsupervised learning suffers from learning instability, and that the existing M3 network can only be used for supervised learning. It provides a semi-supervised min-max modular pattern classification method, which comprises the following steps: (1) divide the labeled sample set according to the task-decomposition principle of the M3 network, divide the unlabeled sample set into the same number of blocks, and add each unlabeled sample subset to a labeled sample subset; (2) link the labeled samples and the unlabeled samples closely by using a similarity matrix as the data model; (3) apply a probabilistic latent semantic analysis (PLSA) model to the similarity matrix to obtain the hidden variables between labeled and unlabeled samples; (4) use the posterior probabilities of the hidden variables given a labeled sample as new features of that labeled sample, and the posterior probabilities of the hidden variables given a test sample as new features of that test sample; (5) integrate the results of the base classifiers by the Min-Max rule to obtain the solution of the original problem.
The technical scheme adopted by the invention to solve the above technical problems is as follows: a semi-supervised min-max modular pattern classification method that combines the M3 network with the idea of semi-supervised learning, which not only enhances the learning performance of the original M3 network but also makes effective use of the large number of readily available unlabeled samples.
The method comprises the following steps:
The method divides the labeled sample set and the unlabeled sample set into sample subsets according to a chosen partitioning method, and adds the unlabeled sample subsets to the labeled sample subsets without repetition according to a farthest-subset-centroid strategy, thereby forming the training subsets. For each training subset, the probabilistic latent semantic analysis (PLSA) method is used to solve for the hidden variables that govern the generation process of the labeled and unlabeled samples in the data model, and the posterior probabilities of the hidden variables given the labeled samples are taken as new features of the labeled samples. The classifiers are trained mainly on the labeled samples with the added features. For a test sample, the posterior probabilities of the hidden variables given the test sample are taken as its new features, and the classifiers obtained in the training stage are used to predict its label. Finally, the results of the base classifiers are integrated by the Min-Max rule to obtain the solution of the original two-class problems. The specific steps are as follows:
Step 1: data division.
The original labeled sample set S_L is divided into labeled sample subsets according to the task-decomposition principle of the M3 network and the hyperplane-division method, where M_i and M_j respectively denote the number of blocks into which the C_i-class and C_j-class samples in S_L are divided. The unlabeled sample set is divided equally, so that the number of unlabeled sample subsets equals the number of labeled sample subsets;
step 2: assignment of unlabeled sample subset.
The center point of each sample subset is calculated, and each unlabeled sample subset is assigned to the labeled sample subset whose center point is farthest from its own. At this point, each independent training sample subset comprises two parts: a two-class labeled sample subset and an unlabeled sample subset;
Step 3: hidden feature generation.
Take as an example a training subset composed of a labeled sample subset S_labeled and an unlabeled sample subset S_unlabeled. Suppose that both the labeled and the unlabeled samples are generated by a generative model, and that some hidden variables z_1, z_2, ..., z_{K''} determine the whole sample generation process, as follows:
1) select z_k from all hidden variables with probability P(z_k);
2) given the hidden variable z_k, generate a labeled sample lX_t with conditional probability P(lX_t|z_k);
3) given the hidden variable z_k, generate an unlabeled sample uX_r with conditional probability P(uX_r|z_k).
The labeled sample lX_t depends only on the hidden variable z_k and is independent of the unlabeled samples; likewise, the unlabeled sample uX_r depends only on the hidden variable z_k and is independent of the labeled samples. The labeled sample lX_t and the unlabeled sample uX_r generated by this model are therefore mutually independent. By conditional independence in probability theory, the following equation holds:
P(lX_t, uX_r | z_k) = P(lX_t | z_k) P(uX_r | z_k)    (Equation 1)
The above sample generation process can be interpreted as two probabilistic model expressions as follows:
P(lX_t, uX_r) = P(lX_t) P(uX_r | lX_t)    (Equation 2)
The relation between a labeled sample and an unlabeled sample is established according to the Euclidean distance θ_tr = sqrt( Σ_{d=1}^{D} (lx_{t,d} − ux_{r,d})² ),
where lx_{t,d} denotes the d-th feature of the t-th labeled sample and ux_{r,d} denotes the d-th feature of the r-th unlabeled sample.
Then, using the widely used probabilistic latent semantic analysis (PLSA) method, a likelihood function is designed according to the data model established above, and the maximum likelihood estimate is obtained by the expectation-maximization (EM) method.
Following the idea of the PLSA method, a log-likelihood function is established and then optimized step by step to obtain the optimal P(z_k), P(lX_t|z_k) and P(uX_r|z_k). The specific steps are as follows:
(1) Establish the likelihood function log P(S_labeled, S_unlabeled, Z).
According to Bayes' formula, the conditional probability in Equation (3) holds; then, from Equations (1), (2) and (3), the joint probability density function between S_labeled and S_unlabeled can be further rewritten as:
The above equation can be viewed as the marginal probability function obtained over all hidden variables z_k; from the conditional probability in Equation 5, the following can be derived:
Thus, a likelihood function formula on all samples is obtained:
where θ_tr is the distance between the unlabeled sample uX_r and the labeled sample lX_t; K'' is the number of hidden variables; w + g is the number of labeled samples contained in each training sample subset; and v is the number of unlabeled samples contained in each training sample subset;
(2) Maximize the log-likelihood function.
Maximizing the log-likelihood function means iteratively optimizing Equation 8 with the EM method: the E-step computes P(z_k|lX_t, uX_r) from the labeled and unlabeled samples; the M-step uses the resulting P(z_k|lX_t, uX_r) values to compute the conditional probabilities P(lX_t|z_k) and P(uX_r|z_k) that maximize the likelihood.
Equations 9 to 12 are updated iteratively so that the likelihood function f of Equation 8 attains its maximum; the loop terminates when the maximum number of iterations is reached or when the algorithm satisfies the convergence condition;
The optimal solution thus obtained yields hidden feature values that can be regarded as new features of the labeled sample lX_t. The new feature of lX_t is defined in Equation 13, which describes the posterior probability of the hidden variable z_k given the labeled sample lX_t;
A new labeled sample $l\tilde{X}_t$ consists of the feature values in the original feature space together with the newly generated feature values, and is expressed as

$$l\tilde{X}_t = \{\, lx_{t,1},\ lx_{t,2},\ \ldots,\ lx_{t,D},\ P(z_1 \mid lX_t),\ \ldots,\ P(z_{K''} \mid lX_t) \,\}, \quad t = 1, 2, \ldots, w+g,$$

from which a new labeled sample subset is obtained:

$$\tilde{S}^{\,h}_{labeled} = \{\, l\tilde{X}_t \,\}_{t=1}^{w+g}, \quad h = 1, \ldots, \sum_{i=1}^{K-1}\sum_{j=i+1}^{K} M_i \times M_j.$$
Step 4: feature space conversion of the test samples.
Since the feature space of the training samples has changed, the feature space of the test samples should be mapped into the same feature space as that of the training samples. The test stage still uses the posterior probability of the test sample tX_a and the hidden variable z_k as the new feature of the test sample, as shown in Equation 16.
Ω denotes the n neighboring samples obtained from the labeled sample set S_labeled, and the hidden variables z_k come directly from the training stage. Clearly, the test sample tX_a and the hidden variable z_k are mutually independent, so P(tX_a|z_k) can be split into two parts: P(tX_a|Ω) and P(Ω|z_k). P(tX_a|Ω) is expressed by the Euclidean distance between the nearest-neighbor matrix Ω and the test sample tX_a; because Ω belongs to the original labeled sample subset S_labeled, P(tX_a|Ω) can be written as P(tX_a|lX_t) and P(Ω|z_k) as P(lX_t|z_k), with lX_t ∈ Ω. The new feature of the test sample tX_a given by Equation 14 is thus redefined as shown in Equation 15.
Step 5: modular integration. The classifiers obtained in the training stage predict labels for the test samples, and the prediction results of all base classifiers are integrated by the Min-Max rule to obtain the solution of the original problem.
Advantageous effects:
1. The method applies the generative semi-supervised learning method fSSL to the M3 network: a portion of unlabeled samples is added to each two-class labeled training sample subset obtained after decomposition, and new features are constructed from the hidden variables between the labeled and unlabeled samples according to the idea of fSSL.
2. The method obtains a new labeled training sample subset and a corresponding base classifier model for each sub-problem, and integrates the results of all base classifiers with the Min-Max rule to obtain the solution of the original problem.
3. The invention can effectively process large-scale data and make full use of unlabeled samples, thereby reducing the cost of obtaining labeled samples and avoiding the learning instability of unsupervised learning.
4. The invention widens the application field of the min-max modular network M3.
Drawings
FIG. 1 is a schematic diagram of each independent training sample subset of the present invention.
FIG. 2 is a diagram of the generative data model for labeled and unlabeled samples of the present invention.
FIG. 3 is a diagram of the model that generates the test-stage posterior probability P(z_k|tX_a) of the present invention.
FIG. 4 is the process flow diagram of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings.
As shown in FIG. 4, the invention adds a portion of unlabeled samples to each labeled sample subset generated in the sample decomposition stage of the min-max modular network (M3 network), and, according to the principle of generative semi-supervised learning (fSSL), generates hidden features containing unlabeled-sample information and uses them as new features of the labeled samples, thereby realizing semi-supervised learning for the M3 network. The method eases the difficulty of labeling large-scale samples and avoids the learning instability of unsupervised learning, thereby enhancing the learning performance of the M3 network. The specific steps are as follows:
Step 1: data division.
Assume the labeled sample set is S_L, where each labeled sample lX_l = (lx_{l,1}, lx_{l,2}, ..., lx_{l,D}) ∈ R^D, L is the number of samples, D is the feature dimension, and Y_l is the class to which the l-th labeled sample belongs. Using the one-versus-one strategy, the M3 network first divides the labeled sample set S_L into K(K-1)/2 two-class problems S_ij (assume class C_i is the "positive" class and class C_j the "negative" class). Each two-class problem is then decomposed into several balanced two-class sub-problems using the part-versus-part decomposition strategy, so a K-class problem can be decomposed into balanced two-class sub-problems indexed by p ∈ [1, M_i], q ∈ [1, M_j], i, j ∈ [1, K], i ≠ j, where M_i is the number of blocks into which the C_i-class samples are divided and M_j is the number of blocks into which the C_j-class samples are divided. Suppose the training sample subset of each two-class sub-problem contains w samples belonging to class C_i and g samples belonging to class C_j; this labeled sample subset is denoted S_labeled.
Assume the unlabeled sample set is S_U = {uX_1, uX_2, ..., uX_U}, where uX_r (r = 1, 2, ..., U) denotes the r-th unlabeled sample. The original unlabeled samples are divided equally into the same number of subsets as the labeled sample subsets, each unlabeled subset containing v samples.
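For illustration, a minimal sketch of this decomposition, assuming simple order-based blocks in place of the hyperplane division described above; all sizes and names are illustrative:

```python
import numpy as np

def decompose_labeled(X, y, blocks_per_class):
    """Split a K-class labeled set into K(K-1)/2 class pairs and, within each
    pair, into M_i x M_j balanced two-class sub-problems (part-versus-part)."""
    classes = np.unique(y)
    subproblems = []
    for a in range(len(classes)):
        for b in range(a + 1, len(classes)):
            ci, cj = classes[a], classes[b]
            pos_blocks = np.array_split(X[y == ci], blocks_per_class[ci])
            neg_blocks = np.array_split(X[y == cj], blocks_per_class[cj])
            for P in pos_blocks:          # class C_i blocks ("positive")
                for N in neg_blocks:      # class C_j blocks ("negative")
                    subproblems.append((ci, cj, P, N))
    return subproblems

def divide_unlabeled(U, n_subsets):
    """Divide the unlabeled set equally into as many subsets as there are
    labeled sub-problems."""
    return np.array_split(U, n_subsets)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))              # 60 labeled samples, D = 4 features
y = np.repeat([0, 1, 2], 20)              # K = 3 classes
U = rng.normal(size=(30, 4))              # 30 unlabeled samples

subs = decompose_labeled(X, y, blocks_per_class={0: 2, 1: 2, 2: 2})
unl = divide_unlabeled(U, len(subs))
print(len(subs), "two-class sub-problems,", len(unl), "unlabeled subsets")
```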
Step 2: assignment of unlabeled sample subset.
The center point of each sample subset is calculated, and each unlabeled sample subset is assigned to the labeled sample subset whose center point is farthest from its own. At this point, each independent training sample subset contains two parts: the original two-class labeled sample subset S_labeled (w samples of class C_i, the positive class, and g samples of class C_j, the negative class) and an unlabeled sample subset S_unlabeled (v unlabeled samples). The fusion of unlabeled and labeled samples is shown in FIG. 1, where "+" represents a positive labeled training sample, "-" represents a negative labeled training sample, and "●" represents an unlabeled sample.
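For illustration, a minimal sketch of this assignment, assuming a greedy one-to-one matching realizes the "farthest center point, without repetition" rule; names are illustrative:

```python
import numpy as np

def assign_unlabeled(labeled_subsets, unlabeled_subsets):
    """Pair each unlabeled subset with the labeled subset whose center point is
    farthest from its own, without assigning the same labeled subset twice."""
    lab_centers = np.array([s.mean(axis=0) for s in labeled_subsets])
    taken, pairs = set(), {}
    for r, u in enumerate(unlabeled_subsets):
        uc = u.mean(axis=0)                               # center of unlabeled subset r
        dists = np.linalg.norm(lab_centers - uc, axis=1)  # distances to labeled centers
        for h in np.argsort(dists)[::-1]:                 # farthest labeled subset first
            if h not in taken:
                taken.add(h)
                pairs[int(h)] = r                         # labeled subset h receives unlabeled subset r
                break
    return pairs

rng = np.random.default_rng(1)
labeled = [rng.normal(loc=i, size=(10, 4)) for i in range(4)]
unlabeled = [rng.normal(loc=-i, size=(5, 4)) for i in range(4)]
print(assign_unlabeled(labeled, unlabeled))
```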
Step 3: hidden feature generation.
There must be some potential relationship between a labeled sample lX_t and an unlabeled sample uX_r, called a hidden variable (denoted z_k). The relationship between lX_t and uX_r is estimated by the Euclidean distance, as shown in Table 1. The Euclidean distance formula (Equation 1) gives the distance between a labeled sample lX_t and an unlabeled sample uX_r, denoted θ_tr.
where lx_{t,d} denotes the d-th feature of the t-th labeled sample and ux_{r,d} denotes the d-th feature of the r-th unlabeled sample.
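For illustration, a minimal sketch of the θ_tr matrix of Table 1 (names and sizes are illustrative):

```python
import numpy as np

def distance_matrix(L, U):
    """theta[t, r] = Euclidean distance between labeled sample lX_t and
    unlabeled sample uX_r over all D features."""
    diff = L[:, None, :] - U[None, :, :]       # shape (w+g, v, D)
    return np.sqrt((diff ** 2).sum(axis=-1))   # shape (w+g, v), the Table 1 entries

rng = np.random.default_rng(2)
L = rng.normal(size=(6, 4))   # w + g = 6 labeled samples, D = 4
U = rng.normal(size=(3, 4))   # v = 3 unlabeled samples
theta = distance_matrix(L, U)
print(theta.shape)            # (6, 3)
```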
TABLE 1: correlation between labeled and unlabeled samples
A data model is established to reflect the mutual relationships among lX_t, uX_r and z_k, as shown in FIG. 2. The model assumes that both labeled and unlabeled samples are generated by it, and that there must be some hidden variables z_1, z_2, ..., z_{K''} behind the labeled and unlabeled samples that determine the whole sample generation process. The specific steps are:
1) select z_k from all hidden variables with probability P(z_k);
2) given the hidden variable z_k, generate a labeled sample lX_t with conditional probability P(lX_t|z_k);
3) given the hidden variable z_k, generate an unlabeled sample uX_r with conditional probability P(uX_r|z_k).
Note from the above three steps that the generation of a labeled sample lX_t depends only on the hidden variable z_k and is independent of the unlabeled samples; likewise, the generation of an unlabeled sample uX_r depends only on the hidden variable z_k and is independent of the labeled samples. The invention therefore regards the labeled sample lX_t and the unlabeled sample uX_r generated by this model as mutually independent. By conditional independence in probability theory, the following equation holds:
P(lX_t, uX_r | z_k) = P(lX_t | z_k) P(uX_r | z_k)    (Equation 2)
In addition, according to the labeled sample subset S_labeled and the unlabeled sample subset S_unlabeled built from Table 1, the above sample generation process can be interpreted as the following two probabilistic model expressions:
P(lX_t, uX_r) = P(lX_t) P(uX_r | lX_t)    (Equation 3)
A likelihood function is designed from the data model established above using the probabilistic latent semantic analysis (PLSA) method, and the maximum likelihood estimate, which reflects the relation between the labeled samples and the hidden features, is obtained by the expectation-maximization (EM) method. The method updates the values of the hidden variables iteratively, terminating the loop when the algorithm converges or when the maximum number of iterations is reached, and generally comprises two steps: expectation (E-step) and maximization (M-step).
E-step: calculate the posterior of the hidden variables using the current estimates;
M-step: maximize the expected likelihood obtained in the E-step to update the estimates of the hidden-variable distributions.
Following the PLSA method, the invention establishes a log-likelihood function and then optimizes it step by step to obtain the optimal P(z_k), P(lX_t|z_k) and P(uX_r|z_k).
(1) Establish the likelihood function log P(S_labeled, S_unlabeled, Z).
According to Bayes' formula, the conditional probability in Equation (4) holds. Then, from Equations (2), (3) and (4), the joint probability density function between S_labeled and S_unlabeled can be further rewritten as:
The above equation can be viewed as the marginal probability function obtained over all hidden variables z_k. From the conditional probability in Equation (5), the following can be derived:
Thus, the likelihood function over all samples can be derived:
where θ_tr is the distance between the unlabeled sample uX_r and the labeled sample lX_t; K'' is the number of hidden variables; w + g is the number of labeled samples contained in each training sample subset; and v is the number of unlabeled samples contained in each training sample subset.
(2) Maximize the log-likelihood function.
Maximizing the log-likelihood function means iteratively optimizing Equation (8) with the EM method: the E-step computes P(z_k|lX_t, uX_r) from the labeled and unlabeled samples; the M-step uses the resulting P(z_k|lX_t, uX_r) values to compute the conditional probabilities P(lX_t|z_k) and P(uX_r|z_k) that maximize the likelihood.
Equations (9) to (12) are updated iteratively so that the likelihood function f of Equation (8) attains its maximum; the loop terminates when the maximum number of iterations is reached or when the algorithm satisfies the convergence condition.
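For illustration, a minimal PLSA-style EM sketch of this iteration. Since the exact update formulas (9)-(12) are not reproduced above, the updates below follow standard PLSA, treating a similarity weight exp(-θ_tr) (an assumption) as the co-occurrence weight between labeled sample t and unlabeled sample r; all names are illustrative:

```python
import numpy as np

def plsa_em(theta, K2, n_iter=50, seed=0):
    """Return P(z), P(lX_t|z) and P(uX_r|z) estimated by EM on the weight
    matrix W = exp(-theta), with K2 hidden variables."""
    rng = np.random.default_rng(seed)
    W = np.exp(-theta)                              # (T, R) similarity weights
    T, R = W.shape
    p_z = np.full(K2, 1.0 / K2)                     # P(z_k)
    p_l = rng.random((T, K2)); p_l /= p_l.sum(0)    # P(lX_t | z_k)
    p_u = rng.random((R, K2)); p_u /= p_u.sum(0)    # P(uX_r | z_k)
    for _ in range(n_iter):
        # E-step: posterior P(z_k | lX_t, uX_r), shape (T, R, K2)
        post = p_z[None, None, :] * p_l[:, None, :] * p_u[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate the three distributions from weighted posteriors
        wpost = W[:, :, None] * post
        p_l = wpost.sum(axis=1); p_l /= p_l.sum(axis=0, keepdims=True) + 1e-12
        p_u = wpost.sum(axis=0); p_u /= p_u.sum(axis=0, keepdims=True) + 1e-12
        p_z = wpost.sum(axis=(0, 1)); p_z /= p_z.sum()
    return p_z, p_l, p_u

theta = np.abs(np.random.default_rng(3).normal(size=(6, 3)))   # toy distance matrix
p_z, p_l, p_u = plsa_em(theta, K2=2)
print(p_z, p_l.shape, p_u.shape)
```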
The optimal solution thus obtained yields hidden feature values that can be regarded as new features of the labeled sample lX_t. The new feature of lX_t is defined in Equation 13, which describes the posterior probability of the hidden variable z_k given the labeled sample lX_t.
A new labeled sample $l\tilde{X}_t$ consists of the feature values in the original feature space together with the newly generated feature values, and is expressed as

$$l\tilde{X}_t = \{\, lx_{t,1},\ lx_{t,2},\ \ldots,\ lx_{t,D},\ P(z_1 \mid lX_t),\ \ldots,\ P(z_{K''} \mid lX_t) \,\}, \quad t = 1, 2, \ldots, w+g,$$

from which a new labeled sample subset is obtained:

$$\tilde{S}^{\,h}_{labeled} = \{\, l\tilde{X}_t \,\}_{t=1}^{w+g}, \quad h = 1, \ldots, \sum_{i=1}^{K-1}\sum_{j=i+1}^{K} M_i \times M_j.$$
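For illustration, a minimal sketch of this feature augmentation, assuming the posterior P(z_k|lX_t) of formula 13 is obtained from P(z_k) and P(lX_t|z_k) by Bayes' rule; p_z and p_l are assumed to come from an EM estimation such as the sketch above:

```python
import numpy as np

def augment_labeled(L, p_z, p_l):
    """Return l~X_t = [lx_{t,1}, ..., lx_{t,D}, P(z_1|lX_t), ..., P(z_K''|lX_t)]."""
    post = p_z[None, :] * p_l                  # unnormalized P(z_k | lX_t)
    post /= post.sum(axis=1, keepdims=True)    # normalize over the K'' hidden variables
    return np.hstack([L, post])                # shape (w+g, D + K'')

rng = np.random.default_rng(4)
L = rng.normal(size=(6, 4))                    # original labeled samples, D = 4
p_z = np.array([0.6, 0.4])                     # P(z_k), K'' = 2
p_l = rng.random((6, 2)); p_l /= p_l.sum(0)    # P(lX_t | z_k)
print(augment_labeled(L, p_z, p_l).shape)      # (6, 6)
```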
Step 4: feature space conversion of the test samples.
Since the features of the training samples have changed, the test samples should undergo the corresponding feature change, otherwise the classification effect cannot be guaranteed. The test stage still uses the posterior probability of the test sample tX_a and the hidden variable z_k as the new feature of the test sample, as shown in Equation 14.
To reduce the time overhead, however, the new-feature generation process for test samples differs slightly from that of the training stage. The invention uses the new-feature generation model shown in FIG. 3, in which the matrix Ω denotes, for each test sample tX_a, the n nearest-neighbor samples obtained from the original labeled sample set S_labeled, and the hidden variables z_k come directly from the training stage. Clearly, the test sample tX_a and the hidden variable z_k are mutually independent, so P(tX_a|z_k) can be split into two parts: P(tX_a|Ω) and P(Ω|z_k). P(tX_a|Ω) is expressed by the Euclidean distance between the nearest-neighbor matrix Ω and the test sample tX_a; because Ω belongs to the original labeled sample set S_labeled, P(tX_a|Ω) can be written as P(tX_a|lX_t) and P(Ω|z_k) as P(lX_t|z_k), with lX_t ∈ Ω. Then P(tX_a|z_k) = P(tX_a|Ω) P(Ω|z_k) = P(tX_a|lX_t) P(lX_t|z_k). The new-feature generation Equation 14 for the test sample tX_a may therefore be redefined as shown in Equation 15.
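For illustration, a minimal sketch of this test-time mapping, assuming the distances to the n nearest labeled neighbors are turned into weights via exp(-d) (an assumption, since the exact form of P(tX_a|Ω) is given only by Equations 14-15), and reusing P(lX_t|z_k) and P(z_k) from training; names are illustrative:

```python
import numpy as np

def test_features(T, L, p_l, p_z, n_neighbors=3):
    """Return test samples augmented with K'' hidden-variable features."""
    feats = []
    for tx in T:
        d = np.linalg.norm(L - tx, axis=1)
        idx = np.argsort(d)[:n_neighbors]              # indices of Omega
        w = np.exp(-d[idx]); w /= w.sum()              # P(tX_a | lX_t), lX_t in Omega
        p_t_given_z = w @ p_l[idx]                     # P(tX_a | z_k) via the Omega samples
        post = p_z * p_t_given_z
        post /= post.sum() + 1e-12                     # posterior-like P(z_k | tX_a)
        feats.append(np.concatenate([tx, post]))
    return np.array(feats)

rng = np.random.default_rng(5)
L = rng.normal(size=(6, 4))                            # labeled training samples
T = rng.normal(size=(2, 4))                            # test samples
p_l = rng.random((6, 2)); p_l /= p_l.sum(0)            # P(lX_t | z_k) from training
p_z = np.array([0.5, 0.5])                             # P(z_k) from training
print(test_features(T, L, p_l, p_z).shape)             # (2, 6)
```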
Step 5: modular integration.
The MIN, MAX and INV units are the three integration units of the M3 network and play an important role in it; the independent sub-modules are combined through these three units.
MIN unit: let I_1, I_2, ..., I_n be n input values and O_min be the output of the MIN unit; then O_min = MIN{I_1, I_2, ..., I_i, ..., I_n}, i.e., the output of the MIN unit is the minimum of all input values.
MAX unit: let I_1, I_2, ..., I_n be n input values and O_max be the output of the MAX unit; then O_max = MAX{I_1, I_2, ..., I_i, ..., I_n}. In contrast to the MIN unit, the output O_max of the MAX unit is the maximum of all input values.
INV unit: takes the inverse operation. In the M3 framework, the INV unit functions like the inversion of a matrix and is used to avoid repeatedly generating multiple modules from the same training samples.
The M3 network uses the MIN rule and the MAX rule to integrate the independent sub-modules:
Min rule: for the training sample sets of the independent sub-modules, if they share the same positive training samples but have different negative training samples, take the minimum value O_min;
Max rule: for the training sample sets of the independent sub-modules, if they share the same negative training samples but have different positive training samples, take the maximum value O_max.
Combining the above two rules, the invention trains base classifiers on the new labeled sample subsets and combines them with the Min-Max rule. For a two-class problem, only M_i MIN units and one MAX unit are needed. The formulas are described as follows:
(1) The base classifiers trained on sample subsets that share the same positive-class training samples are first integrated through a MIN unit by the Min rule:
where the minimum is taken over the independent sub-modules that share the same positive training samples, and each term is the classification result for the sample x given by the base classifier learned on the corresponding two-class sub-problem.
(2) The MAX unit then integrates these results by the Max rule:
where O_ij(x) is the final integration result for the two-class problem S_ij.
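For illustration, a minimal sketch of this Min-Max combination for one two-class problem S_ij, assuming the base-classifier outputs for a sample x are already collected in an M_i x M_j score array:

```python
import numpy as np

def min_max_combine(scores):
    """O_ij(x) = MAX over positive blocks p of the MIN over negative blocks q
    of the sub-module outputs O_ij^{pq}(x)."""
    scores = np.asarray(scores)          # shape (M_i, M_j)
    return scores.min(axis=1).max()      # MIN units per row, then one MAX unit

# example: M_i = 2 positive blocks, M_j = 3 negative blocks
scores = [[0.9, 0.2, 0.7],
          [0.4, 0.6, 0.8]]
print(min_max_combine(scores))           # row minima [0.2, 0.4] -> 0.4
```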
Claims (6)
1. A semi-supervised minimum and maximum modular pattern classification method, characterized in that unlabeled samples are added to the labeled sample subsets obtained in the task decomposition stage of an M3 network, the hidden variables of a data generation model are used to link the labeled and unlabeled samples, the posterior probabilities of the hidden variables given the labeled samples are used as new features of the labeled samples, and the solution of the original problem is then obtained using the Min-Max integration rule, the method comprising the following steps:
step 1: dividing data;
dividing the original labeled sample set according to the task decomposition principle of the M3 network; dividing the unlabeled sample set equally, so that the number of unlabeled sample subsets is the same as the number of labeled sample subsets;
step 2: allocation of unlabeled sample subsets;
adding each unlabeled sample subset, without repetition, to the two-class labeled sample subset whose center point is farthest from it; at this point, each independent training sample subset comprises two parts: a two-class labeled sample subset and an unlabeled sample subset;
step 3: hidden feature generation;
for each training subset, it is assumed that the labeled samples and the unlabeled samples are generated from the same generative model and that their generation is determined by the hidden variables z_k; the hidden variables are solved by the probabilistic latent semantic analysis (PLSA) method, and the posterior probabilities of the solved hidden variables given the labeled samples are taken as new features of the labeled samples; a classifier is trained on each labeled sample subset after the new features are added;
step 4: converting the feature space of the test samples;
since the feature space of the training samples has changed, the feature space of the test samples is mapped into the same feature space as the training samples; by extracting the n nearest labeled neighbors of each test sample within the training subset, the posterior probabilities of the hidden variables given the test sample are estimated and used as new features of the test sample;
step 5: modular integration;
using the classifiers obtained in the training stage to predict labels for the test samples, and integrating the prediction results of all base classifiers with the Min-Max rule to obtain the solution of the original problem.
2. A semi-supervised min-max modular pattern classification method as claimed in claim 1, wherein: the method combines an M3 network and semi-supervised learning, and comprises the following steps:
step 1: dividing data;
dividing the original labeled sample set S_L into labeled sample subsets according to the sample division principle of the M3 network and the hyperplane division method, where M_i and M_j respectively denote the number of blocks into which the C_i-class and C_j-class samples in S_L are divided; dividing the unlabeled sample set equally, so that the number of unlabeled sample subsets is the same as the number of labeled sample subsets;
step 2: allocation of unlabeled sample subsets;
calculating the center point of each sample subset and assigning each unlabeled sample subset to the labeled sample subset whose center point is farthest from it; at this point, each independent training sample subset comprises two parts: a two-class labeled sample subset and an unlabeled sample subset;
step 3: hidden feature generation;
denoting the labeled sample subset of a training subset by S_labeled and the unlabeled sample subset by S_unlabeled; assuming that both labeled and unlabeled samples are generated by a generative model and that some hidden variables z_1, z_2, ..., z_{K''} behind the labeled and unlabeled samples determine the whole sample generation process, comprising:
1) selecting z_k from all hidden variables with probability P(z_k);
2) given the hidden variable z_k, generating a labeled sample lX_t with conditional probability P(lX_t|z_k);
3) given the hidden variable z_k, generating an unlabeled sample uX_r with conditional probability P(uX_r|z_k);
the generation of the labeled sample lX_t depends only on the hidden variable z_k and is independent of the unlabeled samples; likewise, the generation of the unlabeled sample uX_r depends only on the hidden variable z_k and is independent of the labeled samples; the labeled sample lX_t and the unlabeled sample uX_r generated by this model are mutually independent; by conditional independence in probability theory, the following equation holds:
P(lX_t, uX_r | z_k) = P(lX_t | z_k) P(uX_r | z_k)    (Equation 1)
The above sample generation process can be interpreted as two probabilistic model expressions as follows:
P(lX_t, uX_r) = P(lX_t) P(uX_r | lX_t)    (Equation 2)
establishing the relation between a labeled sample and an unlabeled sample according to the Euclidean distance θ_tr = sqrt( Σ_{d=1}^{D} (lx_{t,d} − ux_{r,d})² ),
where lx_{t,d} denotes the d-th feature of the t-th labeled sample and ux_{r,d} denotes the d-th feature of the r-th unlabeled sample;
designing a likelihood function according to the established data model by utilizing a probability latent semantic analysis PLSA method, and obtaining a maximum likelihood estimation value by an expectation maximization EM method;
according to the idea of the PLSA method, establishing a log-likelihood function and then optimizing it step by step to obtain the optimal P(z_k), P(lX_t|z_k) and P(uX_r|z_k), comprising the following steps:
(1) establishing the likelihood function log P(S_labeled, S_unlabeled, Z);
according to Bayes' formula, the conditional probability in Equation (3) holds; then, from Equations (1), (2) and (3), the joint probability density function between S_labeled and S_unlabeled can be further rewritten as:
the above equation can be viewed as the marginal probability function obtained over all hidden variables z_k; from the conditional probability in Equation 5, the following can be derived:
Thus, a likelihood function formula on all samples is obtained:
where θ_tr is the distance between the unlabeled sample uX_r and the labeled sample lX_t; K'' is the number of hidden variables; w + g is the number of labeled samples contained in each training sample subset; and v is the number of unlabeled samples contained in each training sample subset;
(2) maximizing the log-likelihood function;
maximizing the log-likelihood function means iteratively optimizing Equation 8 with the EM method: the E-step computes P(z_k|lX_t, uX_r) from the labeled and unlabeled samples; the M-step uses the resulting P(z_k|lX_t, uX_r) values to compute the conditional probabilities P(lX_t|z_k) and P(uX_r|z_k) that maximize the likelihood;
Equations 9 to 12 are updated iteratively so that the likelihood function f of Equation 8 attains its maximum; the loop terminates when the maximum number of iterations is reached or when the algorithm satisfies the convergence condition;
the optimal solution thus obtained yields hidden feature values that can be regarded as new features of the labeled sample lX_t; the new feature of lX_t is defined in Equation 13, which describes the posterior probability of the hidden variable z_k given the labeled sample lX_t;
a new labeled sample $l\tilde{X}_t$ consists of the feature values in the original feature space together with the newly generated feature values, expressed as

$$l\tilde{X}_t = \{\, lx_{t,1},\ lx_{t,2},\ \ldots,\ lx_{t,D},\ P(z_1 \mid lX_t),\ \ldots,\ P(z_{K''} \mid lX_t) \,\}, \quad t = 1, 2, \ldots, w+g,$$

so that a new labeled sample subset can be obtained:

$$\tilde{S}^{\,h}_{labeled} = \{\, l\tilde{X}_t \,\}_{t=1}^{w+g}, \quad h = 1, \ldots, \sum_{i=1}^{K-1}\sum_{j=i+1}^{K} M_i \times M_j;$$
step 4: converting the feature space of the test samples;
since the feature space of the training samples has changed, the feature space of the test samples should be mapped into the same feature space as the training samples; the test stage still uses the posterior probability of the test sample tX_a and the hidden variable z_k as the new feature of the test sample, as shown in Equation 16;
Ω denotes the n nearest-neighbor samples obtained from the labeled sample set S_labeled, and the hidden variables z_k come directly from the training stage; the test sample tX_a and the hidden variable z_k are mutually independent, so P(tX_a|z_k) can be split into two parts: P(tX_a|Ω) and P(Ω|z_k); P(tX_a|Ω) is expressed by the Euclidean distance between the nearest-neighbor matrix Ω and the test sample tX_a; because Ω belongs to the labeled sample subset S_labeled, P(tX_a|Ω) can be written as P(tX_a|lX_t) and P(Ω|z_k) as P(lX_t|z_k), with lX_t ∈ Ω; the new feature of the test sample tX_a given by Equation 14 is thus redefined as shown in Equation 15;
step 5: modular integration;
using the classifiers obtained in the training stage to predict labels for the test samples, and integrating the prediction results of all base classifiers with the Min-Max rule to obtain the solution of the original problem.
3. A semi-supervised min-max modular pattern classification method as claimed in claim 1, wherein: the method integrates marked samples and unmarked samples according to generative semi-supervised learning, constructs new features by utilizing hidden characteristics between the marked samples and the unmarked samples, obtains a new marked sample set, and trains a classifier on the new marked sample set.
4. A semi-supervised min-max modular pattern classification method as claimed in claim 1, wherein: the method expands the application range of the minimum and maximum modular network.
5. A semi-supervised min-max modular pattern classification method as claimed in claim 1, wherein: the method is to train a classifier by using the labeled samples with the added features.
6. A semi-supervised min-max modular pattern classification method as claimed in claim 1 applied to min-max modular networks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510035805.4A CN104657743A (en) | 2015-01-23 | 2015-01-23 | Semi-supervised minimum and maximum modularization mode classification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104657743A true CN104657743A (en) | 2015-05-27 |
Family
ID=53248842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510035805.4A Pending CN104657743A (en) | 2015-01-23 | 2015-01-23 | Semi-supervised minimum and maximum modularization mode classification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104657743A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108881196A (en) * | 2018-06-07 | 2018-11-23 | 中国民航大学 | Semi-supervised intrusion detection method based on a deep generative model
CN109831392A (en) * | 2019-03-04 | 2019-05-31 | 中国科学技术大学 | Semi-supervised network traffic classification method
CN109831392B (en) * | 2019-03-04 | 2020-10-27 | 中国科学技术大学 | Semi-supervised network traffic classification method
CN110347825A (en) * | 2019-06-14 | 2019-10-18 | 北京物资学院 | Method and device for classifying short English film reviews
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20150527