CN104657743A - Semi-supervised minimum and maximum modular pattern classification method - Google Patents
Semi-supervised minimum and maximum modular pattern classification method
- Publication number
- CN104657743A CN104657743A CN201510035805.4A CN201510035805A CN104657743A CN 104657743 A CN104657743 A CN 104657743A CN 201510035805 A CN201510035805 A CN 201510035805A CN 104657743 A CN104657743 A CN 104657743A
- Authority
- CN
- China
- Prior art keywords
- sample
- samples
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a semi-supervised min-max modular pattern classification method, belonging to the technical field of data mining. In the method, a portion of unlabeled samples is added to each labeled sample subset obtained in the task decomposition stage of a min-max modular network (M3 network), and, following the idea of the generative semi-supervised learning algorithm (fSSL), features containing unlabeled-sample information are generated and used as new features of the labeled samples, thereby realizing a semi-supervised M3 network. The method alleviates the problems that labeling large-scale samples consumes a great deal of manpower and material resources and that unsupervised learning is unstable, and it enhances the learning performance of the original M3 network.
Description
Technical Field
The invention relates to a semi-supervised minimum and maximum modular pattern classification method, belonging to the technical field of data mining.
Background
In real life, the data volume of various industries is growing exponentially. According to the statistics of the Internet Data Center (IDC), the global data volume grows by 40-60% every year and is predicted to reach 35 ZB (1 ZB = 10^21 bytes) by 2020. The value of such data is self-evident, and how to use it effectively has attracted the attention of many researchers.
In 1999, Professor Bao-Liang Lu proposed the Min-Max Modular Network (M3 network for short) in "Task decomposition and module combination based on class relations: a modular network for pattern classification", a modular network aimed at the difficult problem of classifying large-scale, complex data. Its core is the divide-and-conquer idea: large-scale data are decomposed into many small and simple modules so as to reduce the complexity of the original problem. The sub-modules run independently of one another, no communication needs to be set up between modules, and the approach therefore facilitates parallelization of the actual task. Finally, the prediction results of the modules are combined through the Min-Max rule to obtain the solution of the original problem.
The M3 network is currently a purely supervised learning approach. Supervised learning means that the class to which each sample belongs is known; the goal is to learn, from a given training sample set, the mapping between samples X_l and labels Y_l, and to test the quality of this mapping on new test samples. Supervised learning requires that the classes of all training samples be known, and it needs a large number of labeled samples to achieve good generalization performance. In practical problems, however, labeled and unlabeled samples usually coexist: obtaining labeled samples takes considerable labor and may even require expert knowledge of a particular field, whereas unlabeled samples are easy to obtain. Unsupervised learning generally builds learning models from the internal relationships among unlabeled samples; its essential difference from supervised learning is that the class of each sample is unknown, so the mapping between samples X_l and labels Y_l cannot be obtained directly. In view of the deficiencies of both, researchers have proposed semi-supervised learning methods.
Generative semi-supervised learning (fSSL) is one type of semi-supervised learning. It can be expressed as follows: for a training sample set S' = {X_1', X_2', ..., X_L'}, each sample is described by its original features together with features derived from hidden variables, where D denotes the number of features of the original labeled sample and K' denotes the number of hidden variables. Clearly, the number of features used to describe each sample increases, while the number of training samples remains unchanged. The present invention solves the problems described above well.
Disclosure of Invention
The invention aims to solve the problems that labeling large-scale samples costs a great deal of manpower and material resources, that unsupervised learning suffers from learning instability, and that the existing M3 network can only be used for supervised learning. It provides a semi-supervised min-max modular pattern classification method, which comprises the following steps: (1) divide the labeled sample set according to the task-decomposition principle of the M3 network, divide the unlabeled sample set into the same number of blocks, and add each unlabeled sample subset to a labeled sample subset; (2) link the labeled samples and the unlabeled samples closely by using a similarity matrix as the data model; (3) apply a probabilistic latent semantic analysis (PLSA) model to the similarity matrix to obtain the hidden variables between labeled and unlabeled samples; (4) use the posterior probabilities of the hidden variables given a labeled sample as new features of that labeled sample, and the posterior probabilities of the hidden variables given a test sample as new features of that test sample; (5) integrate the results of the base classifiers by the Min-Max rule to obtain the solution of the original problem.
The technical scheme adopted by the invention to solve the above technical problems is as follows: a semi-supervised min-max modular pattern classification method that combines the M3 network with the idea of semi-supervised learning, which not only enhances the learning performance of the original M3 network but also makes effective use of the large number of readily available unlabeled samples.
The method comprises the following steps:
The method divides the labeled sample set and the unlabeled sample set into sample subsets according to a chosen partitioning method, and adds the unlabeled sample subsets to the labeled sample subsets without repetition according to a farthest-subset-centroid strategy, thereby forming the training subsets. For each training subset, the probabilistic latent semantic analysis (PLSA) method is used to solve for the hidden variables that govern the generation process of the labeled and unlabeled samples in the data model, and the posterior probabilities of the hidden variables given the labeled samples are taken as new features of the labeled samples. The classifiers are trained mainly on the labeled samples with the added features. For a test sample, the posterior probabilities of the hidden variables given the test sample are taken as its new features, and the classifiers obtained in the training stage are used to predict its label. Finally, the results of the base classifiers are integrated by the Min-Max rule to obtain the solution of the original two-class problems. The specific steps are as follows:
Step 1: data division.
The original labeled sample set S_L is divided into labeled sample subsets according to the task-decomposition principle of the M3 network and the hyperplane-division method, where M_i and M_j respectively denote the number of blocks into which the C_i-class and C_j-class samples in S_L are divided. The unlabeled sample set is divided equally, so that the number of unlabeled sample subsets equals the number of labeled sample subsets;
step 2: assignment of unlabeled sample subset.
The center point of each sample subset is calculated, and each unlabeled sample subset is assigned to the labeled sample subset whose center point is farthest from its own. At this point, each independent training sample subset comprises two parts: a two-class labeled sample subset and an unlabeled sample subset;
Step 3: hidden feature generation.
Take as an example a training subset composed of a labeled sample subset S_labeled and an unlabeled sample subset S_unlabeled. Suppose that both the labeled and the unlabeled samples are generated by a generative model, and that some hidden variables z_1, z_2, ..., z_{K''} determine the whole sample generation process, as follows:
1) select z_k from all hidden variables with probability P(z_k);
2) given the hidden variable z_k, generate a labeled sample lX_t with conditional probability P(lX_t|z_k);
3) given the hidden variable z_k, generate an unlabeled sample uX_r with conditional probability P(uX_r|z_k).
The labeled sample lX_t depends only on the hidden variable z_k and is independent of the unlabeled samples; likewise, the unlabeled sample uX_r depends only on the hidden variable z_k and is independent of the labeled samples. The labeled sample lX_t and the unlabeled sample uX_r generated by this model are therefore mutually independent. By conditional independence in probability theory, the following equation holds:
P(lX_t, uX_r | z_k) = P(lX_t | z_k) P(uX_r | z_k)    (Equation 1)
The above sample generation process can be interpreted as two probabilistic model expressions as follows:
P(lX_t, uX_r) = P(lX_t) P(uX_r | lX_t)    (Equation 2)
The relation between a labeled sample and an unlabeled sample is established according to the Euclidean distance θ_tr = sqrt( Σ_{d=1}^{D} (lx_{t,d} − ux_{r,d})² ),
where lx_{t,d} denotes the d-th feature of the t-th labeled sample and ux_{r,d} denotes the d-th feature of the r-th unlabeled sample.
Then, using the widely used probabilistic latent semantic analysis (PLSA) method, a likelihood function is designed according to the data model established above, and the maximum likelihood estimate is obtained by the expectation-maximization (EM) method.
Following the idea of the PLSA method, a log-likelihood function is established and then optimized step by step to obtain the optimal P(z_k), P(lX_t|z_k) and P(uX_r|z_k). The specific steps are as follows:
(1) Establish the likelihood function log P(S_labeled, S_unlabeled, Z).
According to Bayes' formula, the conditional probability in Equation (3) holds; then, from Equations (1), (2) and (3), the joint probability density function between S_labeled and S_unlabeled can be further rewritten as:
The above equation can be viewed as the marginal probability function obtained over all hidden variables z_k; from the conditional probability in Equation 5, the following can be derived:
Thus, a likelihood function formula on all samples is obtained:
where θ_tr is the distance between the unlabeled sample uX_r and the labeled sample lX_t; K'' is the number of hidden variables; w + g is the number of labeled samples contained in each training sample subset; and v is the number of unlabeled samples contained in each training sample subset;
(2) Maximize the log-likelihood function.
Maximizing the log-likelihood function means iteratively optimizing Equation 8 with the EM method: the E-step computes P(z_k|lX_t, uX_r) from the labeled and unlabeled samples; the M-step uses the resulting P(z_k|lX_t, uX_r) values to compute the conditional probabilities P(lX_t|z_k) and P(uX_r|z_k) that maximize the likelihood.
Equations 9 to 12 are updated iteratively so that the likelihood function f of Equation 8 attains its maximum; the loop terminates when the maximum number of iterations is reached or when the algorithm satisfies the convergence condition;
The optimal solution thus obtained yields hidden feature values that can be regarded as new features of the labeled sample lX_t. The new feature of lX_t is defined in Equation 13, which describes the posterior probability of the hidden variable z_k given the labeled sample lX_t;
A new labeled sample $l\tilde{X}_t$ consists of the feature values in the original feature space together with the newly generated feature values, and is expressed as

$$l\tilde{X}_t = \{\, lx_{t,1},\ lx_{t,2},\ \ldots,\ lx_{t,D},\ P(z_1 \mid lX_t),\ \ldots,\ P(z_{K''} \mid lX_t) \,\}, \quad t = 1, 2, \ldots, w+g,$$

from which a new labeled sample subset is obtained:

$$\tilde{S}^{\,h}_{labeled} = \{\, l\tilde{X}_t \,\}_{t=1}^{w+g}, \quad h = 1, \ldots, \sum_{i=1}^{K-1}\sum_{j=i+1}^{K} M_i \times M_j.$$
Step 4: feature space conversion of the test samples.
Since the feature space of the training samples has changed, the feature space of the test samples should be mapped into the same feature space as that of the training samples. The test stage still uses the posterior probability of the test sample tX_a and the hidden variable z_k as the new feature of the test sample, as shown in Equation 16.
Ω denotes the n neighboring samples obtained from the labeled sample set S_labeled, and the hidden variables z_k come directly from the training stage. Clearly, the test sample tX_a and the hidden variable z_k are mutually independent, so P(tX_a|z_k) can be split into two parts: P(tX_a|Ω) and P(Ω|z_k). P(tX_a|Ω) is expressed by the Euclidean distance between the nearest-neighbor matrix Ω and the test sample tX_a; because Ω belongs to the original labeled sample subset S_labeled, P(tX_a|Ω) can be written as P(tX_a|lX_t) and P(Ω|z_k) as P(lX_t|z_k), with lX_t ∈ Ω. The new feature of the test sample tX_a given by Equation 14 is thus redefined as shown in Equation 15.
Step 5: modular integration. The classifiers obtained in the training stage predict labels for the test samples, and the prediction results of all base classifiers are integrated by the Min-Max rule to obtain the solution of the original problem.
Advantageous effects:
1. The method applies the generative semi-supervised learning method fSSL to the M3 network: a portion of unlabeled samples is added to each two-class labeled training sample subset obtained after decomposition, and new features are constructed from the hidden variables between the labeled and unlabeled samples according to the idea of fSSL.
2. The method obtains a new labeled training sample subset and a corresponding base classifier model for each sub-problem, and integrates the results of all base classifiers with the Min-Max rule to obtain the solution of the original problem.
3. The invention can effectively process large-scale data and make full use of unlabeled samples, thereby reducing the cost of obtaining labeled samples and avoiding the learning instability of unsupervised learning.
4. The invention widens the application field of the min-max modular network M3.
Drawings
FIG. 1 is a schematic diagram of each independent training sample subset of the present invention.
FIG. 2 is a diagram of the generative data model for labeled and unlabeled samples of the present invention.
FIG. 3 is a diagram of the model that generates the test-stage posterior probability P(z_k|tX_a) of the present invention.
FIG. 4 is the process flow diagram of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings.
As shown in FIG. 4, the invention adds a portion of unlabeled samples to each labeled sample subset generated in the sample decomposition stage of the min-max modular network (M3 network), and, according to the principle of generative semi-supervised learning (fSSL), generates hidden features containing unlabeled-sample information and uses them as new features of the labeled samples, thereby realizing semi-supervised learning for the M3 network. The method eases the difficulty of labeling large-scale samples and avoids the learning instability of unsupervised learning, thereby enhancing the learning performance of the M3 network. The specific steps are as follows:
Step 1: data division.
Assume the labeled sample set is S_L, where each labeled sample lX_l = (lx_{l,1}, lx_{l,2}, ..., lx_{l,D}) ∈ R^D, L is the number of samples, D is the feature dimension, and Y_l is the class to which the l-th labeled sample belongs. Using the one-versus-one strategy, the M3 network first divides the labeled sample set S_L into K(K-1)/2 two-class problems S_ij (assume class C_i is the "positive" class and class C_j the "negative" class). Each two-class problem is then decomposed into several balanced two-class sub-problems using the part-versus-part decomposition strategy, so a K-class problem can be decomposed into balanced two-class sub-problems indexed by p ∈ [1, M_i], q ∈ [1, M_j], i, j ∈ [1, K], i ≠ j, where M_i is the number of blocks into which the C_i-class samples are divided and M_j is the number of blocks into which the C_j-class samples are divided. Suppose the training sample subset of each two-class sub-problem contains w samples belonging to class C_i and g samples belonging to class C_j; this labeled sample subset is denoted S_labeled.
Assume the unlabeled sample set is S_U = {uX_1, uX_2, ..., uX_U}, where uX_r (r = 1, 2, ..., U) denotes the r-th unlabeled sample. The original unlabeled samples are divided equally into the same number of subsets as the labeled sample subsets, each unlabeled subset containing v samples.
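For illustration, a minimal sketch of this decomposition, assuming simple order-based blocks in place of the hyperplane division described above; all sizes and names are illustrative:

```python
import numpy as np

def decompose_labeled(X, y, blocks_per_class):
    """Split a K-class labeled set into K(K-1)/2 class pairs and, within each
    pair, into M_i x M_j balanced two-class sub-problems (part-versus-part)."""
    classes = np.unique(y)
    subproblems = []
    for a in range(len(classes)):
        for b in range(a + 1, len(classes)):
            ci, cj = classes[a], classes[b]
            pos_blocks = np.array_split(X[y == ci], blocks_per_class[ci])
            neg_blocks = np.array_split(X[y == cj], blocks_per_class[cj])
            for P in pos_blocks:          # class C_i blocks ("positive")
                for N in neg_blocks:      # class C_j blocks ("negative")
                    subproblems.append((ci, cj, P, N))
    return subproblems

def divide_unlabeled(U, n_subsets):
    """Divide the unlabeled set equally into as many subsets as there are
    labeled sub-problems."""
    return np.array_split(U, n_subsets)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))              # 60 labeled samples, D = 4 features
y = np.repeat([0, 1, 2], 20)              # K = 3 classes
U = rng.normal(size=(30, 4))              # 30 unlabeled samples

subs = decompose_labeled(X, y, blocks_per_class={0: 2, 1: 2, 2: 2})
unl = divide_unlabeled(U, len(subs))
print(len(subs), "two-class sub-problems,", len(unl), "unlabeled subsets")
```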
Step 2: assignment of unlabeled sample subset.
The center point of each sample subset is calculated, and each unlabeled sample subset is assigned to the labeled sample subset whose center point is farthest from its own. At this point, each independent training sample subset contains two parts: the original two-class labeled sample subset S_labeled (w samples of class C_i, the positive class, and g samples of class C_j, the negative class) and an unlabeled sample subset S_unlabeled (v unlabeled samples). The fusion of unlabeled and labeled samples is shown in FIG. 1, where "+" represents a positive labeled training sample, "-" represents a negative labeled training sample, and "●" represents an unlabeled sample.
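For illustration, a minimal sketch of this assignment, assuming a greedy one-to-one matching realizes the "farthest center point, without repetition" rule; names are illustrative:

```python
import numpy as np

def assign_unlabeled(labeled_subsets, unlabeled_subsets):
    """Pair each unlabeled subset with the labeled subset whose center point is
    farthest from its own, without assigning the same labeled subset twice."""
    lab_centers = np.array([s.mean(axis=0) for s in labeled_subsets])
    taken, pairs = set(), {}
    for r, u in enumerate(unlabeled_subsets):
        uc = u.mean(axis=0)                               # center of unlabeled subset r
        dists = np.linalg.norm(lab_centers - uc, axis=1)  # distances to labeled centers
        for h in np.argsort(dists)[::-1]:                 # farthest labeled subset first
            if h not in taken:
                taken.add(h)
                pairs[int(h)] = r                         # labeled subset h receives unlabeled subset r
                break
    return pairs

rng = np.random.default_rng(1)
labeled = [rng.normal(loc=i, size=(10, 4)) for i in range(4)]
unlabeled = [rng.normal(loc=-i, size=(5, 4)) for i in range(4)]
print(assign_unlabeled(labeled, unlabeled))
```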
Step 3: hidden feature generation.
There must be some potential relationship between a labeled sample lX_t and an unlabeled sample uX_r, called a hidden variable (denoted z_k). The relationship between lX_t and uX_r is estimated by the Euclidean distance, as shown in Table 1. The Euclidean distance formula (Equation 1) gives the distance between a labeled sample lX_t and an unlabeled sample uX_r, denoted θ_tr.
where lx_{t,d} denotes the d-th feature of the t-th labeled sample and ux_{r,d} denotes the d-th feature of the r-th unlabeled sample.
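For illustration, a minimal sketch of the θ_tr matrix of Table 1 (names and sizes are illustrative):

```python
import numpy as np

def distance_matrix(L, U):
    """theta[t, r] = Euclidean distance between labeled sample lX_t and
    unlabeled sample uX_r over all D features."""
    diff = L[:, None, :] - U[None, :, :]       # shape (w+g, v, D)
    return np.sqrt((diff ** 2).sum(axis=-1))   # shape (w+g, v), the Table 1 entries

rng = np.random.default_rng(2)
L = rng.normal(size=(6, 4))   # w + g = 6 labeled samples, D = 4
U = rng.normal(size=(3, 4))   # v = 3 unlabeled samples
theta = distance_matrix(L, U)
print(theta.shape)            # (6, 3)
```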
TABLE 1: correlation between labeled and unlabeled samples
A data model is established to reflect the mutual relationships among lX_t, uX_r and z_k, as shown in FIG. 2. The model assumes that both labeled and unlabeled samples are generated by it, and that there must be some hidden variables z_1, z_2, ..., z_{K''} behind the labeled and unlabeled samples that determine the whole sample generation process. The specific steps are:
1) select z_k from all hidden variables with probability P(z_k);
2) given the hidden variable z_k, generate a labeled sample lX_t with conditional probability P(lX_t|z_k);
3) given the hidden variable z_k, generate an unlabeled sample uX_r with conditional probability P(uX_r|z_k).
Note from the above three steps that the generation of a labeled sample lX_t depends only on the hidden variable z_k and is independent of the unlabeled samples; likewise, the generation of an unlabeled sample uX_r depends only on the hidden variable z_k and is independent of the labeled samples. The invention therefore regards the labeled sample lX_t and the unlabeled sample uX_r generated by this model as mutually independent. By conditional independence in probability theory, the following equation holds:
P(lX_t, uX_r | z_k) = P(lX_t | z_k) P(uX_r | z_k)    (Equation 2)
In addition, according to the labeled sample subset S_labeled and the unlabeled sample subset S_unlabeled built from Table 1, the above sample generation process can be interpreted as the following two probabilistic model expressions:
P(lX_t, uX_r) = P(lX_t) P(uX_r | lX_t)    (Equation 3)
A likelihood function is designed from the data model established above using the probabilistic latent semantic analysis (PLSA) method, and the maximum likelihood estimate, which reflects the relation between the labeled samples and the hidden features, is obtained by the expectation-maximization (EM) method. The method updates the values of the hidden variables iteratively, terminating the loop when the algorithm converges or when the maximum number of iterations is reached, and generally comprises two steps: expectation (E-step) and maximization (M-step).
E-step: calculate the posterior of the hidden variables using the current estimates;
M-step: maximize the expected likelihood obtained in the E-step to update the estimates of the hidden-variable distributions.
Following the PLSA method, the invention establishes a log-likelihood function and then optimizes it step by step to obtain the optimal P(z_k), P(lX_t|z_k) and P(uX_r|z_k).
(1) Establish the likelihood function log P(S_labeled, S_unlabeled, Z).
According to Bayes' formula, the conditional probability in Equation (4) holds. Then, from Equations (2), (3) and (4), the joint probability density function between S_labeled and S_unlabeled can be further rewritten as:
The above equation can be viewed as the marginal probability function obtained over all hidden variables z_k. From the conditional probability in Equation (5), the following can be derived:
Thus, the likelihood function over all samples can be derived:
where θ_tr is the distance between the unlabeled sample uX_r and the labeled sample lX_t; K'' is the number of hidden variables; w + g is the number of labeled samples contained in each training sample subset; and v is the number of unlabeled samples contained in each training sample subset.
(2) Maximize the log-likelihood function.
Maximizing the log-likelihood function means iteratively optimizing Equation (8) with the EM method: the E-step computes P(z_k|lX_t, uX_r) from the labeled and unlabeled samples; the M-step uses the resulting P(z_k|lX_t, uX_r) values to compute the conditional probabilities P(lX_t|z_k) and P(uX_r|z_k) that maximize the likelihood.
Equations (9) to (12) are updated iteratively so that the likelihood function f of Equation (8) attains its maximum; the loop terminates when the maximum number of iterations is reached or when the algorithm satisfies the convergence condition.
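For illustration, a minimal PLSA-style EM sketch of this iteration. Since the exact update formulas (9)-(12) are not reproduced above, the updates below follow standard PLSA, treating a similarity weight exp(-θ_tr) (an assumption) as the co-occurrence weight between labeled sample t and unlabeled sample r; all names are illustrative:

```python
import numpy as np

def plsa_em(theta, K2, n_iter=50, seed=0):
    """Return P(z), P(lX_t|z) and P(uX_r|z) estimated by EM on the weight
    matrix W = exp(-theta), with K2 hidden variables."""
    rng = np.random.default_rng(seed)
    W = np.exp(-theta)                              # (T, R) similarity weights
    T, R = W.shape
    p_z = np.full(K2, 1.0 / K2)                     # P(z_k)
    p_l = rng.random((T, K2)); p_l /= p_l.sum(0)    # P(lX_t | z_k)
    p_u = rng.random((R, K2)); p_u /= p_u.sum(0)    # P(uX_r | z_k)
    for _ in range(n_iter):
        # E-step: posterior P(z_k | lX_t, uX_r), shape (T, R, K2)
        post = p_z[None, None, :] * p_l[:, None, :] * p_u[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate the three distributions from weighted posteriors
        wpost = W[:, :, None] * post
        p_l = wpost.sum(axis=1); p_l /= p_l.sum(axis=0, keepdims=True) + 1e-12
        p_u = wpost.sum(axis=0); p_u /= p_u.sum(axis=0, keepdims=True) + 1e-12
        p_z = wpost.sum(axis=(0, 1)); p_z /= p_z.sum()
    return p_z, p_l, p_u

theta = np.abs(np.random.default_rng(3).normal(size=(6, 3)))   # toy distance matrix
p_z, p_l, p_u = plsa_em(theta, K2=2)
print(p_z, p_l.shape, p_u.shape)
```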
The optimal solution thus obtained yields hidden feature values that can be regarded as new features of the labeled sample lX_t. The new feature of lX_t is defined in Equation 13, which describes the posterior probability of the hidden variable z_k given the labeled sample lX_t.
A new labeled sample $l\tilde{X}_t$ consists of the feature values in the original feature space together with the newly generated feature values, and is expressed as

$$l\tilde{X}_t = \{\, lx_{t,1},\ lx_{t,2},\ \ldots,\ lx_{t,D},\ P(z_1 \mid lX_t),\ \ldots,\ P(z_{K''} \mid lX_t) \,\}, \quad t = 1, 2, \ldots, w+g,$$

from which a new labeled sample subset is obtained:

$$\tilde{S}^{\,h}_{labeled} = \{\, l\tilde{X}_t \,\}_{t=1}^{w+g}, \quad h = 1, \ldots, \sum_{i=1}^{K-1}\sum_{j=i+1}^{K} M_i \times M_j.$$
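For illustration, a minimal sketch of this feature augmentation, assuming the posterior P(z_k|lX_t) of formula 13 is obtained from P(z_k) and P(lX_t|z_k) by Bayes' rule; p_z and p_l are assumed to come from an EM estimation such as the sketch above:

```python
import numpy as np

def augment_labeled(L, p_z, p_l):
    """Return l~X_t = [lx_{t,1}, ..., lx_{t,D}, P(z_1|lX_t), ..., P(z_K''|lX_t)]."""
    post = p_z[None, :] * p_l                  # unnormalized P(z_k | lX_t)
    post /= post.sum(axis=1, keepdims=True)    # normalize over the K'' hidden variables
    return np.hstack([L, post])                # shape (w+g, D + K'')

rng = np.random.default_rng(4)
L = rng.normal(size=(6, 4))                    # original labeled samples, D = 4
p_z = np.array([0.6, 0.4])                     # P(z_k), K'' = 2
p_l = rng.random((6, 2)); p_l /= p_l.sum(0)    # P(lX_t | z_k)
print(augment_labeled(L, p_z, p_l).shape)      # (6, 6)
```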
Step 4: feature space conversion of the test samples.
Since the features of the training samples have changed, the test samples should undergo the corresponding feature change, otherwise the classification effect cannot be guaranteed. The test stage still uses the posterior probability of the test sample tX_a and the hidden variable z_k as the new feature of the test sample, as shown in Equation 14.
To reduce the time overhead, however, the new-feature generation process for test samples differs slightly from that of the training stage. The invention uses the new-feature generation model shown in FIG. 3, in which the matrix Ω denotes, for each test sample tX_a, the n nearest-neighbor samples obtained from the original labeled sample set S_labeled, and the hidden variables z_k come directly from the training stage. Clearly, the test sample tX_a and the hidden variable z_k are mutually independent, so P(tX_a|z_k) can be split into two parts: P(tX_a|Ω) and P(Ω|z_k). P(tX_a|Ω) is expressed by the Euclidean distance between the nearest-neighbor matrix Ω and the test sample tX_a; because Ω belongs to the original labeled sample set S_labeled, P(tX_a|Ω) can be written as P(tX_a|lX_t) and P(Ω|z_k) as P(lX_t|z_k), with lX_t ∈ Ω. Then P(tX_a|z_k) = P(tX_a|Ω) P(Ω|z_k) = P(tX_a|lX_t) P(lX_t|z_k). The new-feature generation Equation 14 for the test sample tX_a may therefore be redefined as shown in Equation 15.
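For illustration, a minimal sketch of this test-time mapping, assuming the distances to the n nearest labeled neighbors are turned into weights via exp(-d) (an assumption, since the exact form of P(tX_a|Ω) is given only by Equations 14-15), and reusing P(lX_t|z_k) and P(z_k) from training; names are illustrative:

```python
import numpy as np

def test_features(T, L, p_l, p_z, n_neighbors=3):
    """Return test samples augmented with K'' hidden-variable features."""
    feats = []
    for tx in T:
        d = np.linalg.norm(L - tx, axis=1)
        idx = np.argsort(d)[:n_neighbors]              # indices of Omega
        w = np.exp(-d[idx]); w /= w.sum()              # P(tX_a | lX_t), lX_t in Omega
        p_t_given_z = w @ p_l[idx]                     # P(tX_a | z_k) via the Omega samples
        post = p_z * p_t_given_z
        post /= post.sum() + 1e-12                     # posterior-like P(z_k | tX_a)
        feats.append(np.concatenate([tx, post]))
    return np.array(feats)

rng = np.random.default_rng(5)
L = rng.normal(size=(6, 4))                            # labeled training samples
T = rng.normal(size=(2, 4))                            # test samples
p_l = rng.random((6, 2)); p_l /= p_l.sum(0)            # P(lX_t | z_k) from training
p_z = np.array([0.5, 0.5])                             # P(z_k) from training
print(test_features(T, L, p_l, p_z).shape)             # (2, 6)
```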
Step 5: modular integration.
The MIN, MAX and INV units are the three integration units of the M3 network and play an important role in it; the independent sub-modules are combined through these three units.
MIN unit: let I_1, I_2, ..., I_n be n input values and O_min be the output of the MIN unit; then O_min = MIN{I_1, I_2, ..., I_i, ..., I_n}, i.e., the output of the MIN unit is the minimum of all input values.
MAX unit: let I_1, I_2, ..., I_n be n input values and O_max be the output of the MAX unit; then O_max = MAX{I_1, I_2, ..., I_i, ..., I_n}. In contrast to the MIN unit, the output O_max of the MAX unit is the maximum of all input values.
INV unit: takes the inverse operation. In the M3 framework, the INV unit functions like the inversion of a matrix and is used to avoid repeatedly generating multiple modules from the same training samples.
The M3 network uses the MIN rule and the MAX rule to integrate the independent sub-modules:
Min rule: for the training sample sets of the independent sub-modules, if they share the same positive training samples but have different negative training samples, take the minimum value O_min;
Max rule: for the training sample sets of the independent sub-modules, if they share the same negative training samples but have different positive training samples, take the maximum value O_max.
Combining the above two rules, the invention trains base classifiers on the new labeled sample subsets and combines them with the Min-Max rule. For a two-class problem, only M_i MIN units and one MAX unit are needed. The formulas are described as follows:
(1) The base classifiers trained on sample subsets that share the same positive-class training samples are first integrated through a MIN unit by the Min rule:
where the minimum is taken over the independent sub-modules that share the same positive training samples, and each term is the classification result for the sample x given by the base classifier learned on the corresponding two-class sub-problem.
(2) The MAX unit then integrates these results by the Max rule:
where O_ij(x) is the final integration result for the two-class problem S_ij.
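For illustration, a minimal sketch of this Min-Max combination for one two-class problem S_ij, assuming the base-classifier outputs for a sample x are already collected in an M_i x M_j score array:

```python
import numpy as np

def min_max_combine(scores):
    """O_ij(x) = MAX over positive blocks p of the MIN over negative blocks q
    of the sub-module outputs O_ij^{pq}(x)."""
    scores = np.asarray(scores)          # shape (M_i, M_j)
    return scores.min(axis=1).max()      # MIN units per row, then one MAX unit

# example: M_i = 2 positive blocks, M_j = 3 negative blocks
scores = [[0.9, 0.2, 0.7],
          [0.4, 0.6, 0.8]]
print(min_max_combine(scores))           # row minima [0.2, 0.4] -> 0.4
```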
Claims (6)
1. A semi-supervised minimum and maximum modular pattern classification method, characterized in that unlabeled samples are added to the labeled sample subsets obtained in the task decomposition stage of an M3 network, the hidden variables of a data generation model are used to link the labeled and unlabeled samples, the posterior probabilities of the hidden variables given the labeled samples are used as new features of the labeled samples, and the solution of the original problem is then obtained using the Min-Max integration rule, the method comprising the following steps:
step 1: dividing data;
dividing the original labeled sample set according to the task decomposition principle of the M3 network; dividing the unlabeled sample set equally, so that the number of unlabeled sample subsets is the same as the number of labeled sample subsets;
step 2: allocation of unlabeled sample subsets;
adding each unlabeled sample subset, without repetition, to the two-class labeled sample subset whose center point is farthest from it; at this point, each independent training sample subset comprises two parts: a two-class labeled sample subset and an unlabeled sample subset;
step 3: hidden feature generation;
for each training subset, it is assumed that the labeled samples and the unlabeled samples are generated from the same generative model and that their generation is determined by the hidden variables z_k; the hidden variables are solved by the probabilistic latent semantic analysis (PLSA) method, and the posterior probabilities of the solved hidden variables given the labeled samples are taken as new features of the labeled samples; a classifier is trained on each labeled sample subset after the new features are added;
step 4: converting the feature space of the test samples;
since the feature space of the training samples has changed, the feature space of the test samples is mapped into the same feature space as the training samples; by extracting the n nearest labeled neighbors of each test sample within the training subset, the posterior probabilities of the hidden variables given the test sample are estimated and used as new features of the test sample;
step 5: modular integration;
using the classifiers obtained in the training stage to predict labels for the test samples, and integrating the prediction results of all base classifiers with the Min-Max rule to obtain the solution of the original problem.
2. A semi-supervised min-max modular pattern classification method as claimed in claim 1, wherein: the method combines an M3 network and semi-supervised learning, and comprises the following steps:
step 1: dividing data;
dividing the original labeled sample set S_L into labeled sample subsets according to the sample division principle of the M3 network and the hyperplane division method, where M_i and M_j respectively denote the number of blocks into which the C_i-class and C_j-class samples in S_L are divided; dividing the unlabeled sample set equally, so that the number of unlabeled sample subsets is the same as the number of labeled sample subsets;
step 2: allocation of unlabeled sample subsets;
calculating the center point of each sample subset and assigning each unlabeled sample subset to the labeled sample subset whose center point is farthest from it; at this point, each independent training sample subset comprises two parts: a two-class labeled sample subset and an unlabeled sample subset;
step 3: hidden feature generation;
denoting the labeled sample subset of a training subset by S_labeled and the unlabeled sample subset by S_unlabeled; assuming that both labeled and unlabeled samples are generated by a generative model and that some hidden variables z_1, z_2, ..., z_{K''} behind the labeled and unlabeled samples determine the whole sample generation process, comprising:
1) selecting z_k from all hidden variables with probability P(z_k);
2) given the hidden variable z_k, generating a labeled sample lX_t with conditional probability P(lX_t|z_k);
3) given the hidden variable z_k, generating an unlabeled sample uX_r with conditional probability P(uX_r|z_k);
the generation of the labeled sample lX_t depends only on the hidden variable z_k and is independent of the unlabeled samples; likewise, the generation of the unlabeled sample uX_r depends only on the hidden variable z_k and is independent of the labeled samples; the labeled sample lX_t and the unlabeled sample uX_r generated by this model are mutually independent; by conditional independence in probability theory, the following equation holds:
P(lX_t, uX_r | z_k) = P(lX_t | z_k) P(uX_r | z_k)    (Equation 1)
The above sample generation process can be interpreted as two probabilistic model expressions as follows:
P(lX_t, uX_r) = P(lX_t) P(uX_r | lX_t)    (Equation 2)
establishing the relation between a labeled sample and an unlabeled sample according to the Euclidean distance θ_tr = sqrt( Σ_{d=1}^{D} (lx_{t,d} − ux_{r,d})² ),
where lx_{t,d} denotes the d-th feature of the t-th labeled sample and ux_{r,d} denotes the d-th feature of the r-th unlabeled sample;
designing a likelihood function according to the established data model by utilizing a probability latent semantic analysis PLSA method, and obtaining a maximum likelihood estimation value by an expectation maximization EM method;
according to the idea of the PLSA method, establishing a log-likelihood function and then optimizing it step by step to obtain the optimal P(z_k), P(lX_t|z_k) and P(uX_r|z_k), comprising the following steps:
(1) establishing the likelihood function log P(S_labeled, S_unlabeled, Z);
according to Bayes' formula, the conditional probability in Equation (3) holds; then, from Equations (1), (2) and (3), the joint probability density function between S_labeled and S_unlabeled can be further rewritten as:
the above equation can be viewed as the marginal probability function obtained over all hidden variables z_k; from the conditional probability in Equation 5, the following can be derived:
Thus, a likelihood function formula on all samples is obtained:
where θ_tr is the distance between the unlabeled sample uX_r and the labeled sample lX_t; K'' is the number of hidden variables; w + g is the number of labeled samples contained in each training sample subset; and v is the number of unlabeled samples contained in each training sample subset;
(2) maximizing the log-likelihood function;
maximizing the log-likelihood function means iteratively optimizing Equation 8 with the EM method: the E-step computes P(z_k|lX_t, uX_r) from the labeled and unlabeled samples; the M-step uses the resulting P(z_k|lX_t, uX_r) values to compute the conditional probabilities P(lX_t|z_k) and P(uX_r|z_k) that maximize the likelihood;
Equations 9 to 12 are updated iteratively so that the likelihood function f of Equation 8 attains its maximum; the loop terminates when the maximum number of iterations is reached or when the algorithm satisfies the convergence condition;
the optimal solution thus obtained yields hidden feature values that can be regarded as new features of the labeled sample lX_t; the new feature of lX_t is defined in Equation 13, which describes the posterior probability of the hidden variable z_k given the labeled sample lX_t;
a new labeled sample $l\tilde{X}_t$ consists of the feature values in the original feature space together with the newly generated feature values, expressed as

$$l\tilde{X}_t = \{\, lx_{t,1},\ lx_{t,2},\ \ldots,\ lx_{t,D},\ P(z_1 \mid lX_t),\ \ldots,\ P(z_{K''} \mid lX_t) \,\}, \quad t = 1, 2, \ldots, w+g,$$

so that a new labeled sample subset can be obtained:

$$\tilde{S}^{\,h}_{labeled} = \{\, l\tilde{X}_t \,\}_{t=1}^{w+g}, \quad h = 1, \ldots, \sum_{i=1}^{K-1}\sum_{j=i+1}^{K} M_i \times M_j;$$
step 4: converting the feature space of the test samples;
since the feature space of the training samples has changed, the feature space of the test samples should be mapped into the same feature space as the training samples; the test stage still uses the posterior probability of the test sample tX_a and the hidden variable z_k as the new feature of the test sample, as shown in Equation 16;
Ω denotes the n nearest-neighbor samples obtained from the labeled sample set S_labeled, and the hidden variables z_k come directly from the training stage; the test sample tX_a and the hidden variable z_k are mutually independent, so P(tX_a|z_k) can be split into two parts: P(tX_a|Ω) and P(Ω|z_k); P(tX_a|Ω) is expressed by the Euclidean distance between the nearest-neighbor matrix Ω and the test sample tX_a; because Ω belongs to the labeled sample subset S_labeled, P(tX_a|Ω) can be written as P(tX_a|lX_t) and P(Ω|z_k) as P(lX_t|z_k), with lX_t ∈ Ω; the new feature of the test sample tX_a given by Equation 14 is thus redefined as shown in Equation 15;
step 5: modular integration;
using the classifiers obtained in the training stage to predict labels for the test samples, and integrating the prediction results of all base classifiers with the Min-Max rule to obtain the solution of the original problem.
3. A semi-supervised min-max modular pattern classification method as claimed in claim 1, wherein: the method integrates marked samples and unmarked samples according to generative semi-supervised learning, constructs new features by utilizing hidden characteristics between the marked samples and the unmarked samples, obtains a new marked sample set, and trains a classifier on the new marked sample set.
4. A semi-supervised min-max modular pattern classification method as claimed in claim 1, wherein: the method expands the application range of the minimum and maximum modular network.
5. A semi-supervised min-max modular pattern classification method as claimed in claim 1, wherein: the method is to train a classifier by using the labeled samples with the added features.
6. A semi-supervised min-max modular pattern classification method as claimed in claim 1 applied to min-max modular networks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510035805.4A CN104657743A (en) | 2015-01-23 | 2015-01-23 | Semi-supervised minimum and maximum modularization mode classification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104657743A true CN104657743A (en) | 2015-05-27 |
Family
ID=53248842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510035805.4A Pending CN104657743A (en) | 2015-01-23 | 2015-01-23 | Semi-supervised minimum and maximum modularization mode classification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104657743A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108881196A (en) * | 2018-06-07 | 2018-11-23 | 中国民航大学 | Semi-supervised intrusion detection method based on a deep generative model
CN109831392A (en) * | 2019-03-04 | 2019-05-31 | 中国科学技术大学 | Semi-supervised network traffic classification method
CN109831392B (en) * | 2019-03-04 | 2020-10-27 | 中国科学技术大学 | Semi-supervised network traffic classification method
CN110347825A (en) * | 2019-06-14 | 2019-10-18 | 北京物资学院 | Method and device for classifying short English film reviews
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20150527