CN103324610A - Sample training method and device for mobile device - Google Patents

Sample training method and device for mobile device

Info

Publication number
CN103324610A
Authority
CN
China
Prior art keywords
character
results
subspace
commendation
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013102308120A
Other languages
Chinese (zh)
Inventor
李寿山
高伟
周国栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN2013102308120A
Publication of CN103324610A
Legal status: Pending

Abstract

The invention discloses a sample training method and device for a mobile device. The method is applied in the device, and the device is applied in the mobile device. All feature values in preset samples are extracted; the feature values are decomposed according to a preset rule to obtain at least one feature-value subspace; machine-learning classification training is performed on each feature-value subspace; and a base classifier corresponding to each feature-value subspace is obtained. Because every base classifier is trained on a single feature-value subspace, the number of feature values it handles is much smaller than the total number of feature values, and the memory required during sample training is correspondingly small.

Description

Sample training method and device applied to a mobile device
Technical field
The present invention relates to the field of information processing, and in particular to a sample training method and device applied to a mobile device.
Background technology
With the rapid development of the Internet, people have become increasingly accustomed to expressing their opinions online, so that a large amount of emotionally charged text has emerged on the network. Such opinionated text usually takes the form of product reviews, forum comments and blog posts, and is often key text, or text the user is interested in.
Text sentiment classification analyzes the speaker's attitude (also called opinion or emotion), that is, the subjective information in the text. Sentiment classification is a basic task in sentiment analysis. It aims to label text as commendatory or derogatory according to its emotional tendency, and is considered more challenging than traditional topic-based text classification. Concretely, the task is to divide text into positive text and negative text. For example, "I am delighted with this film" is classified as positive text by sentiment classification, while "This film is very poor" is classified as negative text.
At present, the training process of supervised machine-learning classification methods usually requires manually annotating positive and negative samples of a certain scale. The classification accuracy of such methods is relatively high, but as the number of training samples increases, the number of features also rises significantly, and the classification process occupies a large amount of memory. Restricted by memory size, a mobile terminal device therefore finds it difficult to perform the text classification task.
Summary of the invention
The invention provides a sample training method and device applied to a mobile device, to solve the prior-art problem that text classification cannot be performed on a mobile device because of its small memory.
The specific technical scheme is as follows:
A sample training method applied to a mobile device, the method comprising:
extracting all feature values in preset samples;
decomposing all the feature values according to a preset rule to obtain at least one feature-value subspace;
performing machine-learning classification training on each feature-value subspace to obtain a base classifier corresponding to each feature-value subspace.
Preferably, the method further comprises:
classifying a sample to be classified with the base classifiers to obtain a classification result corresponding to each base classifier, wherein the classification result may be a first classification result of commendatory character and a second classification result of derogatory character;
fusing the first classification results of commendatory character and the second classification results of derogatory character respectively with a fusion rule, to obtain a first fusion result of commendatory character and a second fusion result of derogatory character;
judging whether the first fusion result of commendatory character is greater than the second fusion result of derogatory character; if so, the sample to be classified is judged commendatory; if not, the sample to be classified is judged derogatory.
Preferably, the process of extracting all feature values in the preset samples comprises:
extracting all feature values in the preset samples with a feature-value extraction method.
Preferably, the process of decomposing all the feature values according to the preset rule to obtain at least one feature-value subspace comprises:
decomposing all the feature values by average division or random extraction to obtain at least one feature-value subspace.
Preferably, the process of performing machine-learning classification training on each feature-value subspace to obtain the base classifier corresponding to each feature-value subspace comprises:
training each feature-value subspace with the same machine-learning classification method or with different machine-learning classification methods to obtain the base classifier corresponding to each feature-value subspace.
Preferably, the process of fusing the first classification results of commendatory character and the second classification results of derogatory character respectively with the fusion rule, to obtain the first fusion result of commendatory character and the second fusion result of derogatory character, comprises:
fusing the first classification results of commendatory character and the second classification results of derogatory character respectively with the Bayes fusion rule, to obtain the first fusion result of commendatory character and the second fusion result of derogatory character.
A sample training device applied to a mobile device, the device comprising an extraction module, a decomposing module and a training module;
wherein the extraction module is configured to extract all feature values in preset samples;
the decomposing module is configured to decompose all the feature values according to a preset rule to obtain at least one feature-value subspace;
the training module is configured to perform machine-learning classification training on each feature-value subspace to obtain a base classifier corresponding to each feature-value subspace.
Preferably, the device further comprises a classification module, a fusion module and a judging module;
wherein the classification module is configured to classify a sample to be classified with the base classifiers and obtain a classification result corresponding to each base classifier, wherein the classification result may be a first classification result of commendatory character and a second classification result of derogatory character;
the fusion module is configured to fuse the first classification results of commendatory character and the second classification results of derogatory character respectively with a fusion rule, to obtain a first fusion result of commendatory character and a second fusion result of derogatory character;
the judging module is configured to judge whether the first fusion result of commendatory character is greater than the second fusion result of derogatory character; if so, the sample to be classified is judged commendatory; if not, the sample to be classified is judged derogatory.
As can be seen from the above technical schemes, the invention provides a sample training method and device applied to a mobile device. The method is applied in the device, and the device is applied in the mobile device. All feature values in preset samples are extracted and decomposed according to a preset rule to obtain at least one feature-value subspace; machine-learning classification training is performed on each feature-value subspace, and a base classifier corresponding to each feature-value subspace is obtained. Because each base classifier is obtained by training on a single feature-value subspace, the number of feature values it handles is much smaller than the total number of feature values, and the memory required during sample training is therefore much smaller as well.
Description of drawings
In order to describe the technical schemes in the embodiments of the invention or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in the invention, and those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is a schematic flow chart of a sample training method applied to a mobile device disclosed in Embodiment 1 of the invention;
Fig. 2 is a schematic flow chart of a sample training method applied to a mobile device disclosed in Embodiment 2 of the invention;
Fig. 3 is a schematic structural diagram of a sample training device applied to a mobile device disclosed in Embodiment 3 of the invention;
Fig. 4 is a schematic structural diagram of a sample training device applied to a mobile device disclosed in Embodiment 4 of the invention.
Embodiment
With the rapid development of the Internet, people increasingly like to express their likes and dislikes online, so that a large amount of emotionally charged text has emerged on the network. To distinguish the sentiment categories of such text, the method commonly used at present is the supervised classification method of machine learning, whose training process usually requires manually annotating positive and negative samples of a certain scale. The classification accuracy of this method is relatively high, but as the number of training samples increases, the number of features also rises significantly, and the classification process occupies a large amount of memory. Restricted by memory size, a mobile terminal device therefore finds it difficult to perform the text classification task.
The invention therefore proposes a sample training method and device applied to a mobile device. The method is applied in the device: all feature values extracted from the preset samples are decomposed to obtain at least one feature-value subspace, machine-learning classification training is performed on each feature-value subspace, and a base classifier corresponding to each feature-value subspace is obtained. Because each base classifier is obtained by training on a single feature-value subspace, the number of feature values it handles is much smaller than the total number of feature values, and the memory required during sample training is therefore much smaller as well.
The technical schemes in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the invention without creative work shall fall within the protection scope of the invention.
Embodiment 1 of the invention discloses a sample training method applied to a mobile device. Referring to Fig. 1, the method comprises:
Step S101: extract all feature values in the preset samples;
Specifically, all feature values in the preset samples are extracted with a feature-value extraction method.
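For illustration only, a minimal Python sketch of this extraction step follows. The patent does not name a concrete feature-value extraction method, so bag-of-words counts via scikit-learn's CountVectorizer are assumed; the helper name extract_features is likewise hypothetical and is reused in the later sketches.

```python
# Sketch of step S101 under the assumption that feature values are
# bag-of-words counts; any other feature-value extraction method would
# fit the description equally well.
from sklearn.feature_extraction.text import CountVectorizer

def extract_features(texts):
    """Return the full feature matrix X (n samples x m feature values)."""
    vectorizer = CountVectorizer()        # unigram bag-of-words (assumption)
    X = vectorizer.fit_transform(texts)   # sparse matrix of shape (n, m)
    return X, vectorizer
```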
Step S102: decompose all the feature values according to a preset rule to obtain at least one feature-value subspace;
Specifically, all the feature values are decomposed by average division or random extraction to obtain at least one feature-value subspace.
Suppose the preset sample set is X = (X_1, X_2, ..., X_n), where each X_i is an m-dimensional vector X_i = (x_{i1}, x_{i2}, ..., x_{im}). Specifically, the method selects r features (r < m) at random from the original m-dimensional feature space to construct an r-dimensional random subspace; in this way the r-dimensional samples X_i^r (i = 1, 2, ..., n) built from the selected features form a new training set X^r = (X_1^r, X_2^r, ..., X_n^r).
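A minimal sketch of this decomposition step follows, covering both the average-division and the random-extraction variants described above; the helper name decompose_features, the subspace count and the random seed are illustrative choices, not terms from the patent.

```python
# Sketch of step S102: split the m feature indices into feature-value
# subspaces, either by even partition ("average division") or by drawing
# r indices at random per subspace ("random extraction").
import numpy as np

def decompose_features(m, n_subspaces, mode="random", r=None, seed=0):
    rng = np.random.default_rng(seed)
    indices = np.arange(m)
    if mode == "average":
        return np.array_split(indices, n_subspaces)
    r = r if r is not None else m // n_subspaces   # subspace dimension r < m
    return [rng.choice(indices, size=r, replace=False)
            for _ in range(n_subspaces)]
```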
Step S103: perform machine-learning classification training on each feature-value subspace to obtain the base classifier corresponding to each feature-value subspace;
Specifically, each feature-value subspace is trained with the same machine-learning classification method or with different machine-learning classification methods to obtain the base classifier corresponding to each feature-value subspace. Using different machine-learning methods can also be considered: it increases the diversity of the base classifiers, which helps improve the classification performance of the system;
The machine-learning classification method used in this embodiment is the maximum entropy classification method:
The maximum entropy classification method is based on maximum-entropy information theory. Its basic idea is to build a model for all known factors and exclude all unknown factors; that is, to find a probability distribution that satisfies all known facts while leaving the unknown factors as random as possible. Compared with the naive Bayes method, its biggest advantage is that it does not require conditional independence between features. It is therefore well suited to fusing various different features without having to consider the interactions between them;
Under the maximum entropy model, the conditional probability P(c_i|D) is predicted as
P(c_i \mid D) = \frac{1}{Z(D)} \exp\Big( \sum_k \lambda_{k,c} F_{k,c}(D, c_i) \Big)
where Z(D) is a normalization factor and F_{k,c} is a feature function defined as
F_{k,c}(D, c') = \begin{cases} 1, & n_k(D) > 0 \text{ and } c' = c \\ 0, & \text{otherwise;} \end{cases}
Besides the maximum entropy classification method, common machine-learning classification methods include naive Bayes and support vector machines.
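A minimal sketch of step S103 follows. Binary maximum entropy classification coincides with logistic regression, so scikit-learn's LogisticRegression stands in for the maximum entropy classifier here; train_base_classifiers is a hypothetical helper, and naive Bayes or an SVM could be substituted per subspace exactly as the description allows.

```python
# Sketch of step S103: train one base classifier per feature-value subspace.
from sklearn.linear_model import LogisticRegression

def train_base_classifiers(X, y, subspaces):
    """Return a list of (column_indices, fitted_classifier) pairs."""
    classifiers = []
    for cols in subspaces:
        clf = LogisticRegression(max_iter=1000)  # max entropy ~ logistic regression
        clf.fit(X[:, cols], y)                   # y: 1 = commendatory, 0 = derogatory
        classifiers.append((cols, clf))
    return classifiers
```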
This embodiment discloses a sample training method applied to a mobile device. In the method, all feature values in the preset samples are extracted and decomposed according to a preset rule to obtain at least one feature-value subspace; machine-learning classification training is performed on each feature-value subspace, and the base classifier corresponding to each feature-value subspace is obtained. Because each base classifier is obtained by training on a single feature-value subspace, the number of feature values it handles is much smaller than the total number of feature values, and the memory required during sample training is therefore much smaller as well.
Embodiment 2 of the invention discloses a sample training method applied to a mobile device. Referring to Fig. 2, the method comprises:
Step S201: extract all feature values in the preset samples;
Step S202: decompose all the feature values according to a preset rule to obtain at least one feature-value subspace;
Step S203: perform machine-learning classification training on each feature-value subspace to obtain the base classifier corresponding to each feature-value subspace;
The specific implementation of steps S201-S203 is the same as that of steps S101-S103 disclosed in Embodiment 1 and is not repeated here.
Step S204: classify a sample to be classified with the base classifiers to obtain the classification result corresponding to each base classifier, wherein the classification result may be a first classification result of commendatory character and a second classification result of derogatory character;
It should be noted that each base classifier has a corresponding classification result, and the classification result contains both the probability of the commendatory outcome and the probability of the derogatory outcome;
Step S205: fuse the first classification results of commendatory character and the second classification results of derogatory character respectively with a fusion rule, to obtain the first fusion result of commendatory character and the second fusion result of derogatory character;
Specifically, the Bayes fusion rule is used to fuse the first classification results of commendatory character and the second classification results of derogatory character respectively, obtaining the first fusion result of commendatory character and the second fusion result of derogatory character.
Since each base classifier has a corresponding classification result containing both the probability of the commendatory outcome and the probability of the derogatory outcome, namely the first classification result of commendatory character and the second classification result of derogatory character, the Bayes fusion rule is applied to the commendatory classification results to obtain the first fusion result of commendatory character, and likewise to the derogatory classification results to obtain the second fusion result of derogatory character;
Suppose P_l(c_+|D) and P_l(c_-|D) denote the results given by the l-th base classifier.
The Bayes fusion rule assumes that the results given by the individual classifiers are mutually independent. Under this assumption, the posterior probability P(c_+|D) that the sample belongs to the commendatory class and the posterior probability P(c_-|D) that the sample belongs to the derogatory class are given by the Bayes formula as
P(c_+ \mid D) = P(c_+) \prod_{l=1}^{N} P_l(c_+ \mid D)
P(c_- \mid D) = P(c_-) \prod_{l=1}^{N} P_l(c_- \mid D)
where P(c_+) and P(c_-) are the prior probabilities that the sample belongs to the commendatory class and the derogatory class, respectively. The invention ignores the influence of the priors and sets both to 0.5.
Step S206: judge whether the first fusion result of commendatory character is greater than the second fusion result of derogatory character; if so, execute step S207; if not, execute step S208;
The text-orientation category of the sample to be classified is decided by the posterior probabilities P(c_+|D) and P(c_-|D), with the following concrete decision rule:
if P(c_+|D) > P(c_-|D), the sample to be classified is commendatory; otherwise it is derogatory.
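A minimal sketch of steps S205 and S206 follows, multiplying the per-classifier posteriors under the Bayes fusion rule with both priors fixed at 0.5 as in the description; classify_with_fusion is a hypothetical helper, and the [negative, positive] column order of predict_proba is an assumption that holds when the labels are coded 0/1.

```python
# Sketch of steps S205-S206: Bayes fusion of the base classifiers'
# posteriors, followed by the commendatory/derogatory decision.
def classify_with_fusion(x, classifiers, prior_pos=0.5, prior_neg=0.5):
    fused_pos, fused_neg = prior_pos, prior_neg
    for cols, clf in classifiers:
        p_neg, p_pos = clf.predict_proba(x[:, cols])[0]  # P_l(c-|D), P_l(c+|D)
        fused_pos *= p_pos                               # P(c+) * prod_l P_l(c+|D)
        fused_neg *= p_neg                               # P(c-) * prod_l P_l(c-|D)
    return "commendatory" if fused_pos > fused_neg else "derogatory"
```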
Step S207: the sample to be classified is judged to be commendatory;
Step S208: the sample to be classified is judged to be derogatory.
This embodiment discloses a sample training method applied to a mobile device. On the basis of Embodiment 1, it adds a method of sentiment classification of a sample to be classified using the base classifiers: the base classifiers classify the sample to be classified and produce the classification result corresponding to each base classifier, wherein the classification result may be a first classification result of commendatory character and a second classification result of derogatory character; a fusion rule fuses the first classification results of commendatory character and the second classification results of derogatory character respectively to obtain the first fusion result of commendatory character and the second fusion result of derogatory character; and whether the first fusion result of commendatory character is greater than the second fusion result of derogatory character is judged: if so, the sample to be classified is judged commendatory, and if not, derogatory. Because this embodiment uses the base classifiers for the sentiment classification of the sample to be classified, and each base classifier is obtained by training on a single feature-value subspace, the number of feature values is much smaller than the total number of feature values and the memory required during sample training is much smaller; the method is therefore suitable for a mobile device.
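Tying the sketches above together, a hypothetical end-to-end usage example follows; the texts and labels are illustrative placeholders based on the example sentences in the background section, not data from the patent.

```python
# Illustrative end-to-end run of Embodiments 1 and 2 using the hypothetical
# helpers sketched earlier (extract_features, decompose_features,
# train_base_classifiers, classify_with_fusion).
texts = ["I am delighted with this film", "This film is very poor"]
labels = [1, 0]                                    # 1 = commendatory, 0 = derogatory

X, vectorizer = extract_features(texts)
subspaces = decompose_features(X.shape[1], n_subspaces=3, mode="random")
classifiers = train_base_classifiers(X, labels, subspaces)

x_new = vectorizer.transform(["I am delighted with this movie"])
print(classify_with_fusion(x_new, classifiers))    # "commendatory" or "derogatory"
```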
Embodiment 3 of the invention discloses a sample training device applied to a mobile device. Referring to Fig. 3, the device comprises an extraction module 101, a decomposing module 102 and a training module 103;
wherein the extraction module 101 is configured to extract all feature values in the preset samples;
the extraction module 101 may extract all feature values in the preset samples with a feature-value extraction method.
The decomposing module 102 is configured to decompose all the feature values according to a preset rule to obtain at least one feature-value subspace;
the decomposing module may decompose all the feature values by average division or random extraction to obtain at least one feature-value subspace.
Suppose the preset sample set is X = (X_1, X_2, ..., X_n), where each X_i is an m-dimensional vector X_i = (x_{i1}, x_{i2}, ..., x_{im}). Specifically, the device selects r features (r < m) at random from the original m-dimensional feature space to construct an r-dimensional random subspace; in this way the r-dimensional samples X_i^r (i = 1, 2, ..., n) built from the selected features form a new training set X^r = (X_1^r, X_2^r, ..., X_n^r).
The training module 103 is configured to perform machine-learning classification training on each feature-value subspace to obtain the base classifier corresponding to each feature-value subspace.
The training module may train each feature-value subspace with the same machine-learning classification method or with different machine-learning classification methods to obtain the base classifier corresponding to each feature-value subspace. Using different machine-learning methods can also be considered: it increases the diversity of the base classifiers, which helps improve the classification performance of the system;
The machine-learning classification method used in this embodiment is the maximum entropy classification method:
The maximum entropy classification method is based on maximum-entropy information theory. Its basic idea is to build a model for all known factors and exclude all unknown factors; that is, to find a probability distribution that satisfies all known facts while leaving the unknown factors as random as possible. Compared with the naive Bayes method, its biggest advantage is that it does not require conditional independence between features. It is therefore well suited to fusing various different features without having to consider the interactions between them;
Under the maximum entropy model, the conditional probability P(c_i|D) is predicted as
P(c_i \mid D) = \frac{1}{Z(D)} \exp\Big( \sum_k \lambda_{k,c} F_{k,c}(D, c_i) \Big)
where Z(D) is a normalization factor and F_{k,c} is a feature function defined as
F_{k,c}(D, c') = \begin{cases} 1, & n_k(D) > 0 \text{ and } c' = c \\ 0, & \text{otherwise;} \end{cases}
Besides the maximum entropy classification method, common machine-learning classification methods include naive Bayes and support vector machines.
This embodiment discloses a sample training device applied to a mobile device. The device extracts all feature values in the preset samples, decomposes them according to a preset rule to obtain at least one feature-value subspace, performs machine-learning classification training on each feature-value subspace, and obtains the base classifier corresponding to each feature-value subspace. Because each base classifier is obtained by training on a single feature-value subspace, the number of feature values it handles is much smaller than the total number of feature values, and the memory required during sample training is therefore much smaller as well.
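For illustration, the module structure of Embodiment 3 could be mirrored by a plain Python class as sketched below; the class and method names are hypothetical and simply delegate to the helpers sketched earlier.

```python
# Schematic mapping of the extraction, decomposing and training modules of
# the device onto a single class; internals reuse the hypothetical helpers
# extract_features, decompose_features and train_base_classifiers.
class SampleTrainingDevice:
    def __init__(self, n_subspaces=3, mode="random"):
        self.n_subspaces = n_subspaces
        self.mode = mode

    def extraction_module(self, texts):               # module 101
        self.X, self.vectorizer = extract_features(texts)
        return self.X

    def decomposing_module(self):                     # module 102
        self.subspaces = decompose_features(self.X.shape[1],
                                            self.n_subspaces, self.mode)
        return self.subspaces

    def training_module(self, labels):                # module 103
        self.classifiers = train_base_classifiers(self.X, labels,
                                                   self.subspaces)
        return self.classifiers
```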
Embodiment 4 of the invention discloses a sample training device applied to a mobile device. Referring to Fig. 4, the device comprises an extraction module 101, a decomposing module 102, a training module 103, a classification module 104, a fusion module 105 and a judging module 106;
the extraction module 101, the decomposing module 102 and the training module 103 are consistent with the extraction module 101, the decomposing module 102 and the training module 103 disclosed in Embodiment 3;
wherein the classification module 104 is configured to classify a sample to be classified with the base classifiers and obtain the classification result corresponding to each base classifier, wherein the classification result may be a first classification result of commendatory character and a second classification result of derogatory character;
it should be noted that each base classifier has a corresponding classification result, and the classification result contains both the probability of the commendatory outcome and the probability of the derogatory outcome;
the fusion module 105 is configured to fuse the first classification results of commendatory character and the second classification results of derogatory character respectively with a fusion rule, to obtain the first fusion result of commendatory character and the second fusion result of derogatory character;
specifically, the Bayes fusion rule is used to fuse the first classification results of commendatory character and the second classification results of derogatory character respectively, obtaining the first fusion result of commendatory character and the second fusion result of derogatory character.
Since each base classifier has a corresponding classification result containing both the probability of the commendatory outcome and the probability of the derogatory outcome, namely the first classification result of commendatory character and the second classification result of derogatory character, the Bayes fusion rule is applied to the commendatory classification results to obtain the first fusion result of commendatory character, and likewise to the derogatory classification results to obtain the second fusion result of derogatory character;
Suppose P_l(c_+|D) and P_l(c_-|D) denote the results given by the l-th base classifier.
The Bayes fusion rule assumes that the results given by the individual classifiers are mutually independent. Under this assumption, the posterior probability P(c_+|D) that the sample belongs to the commendatory class and the posterior probability P(c_-|D) that the sample belongs to the derogatory class are given by the Bayes formula as
P(c_+ \mid D) = P(c_+) \prod_{l=1}^{N} P_l(c_+ \mid D)
P(c_- \mid D) = P(c_-) \prod_{l=1}^{N} P_l(c_- \mid D)
where P(c_+) and P(c_-) are the prior probabilities that the sample belongs to the commendatory class and the derogatory class, respectively. The invention ignores the influence of the priors and sets both to 0.5.
The judging module 106 is configured to judge whether the first fusion result of commendatory character is greater than the second fusion result of derogatory character; if so, the sample to be classified is judged commendatory; if not, the sample to be classified is judged derogatory;
the text-orientation category of the sample to be classified is decided by the posterior probabilities P(c_+|D) and P(c_-|D), with the following concrete decision rule:
if P(c_+|D) > P(c_-|D), the sample to be classified is commendatory; otherwise it is derogatory.
This embodiment discloses a sample training device applied to a mobile device. The base classifiers are used for the sentiment classification of the sample to be classified, and each base classifier is obtained by training on a single feature-value subspace, so the number of feature values is much smaller than the total number of feature values; the memory required during sample training is therefore much smaller, and the device is suitable for a mobile device.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments can be referred to one another. Since the devices disclosed in the embodiments correspond to the methods disclosed in the embodiments, their description is relatively brief, and the relevant parts can be found in the description of the methods.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A sample training method applied to a mobile device, characterized in that the method comprises:
extracting all feature values in preset samples;
decomposing all the feature values according to a preset rule to obtain at least one feature-value subspace;
performing machine-learning classification training on each feature-value subspace to obtain a base classifier corresponding to each feature-value subspace.
2. The method according to claim 1, characterized in that it further comprises:
classifying a sample to be classified with the base classifiers to obtain a classification result corresponding to each base classifier, wherein the classification result may be a first classification result of commendatory character and a second classification result of derogatory character;
fusing the first classification results of commendatory character and the second classification results of derogatory character respectively with a fusion rule, to obtain a first fusion result of commendatory character and a second fusion result of derogatory character;
judging whether the first fusion result of commendatory character is greater than the second fusion result of derogatory character; if so, the sample to be classified is judged commendatory; if not, the sample to be classified is judged derogatory.
3. The method according to claim 1, characterized in that the process of extracting all feature values in the preset samples comprises:
extracting all feature values in the preset samples with a feature-value extraction method.
4. The method according to claim 1, characterized in that the process of decomposing all the feature values according to the preset rule to obtain at least one feature-value subspace comprises:
decomposing all the feature values by average division or random extraction to obtain at least one feature-value subspace.
5. The method according to claim 1, characterized in that the process of performing machine-learning classification training on each feature-value subspace to obtain the base classifier corresponding to each feature-value subspace comprises:
training each feature-value subspace with the same machine-learning classification method or with different machine-learning classification methods to obtain the base classifier corresponding to each feature-value subspace.
6. The method according to claim 2, characterized in that the process of fusing the first classification results of commendatory character and the second classification results of derogatory character respectively with the fusion rule, to obtain the first fusion result of commendatory character and the second fusion result of derogatory character, comprises:
fusing the first classification results of commendatory character and the second classification results of derogatory character respectively with the Bayes fusion rule, to obtain the first fusion result of commendatory character and the second fusion result of derogatory character.
7. A sample training device applied to a mobile device, characterized in that the device comprises an extraction module, a decomposing module and a training module;
wherein the extraction module is configured to extract all feature values in preset samples;
the decomposing module is configured to decompose all the feature values according to a preset rule to obtain at least one feature-value subspace;
the training module is configured to perform machine-learning classification training on each feature-value subspace to obtain a base classifier corresponding to each feature-value subspace.
8. The device according to claim 7, characterized in that it further comprises a classification module, a fusion module and a judging module;
wherein the classification module is configured to classify a sample to be classified with the base classifiers and obtain a classification result corresponding to each base classifier, wherein the classification result may be a first classification result of commendatory character and a second classification result of derogatory character;
the fusion module is configured to fuse the first classification results of commendatory character and the second classification results of derogatory character respectively with a fusion rule, to obtain a first fusion result of commendatory character and a second fusion result of derogatory character;
the judging module is configured to judge whether the first fusion result of commendatory character is greater than the second fusion result of derogatory character; if so, the sample to be classified is judged commendatory; if not, the sample to be classified is judged derogatory.
CN2013102308120A 2013-06-09 2013-06-09 Sample training method and device for mobile device Pending CN103324610A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013102308120A CN103324610A (en) 2013-06-09 2013-06-09 Sample training method and device for mobile device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013102308120A CN103324610A (en) 2013-06-09 2013-06-09 Sample training method and device for mobile device

Publications (1)

Publication Number Publication Date
CN103324610A true CN103324610A (en) 2013-09-25

Family

ID=49193360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013102308120A Pending CN103324610A (en) 2013-06-09 2013-06-09 Sample training method and device for mobile device

Country Status (1)

Country Link
CN (1) CN103324610A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6502081B1 (en) * 1999-08-06 2002-12-31 Lexis Nexis System and method for classifying legal concepts using legal topic scheme
CN102682124A (en) * 2012-05-16 2012-09-19 苏州大学 Emotion classifying method and device for text
CN102789498A (en) * 2012-07-16 2012-11-21 钱钢 Method and system for carrying out sentiment classification on Chinese comment text on basis of ensemble learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SHOUSHAN LI et al.: "Sentiment Classification through Combining Classifiers with Multiple Feature Sets", International Conference on Natural Language Processing and Knowledge Engineering *
SHOUSHAN LI et al.: "Sentiment Classification through Combining Classifiers with Multiple Feature Sets", International Conference on Natural Language Processing and Knowledge Engineering, 1 September 2007 (2007-09-01), pages 135-140, XP031153219 *
叶云龙 et al.: "Multiple classifier ensemble based on random subspaces" (基于随机子空间的多分类器集成), Journal of Nanjing Normal University (Engineering and Technology Edition) *
苏艳 et al.: "Research on semi-supervised sentiment classification based on random feature subspaces" (基于随机特征子空间的半监督情感分类方法研究), Journal of Chinese Information Processing *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017124930A1 (en) * 2016-01-18 2017-07-27 阿里巴巴集团控股有限公司 Method and device for feature data processing
US11188731B2 (en) 2016-01-18 2021-11-30 Alibaba Group Holding Limited Feature data processing method and device
CN107992887A (en) * 2017-11-28 2018-05-04 东软集团股份有限公司 Classifier generation method, sorting technique, device, electronic equipment and storage medium
CN108090503A (en) * 2017-11-28 2018-05-29 东软集团股份有限公司 On-line tuning method, apparatus, storage medium and the electronic equipment of multi-categorizer
CN108090503B (en) * 2017-11-28 2021-05-07 东软集团股份有限公司 Online adjustment method and device for multiple classifiers, storage medium and electronic equipment
CN109993312A (en) * 2018-01-02 2019-07-09 中国移动通信有限公司研究院 A kind of equipment and its information processing method, computer storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20130925