CN102402713A - Machine learning method and device - Google Patents

Machine learning method and device Download PDF

Info

Publication number
CN102402713A
CN102402713A (application CN201010280239.0A)
Authority
CN
China
Prior art keywords
classifier
seed
instance set
label
utilize
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010102802390A
Other languages
Chinese (zh)
Other versions
CN102402713B (en)
Inventor
杨宇航
于浩
孟遥
陆应亮
夏迎炬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201010280239.0A priority Critical patent/CN102402713B/en
Publication of CN102402713A publication Critical patent/CN102402713A/en
Application granted granted Critical
Publication of CN102402713B publication Critical patent/CN102402713B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a machine learning method and a corresponding device. The machine learning method comprises the following steps: automatically labeling an unlabeled data set using different methods to obtain n different seed sets S1 to Sn, where n is a natural number and n ≥ 2; training n corresponding classifiers C1 to Cn with the n automatically labeled seed sets S1 to Sn, respectively; for each seed set Si among the n automatically labeled seed sets, i = 1 to n, verifying the seed set Si using some or all of the classifiers except the classifier Ci trained on the seed set Si; and retraining the n corresponding classifiers C1 to Cn with the n verified seed sets S1 to Sn, respectively.

Description

Machine learning method and device
Technical field
The present invention relates to the field of machine learning and, more specifically, to a fault-tolerant machine learning method and device.
Background art
Machine learning studies how computers can simulate or realize human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning methods and devices are widely used in tasks across many fields, for example computer vision, natural language processing, bioinformatics and so on.
Machine learning can be divided into two broad categories: supervised learning and unsupervised learning. Generally speaking, unsupervised learning methods train a classifier with an unlabeled data set. Fig. 1 shows a schematic flowchart of an unsupervised machine learning method in the prior art. In step S110, the unlabeled data set is labeled randomly to obtain a training set. In step S120, a classifier is trained with the training set. In step S130, the trained classifier predicts the instance set to be processed. Unsupervised learning methods require no large manual labeling effort, but because the data set is unlabeled, the results may not be very satisfactory.
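Purely as an illustration of this prior-art flow (not the invention), a minimal sketch using scikit-learn; the random-labeling scheme and all names are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def unsupervised_baseline(unlabeled_X, pending_X, n_classes=2, seed=0):
    """Prior-art flow of Fig. 1: randomly label the data set (S110),
    train a classifier on it (S120), and predict the pending instance
    set (S130)."""
    rng = np.random.default_rng(seed)
    random_y = rng.integers(0, n_classes, size=len(unlabeled_X))  # S110
    clf = LogisticRegression().fit(unlabeled_X, random_y)         # S120
    return clf.predict(pending_X)                                 # S130
```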
Fig. 2 shows a schematic flowchart of a supervised machine learning method in the prior art. In step S210, a classifier is trained with a manually labeled training set. In step S220, the trained classifier predicts the instance set to be processed. Supervised learning methods use large amounts of manually proofread data and can therefore obtain better results. However, such methods are hard to port to resource-constrained fields or applications.
Machine learning methods therefore often face a dilemma: unsupervised methods may perform poorly, while supervised methods require large amounts of manpower and material resources to prepare a corpus.
To overcome this dilemma, semi-supervised learning methods have appeared. Fig. 3 shows a schematic flowchart of a semi-supervised machine learning method in the prior art. Compared with the unsupervised method of Fig. 1, the method of Fig. 3 trains the classifier not only with a training set obtained by randomly labeling the unlabeled data set, but also with a manually labeled training set. Fig. 4 shows a schematic flowchart of another semi-supervised machine learning method in the prior art. In the method of Fig. 4, a seed set is obtained by manual labeling in step S410, and a classifier is trained with this seed set in step S420. In addition, to improve the performance of the classifier, the classifier predicts the instance set to be processed in step S430; the instances with the highest confidence in the prediction results are added to the seed set in step S440; and the classifier is trained once more with the seed set augmented with those instances in step S450. Steps S430 to S450 are repeated until a specified termination condition is satisfied.
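A minimal sketch of this prior-art self-training loop, assuming a scikit-learn-style classifier and numeric feature arrays (all names and parameters are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(seed_X, seed_y, unlabeled_X, rounds=10, top_m=20):
    """Prior-art semi-supervised loop of Fig. 4: train on the seed set,
    predict the unlabeled pool, absorb the most confident predictions,
    and retrain until the loop ends."""
    clf = LogisticRegression()
    pool = list(range(len(unlabeled_X)))
    for _ in range(rounds):
        clf.fit(seed_X, seed_y)                       # S420 / S450
        if not pool:
            break
        probs = clf.predict_proba(unlabeled_X[pool])  # S430
        best = np.argsort(probs.max(axis=1))[-top_m:]
        picked = [pool[i] for i in best]              # S440: highest confidence
        seed_X = np.vstack([seed_X, unlabeled_X[picked]])
        seed_y = np.concatenate(
            [seed_y, clf.classes_[probs[best].argmax(axis=1)]])
        pool = [p for p in pool if p not in picked]
    return clf
```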
Semi-supervised methods can use labeled and unlabeled corpora simultaneously, but they still depend heavily on the scale and quality of the labeled corpus. Finding a balance between the degree of manual participation and performance remains a major challenge in the machine learning field.
Summary of the invention
The following presents a brief summary of the present invention in order to provide a basic understanding of some aspects of the invention. It should be appreciated that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor is it intended to limit the scope of the invention. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description discussed later.
In view of the above situation of the prior art, the present invention aims to provide an efficient, fault-tolerant machine learning method and device.
According to an aspect of the present invention, a machine learning method comprises: automatically labeling an unlabeled data set using different methods to obtain n different seed sets S1, S2, ..., Sn, where n is a natural number and n ≥ 2; training n corresponding classifiers C1, C2, ..., Cn with the n automatically labeled seed sets S1, S2, ..., Sn, respectively; for each seed set Si among the n automatically labeled seed sets, i = 1, 2, ..., n, verifying the seed set Si using some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and retraining the n corresponding classifiers C1, C2, ..., Cn with the n verified seed sets S1, S2, ..., Sn, respectively.
According to a further aspect of the present invention, a machine learning device comprises: an initialization unit configured to: automatically label an unlabeled data set using different methods to obtain n different seed sets S1, S2, ..., Sn, where n is a natural number and n ≥ 2; train n corresponding classifiers C1, C2, ..., Cn with the n automatically labeled seed sets S1, S2, ..., Sn, respectively; and, for each seed set Si among the n automatically labeled seed sets, i = 1, 2, ..., n, verify the seed set Si using some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and an optimization and processing unit configured to: retrain the n corresponding classifiers C1, C2, ..., Cn with the n verified seed sets S1, S2, ..., Sn, respectively.
In the above method and device, the unlabeled data set is labeled automatically by different methods, so no manual participation is needed and learning efficiency is improved. In addition, cross-validating the seed sets with the classifiers and retraining the corresponding classifiers with the cross-validated seed sets effectively controls the noise introduced by automatic labeling and realizes fault-tolerant learning. A minimal sketch of this initialization phase is given below.
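The sketch assumes numeric feature arrays, scikit-learn-style classifiers, unanimous agreement as the verification rule, and that enough seeds of each class survive verification; all names are illustrative, not the patent's notation. This helper is reused in the sketches that follow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def initialize(seed_sets, make_classifier=LogisticRegression):
    """Initialization of the proposed scheme: train one classifier per
    automatically labeled seed set (X, y), cross-verify each seed set
    with the other classifiers, delete seeds whose automatic label
    disagrees, and retrain on the verified seed sets."""
    clfs = [make_classifier().fit(X, y) for X, y in seed_sets]
    verified = []
    for i, (X, y) in enumerate(seed_sets):
        others = [c for j, c in enumerate(clfs) if j != i]
        keep = np.all([c.predict(X) == y for c in others], axis=0)
        verified.append((X[keep], y[keep]))
    clfs = [make_classifier().fit(X, y) for X, y in verified]
    return clfs, verified
```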
These and other advantages of the present invention will become more apparent from the following detailed description of the preferred embodiments of the present invention in conjunction with the accompanying drawings.
Brief description of the drawings
With reference to the following description in conjunction with the accompanying drawings, the above and other objects, features and advantages of embodiments of the present invention can be understood more easily. The components in the drawings are merely intended to illustrate the principles of the present invention. In the drawings, identical or similar technical features or components are denoted by identical or similar reference numerals.
Fig. 1 shows a schematic flowchart of an unsupervised machine learning method in the prior art.
Fig. 2 shows a schematic flowchart of a supervised machine learning method in the prior art.
Fig. 3 shows a schematic flowchart of a semi-supervised machine learning method in the prior art.
Fig. 4 shows a schematic flowchart of another semi-supervised machine learning method in the prior art.
Fig. 5 shows a schematic flowchart of a machine learning method according to an embodiment of the present invention.
Fig. 6 shows a schematic flowchart of a machine learning method using two classifiers according to an embodiment of the present invention.
Fig. 7 shows a schematic flowchart of a machine learning method using three classifiers according to an embodiment of the present invention.
Fig. 8 shows a schematic block diagram of a machine learning device according to an embodiment of the present invention.
Fig. 9 shows a schematic block diagram of a computer that can be used to implement the method and device according to embodiments of the present invention.
Embodiments
Embodiments of the present invention are described below with reference to the accompanying drawings. Elements and features described in one drawing or embodiment of the present invention can be combined with elements and features shown in one or more other drawings or embodiments. It should be noted that, for the sake of clarity, the drawings and description omit representations and descriptions of components and processing that are irrelevant to the present invention or well known to those of ordinary skill in the art.
In view of the prior-art challenge of balancing the degree of manual participation against performance, the present inventors have proposed a fault-tolerant learning method to overcome this problem.
The concept of fault tolerance was first proposed in computer architecture. It refers to a technique whereby, when data or files in a system are corrupted or lost for various reasons, the system automatically restores the corrupted or lost files and data to their state before the accident, so that the system can continue to run normally.
The fault-tolerant learning method and device according to embodiments of the present invention learn from an automatically labeled corpus rather than from a manually labeled corpus or prior knowledge. They constitute a fully automatic machine learning method and are therefore easily applied to any specific field or task. In addition, the method and device achieve fault tolerance by training different classifiers used respectively for verification and for further prediction, thereby guaranteeing improved performance.
Machine learning methods and devices according to embodiments of the present invention are described below in conjunction with Figs. 5-8.
Fig. 5 shows a schematic flowchart of a machine learning method according to an embodiment of the present invention. As shown in the figure, in step S510, an unlabeled data set is labeled automatically using different methods to obtain a plurality of different seed sets. Various automatic methods can be used here to label the data set, and those skilled in the art can select suitable automatic methods based on the application scenario. For example, in a terminology extraction scenario, the data set can be labeled with the TF-IDF-based terminology extraction method proposed by G. Salton and M. J. McGill in Introduction to Modern Information Retrieval, McGraw-Hill, 1983, or with the indicator-word-based terminology extraction method proposed by Yuhang Yang, Qin Lu and Tiejun Zhao in "Chinese Term Extraction Using Minimal Resources", Proceedings of the 22nd International Conference on Computational Linguistics, pages 1033-1040, 2008. The resulting seed set contains the terms and non-terms judged by the automatic method.
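As an illustration only — not the exact procedure of either cited paper — a TF-IDF-style automatic seed labeling might score candidate words and take the top and bottom of the ranking as positive and negative seeds; the tokenization, cut-offs and function names below are assumptions:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_seed_set(documents, top_k=50, bottom_k=50):
    """Illustrative automatic seed labeling: rank candidate words by their
    maximum TF-IDF weight; treat the highest-ranked as terms (label 1)
    and the lowest-ranked as non-terms (label 0)."""
    vec = TfidfVectorizer()
    weights = vec.fit_transform(documents)
    score = np.asarray(weights.max(axis=0).todense()).ravel()
    order = np.argsort(score)
    vocab = np.array(vec.get_feature_names_out())
    terms = vocab[order[-top_k:]]        # most term-like candidates
    non_terms = vocab[order[:bottom_k]]  # least term-like candidates
    return [(w, 1) for w in terms] + [(w, 0) for w in non_terms]
```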
Then, in step S520, a plurality of different classifiers are trained respectively with the automatically labeled seed sets, one classifier per seed set.
Then, in step S530, the different seed sets are cross-validated with the plurality of trained classifiers to obtain verified seed sets. That is, for a given seed set, some or all of the classifiers trained on the other seed sets are used to verify this seed set.
In step S540, the corresponding classifiers are retrained with the plurality of seed sets. That is, each classifier originally trained on a seed set is retrained with the updated version of that seed set.
Next, the instance set to be processed can be handled with the retrained classifiers. This can be done by reference to prior-art methods and is not shown here.
Preferably, in order to further improve performance, cross-validation can also be introduced into the processing of the instance set. Specifically, in step S550, the instance set to be processed is predicted with the retrained classifiers. In step S560, the predicted instance sets are cross-validated with the classifiers: similarly to step S530, for a predicted instance set, some or all of the classifiers other than the classifier used to predict this instance set can be used to verify it. Then, in step S570, the instances in the verified instance sets are added to the corresponding seed sets so that the corresponding classifiers can be retrained with the updated seed sets. That is, the instances in a verified instance set are added to the seed set used to train the classifier that predicted this instance set. Here, as an example, a number of the instances with the highest confidence in the verified instance set can be added to the seed set. Steps S540 to S570 are repeated until a repetition end condition (hereinafter also written as the iteration stop criterion) is satisfied. The end condition can be set as needed; as an example, the iteration can be terminated when the total number of seeds in all seed sets reaches the predetermined number of instances that need labeling.
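A minimal sketch of steps S540 to S570, assuming scikit-learn-style classifiers, unanimous agreement as the verification rule, and the seed-count stop criterion; m, N and max_rounds are illustrative parameters, and for brevity the sketch leaves absorbed instances in the unlabeled pool:

```python
import numpy as np

def iterate(clfs, seed_sets, unlabeled_X, make_classifier,
            m=20, N=1000, max_rounds=50):
    """Steps S540-S570: retrain each classifier on its seed set, predict
    the pending instances, cross-verify each predicted set with the other
    classifiers, absorb the verified instances into the corresponding
    seed set, and stop once the seeds total N instances."""
    for _ in range(max_rounds):
        if sum(len(y) for _, y in seed_sets) >= N:        # stop criterion
            break
        clfs = [make_classifier().fit(X, y) for X, y in seed_sets]  # S540
        for i, clf in enumerate(clfs):
            probs = clf.predict_proba(unlabeled_X)                  # S550
            best = np.argsort(probs.max(axis=1))[-m:]     # top-m confidence
            X_i = unlabeled_X[best]
            y_i = clf.classes_[probs[best].argmax(axis=1)]
            others = [c for j, c in enumerate(clfs) if j != i]      # S560
            keep = np.all([c.predict(X_i) == y_i for c in others], axis=0)
            X_s, y_s = seed_sets[i]                                 # S570
            seed_sets[i] = (np.vstack([X_s, X_i[keep]]),
                            np.concatenate([y_s, y_i[keep]]))
    return clfs, seed_sets
```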
In the above method, learning uses an automatically labeled corpus, not a manually labeled one. Automatically labeled seeds have a higher accuracy than randomly labeled ones, which makes obtaining seed sets with automatic methods more meaningful. In addition, training different classifiers from a plurality of relatively independent views (such as different seed sets, different feature sets, etc.) can make the verification process more effective.
In addition, because the above method uses automatically labeled corpora, noise may be present from the beginning and may increase after each iteration. In order to control the noise effectively and make the results more reliable, a plurality of classifiers are trained to verify the seed sets, and the classifiers are retrained with the verified seed sets, which alleviates noise and improves performance. Predicting and verifying the instance sets with the plurality of classifiers, and retraining the classifiers with the seed sets augmented with verified instances, further alleviates noise and further improves performance.
Fig. 6 shows a schematic flowchart of a machine learning method using two classifiers according to an embodiment of the present invention. In Fig. 6, the unlabeled data set D and the instance set U to be labeled are given, and the number of instances that need labeling is N.
First, one method is used to automatically generate a seed set S1, and another method is used to automatically generate a seed set S2.
Then, a first classifier C1 is trained with the seed set S1, and a second classifier C2 is trained with the seed set S2.
Then, the automatically labeled seed sets S1 and S2 are cross-validated with the classifiers C1 and C2. Specifically, the classifier C1 labels the seed set S2, and the classifier C2 labels the seed set S1. Seeds whose automatic labeling results are inconsistent with the classifiers' labeling results are deleted from the seed sets S1 and S2, respectively, yielding the verified seed sets S1 and S2.
As shown in block 610 of Fig. 6, the above steps can be collectively referred to as the initialization process.
In order to further improve performance, cross-validation can also be carried out in the processing of the instance set, as follows.
First, the classifier C1 is retrained with the seed set S1, and the classifier C2 is retrained with the seed set S2.
Then, the classifier C1 predicts the instances in the set U. Specifically, the classifier C1 labels the instances in the set U, and the m instances with the highest confidence in the labeling results are chosen to form the labeled instance set L1, i.e., the predicted instance set L1.
Likewise, the classifier C2 predicts the instances in the set U. Specifically, the classifier C2 labels the instances in the set U, and the m instances with the highest confidence in the labeling results are chosen to form the labeled instance set L2, i.e., the predicted instance set L2.
Then, the predicted instance sets L1 and L2 are cross-validated with the classifiers C1 and C2. Specifically, C2 relabels the instances in the instance set L1, and the instances in L1 whose C2 labeling results are inconsistent with the C1 prediction results are deleted, yielding the verified instance set L1. C1 relabels the instances in the instance set L2, and the instances in L2 whose C1 labeling results are inconsistent with the C2 prediction results are deleted, yielding the verified instance set L2.
Then, the instances in the set L1 are added to the seed set S1, and the instances in the set L2 are added to the seed set S2, completing one iteration.
Whether the iteration should stop can be judged at the beginning or end of an iteration. As the iteration stop criterion, for example, the iteration can be terminated when |S1 ∪ S2| ≥ N; otherwise the iteration continues.
As shown in block 620 of Fig. 6, the above steps can be collectively referred to as the iterative process. A sketch tying the two phases together is given below.
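The following hedged usage sketch for the two-classifier case of Fig. 6 reuses the initialize and iterate helpers sketched above; the synthetic data and the two projection-based labelers merely stand in for the data set D, the instance set U and the two automatic seed-generation methods:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
D = rng.normal(size=(200, 5))                        # stands in for data set D
U = rng.normal(size=(500, 5))                        # stands in for instance set U

def make_seed_set(X, w):                             # stand-in automatic labeler
    return X, (X @ w > 0).astype(int)

seed_sets = [make_seed_set(D, rng.normal(size=5)),   # block 610: S1 and S2
             make_seed_set(D, rng.normal(size=5))]
clfs, seed_sets = initialize(seed_sets, LogisticRegression)
clfs, seed_sets = iterate(clfs, seed_sets, U,        # block 620: iterate until
                          LogisticRegression,        # |S1 ∪ S2| >= N
                          m=20, N=700)
```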
Fig. 7 shows a schematic flowchart of a machine learning method using three classifiers according to an embodiment of the present invention. Compared with Fig. 6, three classifiers are used in the method of Fig. 7, but the individual steps are basically identical to those of Fig. 6 and are not repeated here.
It is worth noting that Fig. 7 shows the classifiers C2 and C3 verifying the automatically labeled seed set S1, the classifiers C1 and C3 verifying the automatically labeled seed set S2, and the classifiers C1 and C2 verifying the automatically labeled seed set S3. Seeds whose verification results are inconsistent with the automatic labeling results are deleted from the seed sets S1, S2 and S3, respectively, to obtain the verified seed sets S1, S2 and S3. However, a seed set can also be verified with only some of the other classifiers. For example, only the classifier C2 may be used to verify the seed set S1, only the classifier C3 may be used to verify the seed set S2, and so on. These options are not enumerated here.
Likewise, although Fig. 7 shows the classifiers C2 and C3 verifying the predicted instance set L1, the classifiers C1 and C3 verifying the predicted instance set L2, and the classifiers C1 and C2 verifying the predicted instance set L3, an instance set can also be verified with only some of the other classifiers. For example, only the classifier C2 may be used to verify the instance set L1, only the classifier C2 may be used to verify the instance set L3, and so on. These options are not enumerated here.
The above shows machine learning method examples using two classifiers and three classifiers, but this is for illustration purposes only and is not intended to limit the present invention thereto. Those skilled in the art will understand that the machine learning method according to embodiments of the present invention can be used with any other number of classifiers; this is not repeated here.
Fig. 8 shows a schematic block diagram of a machine learning device according to an embodiment of the present invention. As shown in the figure, the machine learning device 800 comprises an initialization unit 810 and an optimization and processing unit 820. According to one embodiment of the present invention, the initialization unit 810 is configured to: automatically label an unlabeled data set using different methods to obtain a plurality of different seed sets; train a corresponding plurality of classifiers with the plurality of automatically labeled seed sets, respectively; and, for each seed set among the plurality of automatically labeled seed sets, verify this seed set using some or all of the plurality of classifiers other than the classifier trained on this seed set. The optimization and processing unit 820 is configured to retrain the corresponding plurality of classifiers with the plurality of verified seed sets, respectively.
According to another embodiment of the present invention, the optimization and processing unit 820 is also configured to: predict an instance set with the plurality of retrained classifiers, respectively, to obtain a corresponding plurality of predicted instance sets; for each predicted instance set, verify this instance set using some or all of the plurality of classifiers other than the classifier used to predict this instance set; add the instances in each verified instance set to the corresponding seed set; and repeat the retraining, the predicting of the instance set, the verifying of each instance set and the adding of the instances in each verified instance set to the corresponding seed set, until a repetition end condition is met.
According to another embodiment of the present invention, the repetition end condition is that the total number of seeds in all the seed sets reaches the predetermined number of instances that need labeling.
According to another embodiment of the present invention, the optimization and processing unit 820 is further configured to: label the instance set with the plurality of classifiers, respectively; and choose the predetermined number of instances with the highest confidence in the labeling results of each of the plurality of classifiers, respectively, to form the corresponding plurality of predicted instance sets.
According to another embodiment of the present invention, the optimization and processing unit 820 is further configured to verify a predicted instance set as follows: label this instance set with some or all of the plurality of classifiers other than the classifier used to predict this instance set; and delete from this instance set the instances whose prediction results are inconsistent with the labeling results of the some or all classifiers.
According to another embodiment of the present invention, the initialization unit 810 is further configured to verify an automatically labeled seed set as follows: label this seed set with some or all of the plurality of classifiers other than the classifier trained on this seed set; and delete from this seed set the seeds whose automatic labeling results are inconsistent with the labeling results of the some or all classifiers.
Further details of the operation of the machine learning device according to embodiments of the present invention can be found with reference to the above-described embodiments of the method, and are not described in detail here.
In the above method and device, the unlabeled data set is labeled by automatic methods, so no manual participation is needed and learning efficiency is improved. In addition, cross-validating the seed sets with the classifiers and retraining the corresponding classifiers with the cross-validated seed sets effectively controls the noise introduced by automatic labeling and realizes fault-tolerant learning.
The method and device according to embodiments of the present invention place no restriction on the practical application scenario, nor any restriction on the type of classifier used, the classifier training method, and the like.
In addition, each component module or unit in the above device can be configured by means of software, firmware, hardware or a combination thereof. The specific means or manner of configuration is well known to those skilled in the art and is not repeated here. In the case of implementation by software or firmware, a program constituting the software is installed from a storage medium or a network into a computer having a dedicated hardware structure, and the computer can perform various functions when the various programs are installed.
Fig. 9 shows a schematic block diagram of a computer that can be used to implement the method and device according to embodiments of the present invention. In Fig. 9, a central processing unit (CPU) 901 performs various processing according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage section 908 into a random-access memory (RAM) 903. Data required when the CPU 901 performs various processing and the like are also stored in the RAM 903 as needed. The CPU 901, the ROM 902 and the RAM 903 are connected to one another via a bus 904. An input/output interface 905 is also connected to the bus 904.
The following components are connected to the input/output interface 905: an input section 906 (including a keyboard, a mouse and the like), an output section 907 (including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker and the like), the storage section 908 (including a hard disk and the like), and a communication section 909 (including a network interface card such as a LAN card, a modem and the like). The communication section 909 performs communication processing via a network such as the Internet. A drive 910 can also be connected to the input/output interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory can be mounted on the drive 910 as needed, so that a computer program read therefrom is installed into the storage section 908 as needed.
In the case where the above series of processing is realized by software, a program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 911.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 911 shown in Fig. 9, in which the program is stored and which is distributed separately from the device to provide the program to the user. Examples of the removable medium 911 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)) and a semiconductor memory. Alternatively, the storage medium can be the ROM 902, a hard disk included in the storage section 908 or the like, in which the program is stored and which is distributed to the user together with the device containing it.
The present invention also proposes a program product storing machine-readable instruction codes. When the instruction codes are read and executed by a machine, the above method according to embodiments of the present invention can be performed.
Correspondingly, a storage medium carrying the program product that stores the above machine-readable instruction codes is also included in the disclosure of the present invention. The storage medium includes but is not limited to a floppy disk, an optical disk, a magneto-optical disk, a memory card, a memory stick and the like.
In the above description of specific embodiments of the present invention, features described and/or illustrated for one embodiment can be used in one or more other embodiments in the same or a similar way, combined with features in other embodiments, or substituted for features in other embodiments.
It should be emphasized that the term "comprise/include", when used herein, refers to the presence of a feature, element, step or component, but does not exclude the presence or addition of one or more other features, elements, steps or components.
In addition, the methods of the present invention are not limited to being performed in the time order described in the specification; they can also be performed in another time order, in parallel, or independently. Therefore, the order of execution of the methods described in this specification does not limit the technical scope of the present invention.
Although the present invention has been disclosed above through the description of specific embodiments, it should be appreciated that all the above embodiments and examples are exemplary and not restrictive. Those skilled in the art can design various modifications, improvements or equivalents of the present invention within the spirit and scope of the appended claims. These modifications, improvements or equivalents should also be considered to be included in the protection scope of the present invention.
Supplementary notes
Note 1. A machine learning method, comprising:
automatically labeling an unlabeled data set using different methods to obtain n different seed sets S1, S2, ..., Sn, where n is a natural number and n ≥ 2;
training n corresponding classifiers C1, C2, ..., Cn with the n automatically labeled seed sets S1, S2, ..., Sn, respectively;
for each seed set Si among the n automatically labeled seed sets, i = 1, 2, ..., n, verifying the seed set Si using some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
retraining the n corresponding classifiers C1, C2, ..., Cn with the n verified seed sets S1, S2, ..., Sn, respectively.
Note 2. The method according to note 1, further comprising:
predicting an instance set with the n retrained classifiers, respectively, to obtain n corresponding predicted instance sets L1, L2, ..., Ln;
for each predicted instance set Li, i = 1, 2, ..., n, verifying the instance set Li using some or all of the n classifiers other than the classifier Ci used to predict the instance set Li;
adding the instances in each verified instance set Li to the corresponding seed set Si; and
repeating the retraining, the predicting of the instance set, the verifying of each instance set and the adding of the instances in each verified instance set to the corresponding seed set, until a repetition end condition is met.
Note 3. The method according to note 2, wherein the repetition end condition is:
the total number of seeds in the seed sets S1, S2, ..., Sn reaches the predetermined number of instances that need labeling.
Note 4. The method according to note 2, wherein predicting the instance set comprises:
labeling the instance set with the n classifiers, respectively; and
choosing the predetermined number of instances with the highest confidence in the labeling results of each of the n classifiers, respectively, to form the n corresponding predicted instance sets L1, L2, ..., Ln.
Note 5. The method according to note 2, wherein verifying the predicted instance set Li comprises:
labeling the instance set Li with some or all of the n classifiers other than the classifier Ci used to predict the instance set Li; and
deleting from the instance set Li the instances whose prediction results are inconsistent with the labeling results of the some or all classifiers.
Note 6. The method according to note 1, wherein verifying the automatically labeled seed set Si comprises:
labeling the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
deleting from the seed set Si the seeds whose automatic labeling results are inconsistent with the labeling results of the some or all classifiers.
Note 7. A machine learning device, comprising:
an initialization unit configured to:
automatically label an unlabeled data set using different methods to obtain n different seed sets S1, S2, ..., Sn, where n is a natural number and n ≥ 2;
train n corresponding classifiers C1, C2, ..., Cn with the n automatically labeled seed sets S1, S2, ..., Sn, respectively; and
for each seed set Si among the n automatically labeled seed sets, i = 1, 2, ..., n, verify the seed set Si using some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
an optimization and processing unit configured to:
retrain the n corresponding classifiers C1, C2, ..., Cn with the n verified seed sets S1, S2, ..., Sn, respectively.
Note 8. The device according to note 7, wherein the optimization and processing unit is also configured to:
predict an instance set with the n retrained classifiers, respectively, to obtain n corresponding predicted instance sets L1, L2, ..., Ln;
for each predicted instance set Li, i = 1, 2, ..., n, verify the instance set Li using some or all of the n classifiers other than the classifier Ci used to predict the instance set Li;
add the instances in each verified instance set Li to the corresponding seed set Si; and
repeat the retraining, the predicting of the instance set, the verifying of each instance set and the adding of the instances in each verified instance set to the corresponding seed set, until a repetition end condition is met.
Note 9. The device according to note 8, wherein the repetition end condition is:
the total number of seeds in the seed sets S1, S2, ..., Sn reaches the predetermined number of instances that need labeling.
Note 10. The device according to note 8, wherein the optimization and processing unit is further configured to:
label the instance set with the n classifiers, respectively; and
choose the predetermined number of instances with the highest confidence in the labeling results of each of the n classifiers, respectively, to form the n corresponding predicted instance sets L1, L2, ..., Ln.
Note 11. The device according to note 8, wherein the optimization and processing unit is further configured to verify the predicted instance set Li by:
labeling the instance set Li with some or all of the n classifiers other than the classifier Ci used to predict the instance set Li; and
deleting from the instance set Li the instances whose prediction results are inconsistent with the labeling results of the some or all classifiers.
Note 12. The device according to note 7, wherein the initialization unit is further configured to verify the automatically labeled seed set Si by:
labeling the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
deleting from the seed set Si the seeds whose automatic labeling results are inconsistent with the labeling results of the some or all classifiers.

Claims (10)

1. A machine learning method, comprising:
automatically labeling an unlabeled data set using different methods to obtain n different seed sets S1, S2, ..., Sn, where n is a natural number and n ≥ 2;
training n corresponding classifiers C1, C2, ..., Cn with the n automatically labeled seed sets S1, S2, ..., Sn, respectively;
for each seed set Si among the n automatically labeled seed sets, i = 1, 2, ..., n, verifying the seed set Si using some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
retraining the n corresponding classifiers C1, C2, ..., Cn with the n verified seed sets S1, S2, ..., Sn, respectively.
2. The method according to claim 1, further comprising:
predicting an instance set with the n retrained classifiers, respectively, to obtain n corresponding predicted instance sets L1, L2, ..., Ln;
for each predicted instance set Li, i = 1, 2, ..., n, verifying the instance set Li using some or all of the n classifiers other than the classifier Ci used to predict the instance set Li;
adding the instances in each verified instance set Li to the corresponding seed set Si; and
repeating the retraining, the predicting of the instance set, the verifying of each instance set and the adding of the instances in each verified instance set to the corresponding seed set, until a repetition end condition is met.
3. The method according to claim 2, wherein the repetition end condition is:
the total number of seeds in the seed sets S1, S2, ..., Sn reaches the predetermined number of instances that need labeling.
4. The method according to claim 2, wherein predicting the instance set comprises:
labeling the instance set with the n classifiers, respectively; and
choosing the predetermined number of instances with the highest confidence in the labeling results of each of the n classifiers, respectively, to form the n corresponding predicted instance sets L1, L2, ..., Ln.
5. The method according to claim 2, wherein verifying the predicted instance set Li comprises:
labeling the instance set Li with some or all of the n classifiers other than the classifier Ci used to predict the instance set Li; and
deleting from the instance set Li the instances whose prediction results are inconsistent with the labeling results of the some or all classifiers.
6. The method according to claim 1, wherein verifying the automatically labeled seed set Si comprises:
labeling the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
deleting from the seed set Si the seeds whose automatic labeling results are inconsistent with the labeling results of the some or all classifiers.
7. A machine learning device, comprising:
an initialization unit configured to:
automatically label an unlabeled data set using different methods to obtain n different seed sets S1, S2, ..., Sn, where n is a natural number and n ≥ 2;
train n corresponding classifiers C1, C2, ..., Cn with the n automatically labeled seed sets S1, S2, ..., Sn, respectively; and
for each seed set Si among the n automatically labeled seed sets, i = 1, 2, ..., n, verify the seed set Si using some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
an optimization and processing unit configured to:
retrain the n corresponding classifiers C1, C2, ..., Cn with the n verified seed sets S1, S2, ..., Sn, respectively.
8. The device according to claim 7, wherein the optimization and processing unit is also configured to:
predict an instance set with the n retrained classifiers, respectively, to obtain n corresponding predicted instance sets L1, L2, ..., Ln;
for each predicted instance set Li, i = 1, 2, ..., n, verify the instance set Li using some or all of the n classifiers other than the classifier Ci used to predict the instance set Li;
add the instances in each verified instance set Li to the corresponding seed set Si; and
repeat the retraining, the predicting of the instance set, the verifying of each instance set and the adding of the instances in each verified instance set to the corresponding seed set, until a repetition end condition is met.
9. The device according to claim 8, wherein the optimization and processing unit is further configured to verify the predicted instance set Li by:
labeling the instance set Li with some or all of the n classifiers other than the classifier Ci used to predict the instance set Li; and
deleting from the instance set Li the instances whose prediction results are inconsistent with the labeling results of the some or all classifiers.
10. The device according to claim 7, wherein the initialization unit is further configured to verify the automatically labeled seed set Si by:
labeling the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
deleting from the seed set Si the seeds whose automatic labeling results are inconsistent with the labeling results of the some or all classifiers.
CN201010280239.0A 2010-09-09 2010-09-09 machine learning method and device Expired - Fee Related CN102402713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010280239.0A CN102402713B (en) 2010-09-09 2010-09-09 machine learning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010280239.0A CN102402713B (en) 2010-09-09 2010-09-09 machine learning method and device

Publications (2)

Publication Number Publication Date
CN102402713A true CN102402713A (en) 2012-04-04
CN102402713B CN102402713B (en) 2015-11-25

Family

ID=45884896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010280239.0A Expired - Fee Related CN102402713B (en) 2010-09-09 2010-09-09 machine learning method and device

Country Status (1)

Country Link
CN (1) CN102402713B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202177A (en) * 2016-06-27 2016-12-07 腾讯科技(深圳)有限公司 A kind of file classification method and device
CN108509969A (en) * 2017-09-06 2018-09-07 腾讯科技(深圳)有限公司 Data mask method and terminal
CN110147551A (en) * 2019-05-14 2019-08-20 腾讯科技(深圳)有限公司 Multi-class entity recognition model training, entity recognition method, server and terminal
CN112000808A (en) * 2020-09-29 2020-11-27 迪爱斯信息技术股份有限公司 Data processing method and device and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5555345A (en) * 1991-03-25 1996-09-10 Atr Interpreting Telephony Research Laboratories Learning method of neural network
US20030233369A1 (en) * 2002-06-17 2003-12-18 Fujitsu Limited Data classifying device, and active learning method used by data classifying device and active learning program of data classifying device
CN1851703A (en) * 2006-05-10 2006-10-25 南京大学 Active semi-monitoring-related feedback method for digital image search
US20080281764A1 (en) * 2004-09-29 2008-11-13 Panscient Pty Ltd. Machine Learning System
CN101520847A (en) * 2008-02-29 2009-09-02 富士通株式会社 Pattern identification device and method
US20090228411A1 (en) * 2008-03-06 2009-09-10 Kddi Corporation Reducing method for support vector

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5555345A (en) * 1991-03-25 1996-09-10 Atr Interpreting Telephony Research Laboratories Learning method of neural network
US20030233369A1 (en) * 2002-06-17 2003-12-18 Fujitsu Limited Data classifying device, and active learning method used by data classifying device and active learning program of data classifying device
US20080281764A1 (en) * 2004-09-29 2008-11-13 Panscient Pty Ltd. Machine Learning System
CN1851703A (en) * 2006-05-10 2006-10-25 南京大学 Active semi-monitoring-related feedback method for digital image search
CN101520847A (en) * 2008-02-29 2009-09-02 富士通株式会社 Pattern identification device and method
US20090228411A1 (en) * 2008-03-06 2009-09-10 Kddi Corporation Reducing method for support vector

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LU ZHIMAO ET AL.: "Automatic semantic annotation method for full text based on unsupervised machine learning", ACTA AUTOMATICA SINICA *
LI QINGZHONG ET AL.: "Research on machine learning methods based on small-scale annotated corpora", JOURNAL OF COMPUTER APPLICATIONS *
WANG HAOCHANG ET AL.: "Classifier fusion method based on meta-learning strategy and its application", JOURNAL ON COMMUNICATIONS *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202177A (en) * 2016-06-27 2016-12-07 腾讯科技(深圳)有限公司 A kind of file classification method and device
CN108509969A (en) * 2017-09-06 2018-09-07 腾讯科技(深圳)有限公司 Data mask method and terminal
CN108509969B (en) * 2017-09-06 2021-11-09 腾讯科技(深圳)有限公司 Data labeling method and terminal
CN110147551A (en) * 2019-05-14 2019-08-20 腾讯科技(深圳)有限公司 Multi-class entity recognition model training, entity recognition method, server and terminal
CN112000808A (en) * 2020-09-29 2020-11-27 迪爱斯信息技术股份有限公司 Data processing method and device and readable storage medium
CN112000808B (en) * 2020-09-29 2024-04-16 迪爱斯信息技术股份有限公司 Data processing method and device and readable storage medium

Also Published As

Publication number Publication date
CN102402713B (en) 2015-11-25

Similar Documents

Publication Publication Date Title
US8280830B2 (en) Systems and methods for using multiple in-line heuristics to reduce false positives
CN110413786B (en) Data processing method based on webpage text classification, intelligent terminal and storage medium
CN103577989B (en) A kind of information classification approach and information classifying system based on product identification
CN107491432A (en) Low quality article recognition methods and device, equipment and medium based on artificial intelligence
CN107038157A (en) Identification error detection method, device and storage medium based on artificial intelligence
CN103299304A (en) Classification rule generation device, classification rule generation method, classification rule generation program and recording medium
CN102073704B (en) Text classification processing method, system and equipment
CN108536572B (en) Smart phone App use prediction method based on ApUage 2Vec model
CN110942763A (en) Voice recognition method and device
CN104778560A (en) Learning progress management and control method and device
CN111159414A (en) Text classification method and system, electronic equipment and computer readable storage medium
CN102402713A (en) Robot learning method and device
CN111062036A (en) Malicious software identification model construction method, malicious software identification medium and malicious software identification equipment
CN102291369A (en) Control method and corresponding control device for verifying junk information settings
CN103365849A (en) Keyword search method and equipment
CN111061837A (en) Topic identification method, device, equipment and medium
US8341538B1 (en) Systems and methods for reducing redundancies in quality-assurance reviews of graphical user interfaces
CN103309892A (en) Method and equipment for information processing and Web browsing history navigation and electronic device
US11151021B2 (en) Selecting test-templates using template-aware coverage data
CN112035345A (en) Mixed depth defect prediction method based on code segment analysis
CN107169011A (en) The original recognition methods of webpage based on artificial intelligence, device and storage medium
CN114253866A (en) Malicious code detection method and device, computer equipment and readable storage medium
CN106067889B (en) Electronic device and its method for uploading
CN106484913A (en) Method and server that a kind of Target Photo determines
CN104580109A (en) Method and device for generating click verification code

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20151125

Termination date: 20180909

CF01 Termination of patent right due to non-payment of annual fee