CN110232411A - Model distillation implementation method, device, system, computer equipment and storage medium - Google Patents

Model distillation implementation method, device, system, computer equipment and storage medium

Info

Publication number
CN110232411A
CN110232411A (application number CN201910463011.6A)
Authority
CN
China
Prior art keywords
model
image
teacher
prediction label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910463011.6A
Other languages
Chinese (zh)
Other versions
CN110232411B (en)
Inventor
李超
刘国翌
张家栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910463011.6A
Publication of CN110232411A
Application granted
Publication of CN110232411B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a model distillation implementation method, apparatus, system, computer device and storage medium. The method may include: for each image used for model training, a teacher-model client performs the following processing: obtaining a prediction label of the image, the prediction label being generated after a teacher-model server performs forward prediction on the image; and storing the image and the prediction label in an image data queue, so that a model trainer performs student-model training based on the content of the image data queue. With the solution of the present invention, model training can be accelerated.

Description

Model distillation implementation method, device, system, computer equipment and storage medium
[technical field]
The present invention relates to computer application technology, and in particular to a model distillation implementation method, apparatus, system, computer device and storage medium.
[background technique]
In recent years, convolutional neural network technology has developed broadly in the field of computer vision. More and more convolutional neural network structures exist, each with a different range of application. For example, a classification model based on the 152-layer residual neural network (ResNet, Residual Neural Network) structure offers high precision and good effect, but it has many parameters, a large model size and a heavy computation load, and therefore cannot be widely applied on end devices, which are limited in storage space and computing capability.
In view of this, some convolutional neural network structures designed for mobile phones have emerged, such as the MobileNet model and the ShuffleNet model. These convolutional neural network structures have few parameters and a small computation load and are well suited to running on end devices, but their precision and effect still lag considerably behind models such as ResNet152. How to improve the precision of small on-device models has therefore become an urgent problem to be solved.
To improve the precision of small on-device models, model distillation is commonly used at present, that is, a high-precision large model is used to guide the training of a small model. The large model may be called a teacher model, for example a ResNet152 model, and the small model may be called a student model, for example a MobileNet model. One teacher model may be used to train multiple student models.
The entire training process usually includes multiple rounds, and each round uses every image in the image set for model training once. Limited by the training hardware resources, each round is further divided into multiple iterations, and each iteration obtains one group of images from the image set for training. Each iteration may include: the teacher model and the student model pull the same group of images; the teacher model and the student model each perform forward prediction (forward computation) on the pulled images to obtain the teacher model output and the student model output; the error between the teacher model output and the student model output is calculated through a loss function; back propagation is performed on the student model according to the error, and the student model is updated. The student model obtained after training is completed is the required model.
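By way of illustration only, the following is a minimal sketch of one such conventional coupled iteration, assuming PyTorch; the KL-divergence soft-label loss and the temperature T are common choices assumed here, since the text only refers to "a loss function", and the teacher and student modules are hypothetical.

```python
import torch
import torch.nn.functional as F

def coupled_distillation_step(teacher, student, images, optimizer, T=4.0):
    """One conventional iteration: the teacher and the student share the same
    batch and the same training server (the approach the patent improves on)."""
    teacher.eval()
    with torch.no_grad():                      # teacher forward prediction
        teacher_logits = teacher(images)
    student_logits = student(images)           # student forward prediction
    # error between teacher output and student output (soft-label KL loss)
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()                            # back propagation on the student
    optimizer.step()                           # update the student model
    return loss.item()
```

Note that in this coupled form the fixed teacher is re-run on every batch of every round, which is the repeated computation the patent later avoids.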
However, the above approach still has certain problems in practical applications. For example, the teacher model and the student model share the same training server; the teacher model has many parameters and a heavy computation load, and will seize the limited training resources, such as graphics processing unit (GPU, Graphics Processing Unit) resources, while the student model is the model that actually needs to be trained. This inevitably affects the training speed of the student model.
[summary of the invention]
In view of this, the present invention provides a model distillation implementation method, apparatus, system, computer device and storage medium.
The specific technical solutions are as follows:
A model distillation implementation method, comprising:
a teacher-model client performing, for each image used for model training, the following processing:
obtaining a prediction label of the image, the prediction label being generated after a teacher-model server performs forward prediction on the image;
storing the image and the prediction label in an image data queue, so that a model trainer performs student-model training based on the content of the image data queue.
A model distillation implementation method, comprising:
a teacher-model server performing forward prediction on an image for model training requested by a teacher-model client, to generate a prediction label;
the teacher-model server returning the prediction label to the teacher-model client, so that the teacher-model client stores the image and the prediction label in an image data queue, for a model trainer to perform student-model training based on the content of the image data queue.
A model distillation implementation method, comprising:
a model trainer obtaining a student model output; the student model output being an output result obtained after a student model pulls an image and a corresponding prediction label from an image data queue and performs forward prediction on the pulled image; the image data queue holding images for model training and corresponding prediction labels deposited by a teacher-model client; the prediction label being generated after a teacher-model server performs forward prediction on the image and sent to the teacher-model client;
the model trainer calculating an error between the pulled prediction label and the student model output, performing back propagation on the student model according to the error, and updating the student model.
A model distillation implementation apparatus, applied in a teacher-model client and comprising: a first acquiring unit and a storing unit;
the first acquiring unit being configured to, for each image used for model training, obtain a prediction label of the image, the prediction label being generated after a teacher-model server performs forward prediction on the image;
the storing unit being configured to store the image and the prediction label in an image data queue, so that a model trainer performs student-model training based on the content of the image data queue.
A model distillation implementation apparatus, applied in a teacher-model server and comprising: a predicting unit and a feedback unit;
the predicting unit being configured to perform forward prediction on an image for model training requested by a teacher-model client, to generate a prediction label;
the feedback unit being configured to return the prediction label to the teacher-model client, so that the teacher-model client stores the image and the prediction label in an image data queue, for a model trainer to perform student-model training based on the content of the image data queue.
A model distillation implementation apparatus, applied in a model trainer and comprising: a second acquiring unit and an updating unit;
the second acquiring unit being configured to obtain a student model output; the student model output being an output result obtained after a student model pulls an image and a corresponding prediction label from an image data queue and performs forward prediction on the pulled image; the image data queue holding images for model training and corresponding prediction labels deposited by a teacher-model client; the prediction label being generated after a teacher-model server performs forward prediction on the image and sent to the teacher-model client;
the updating unit being configured to calculate an error between the pulled prediction label and the student model output, perform back propagation on the student model according to the error, and update the student model.
A model distillation implementation system, comprising: the three model distillation implementation apparatuses described above.
A computer device, comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, wherein the processor implements the method described above when executing the program.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described above.
It can be seen from the above description that with the solution of the present invention, the processing on the teacher-model side can be decoupled from the training of the student model: after the teacher model completes forward prediction and generates the prediction label of an image, the student model is trained using the images and corresponding prediction labels stored in the image data queue. This avoids the teacher model seizing training resources, allows the student model to use the training resources exclusively as far as possible, and accelerates the training speed.
[Description of the drawings]
Fig. 1 is a flowchart of a first embodiment of the model distillation implementation method of the present invention.
Fig. 2 is a schematic diagram of an existing model distillation implementation process.
Fig. 3 is a flowchart of a second embodiment of the model distillation implementation method of the present invention.
Fig. 4 is a flowchart of a third embodiment of the model distillation implementation method of the present invention.
Fig. 5 is a schematic diagram of the model distillation implementation process of the present invention.
Fig. 6 is a schematic structural diagram of a first embodiment of the model distillation implementation apparatus of the present invention.
Fig. 7 is a schematic structural diagram of a second embodiment of the model distillation implementation apparatus of the present invention.
Fig. 8 is a schematic structural diagram of a third embodiment of the model distillation implementation apparatus of the present invention.
Fig. 9 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention.
[specific embodiment]
To make the technical solutions of the present invention clearer, the solutions of the present invention are further described below with reference to the drawings and embodiments.
Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
In addition, it should be understood that the term "and/or" herein only describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate the following three cases: only A exists, both A and B exist, and only B exists. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
Fig. 1 is a flowchart of a first embodiment of the model distillation implementation method of the present invention. As shown in Fig. 1, the embodiment includes the following specific implementation.
In 101, the teacher-model client performs, for each image used for model training, the processing shown in 102-103.
In 102, the prediction label of the image is obtained, the prediction label being generated after the teacher-model server performs forward prediction on the image.
In 103, the image and the prediction label are stored in an image data queue, so that the model trainer performs student-model training based on the content of the image data queue.
For ease of description, the image to be processed is called image a. For image a, its prediction label, such as an image classification result, can be obtained. Specifically, it can first be determined whether the prediction label of image a has been cached. If so, the cached prediction label can be taken as the obtained prediction label of image a; if not, the prediction label of image a can be requested from the teacher-model server, the prediction label of image a returned by the teacher-model server can be obtained, and the prediction label of image a can also be cached.
It can be seen that in the solution of the present invention, the teacher model is split into a teacher-model client and a teacher-model server. When the teacher-model client determines that the prediction label of image a is not cached, it can request the prediction label of image a from the teacher-model server, obtain the prediction label generated by the teacher-model server after performing forward prediction on image a, and cache it.
To strengthen the generalization capability of the model and improve the model training effect, the image on which forward prediction is based is usually a preprocessed image. For this purpose, the teacher-model client can generate a preprocessing identifier (pkey) of image a, which indicates the preprocessing operations that need to be performed on image a, and then send image a and the preprocessing identifier to the teacher-model server to request the prediction label of image a. Correspondingly, the teacher-model server can perform the preprocessing operations on image a according to the preprocessing identifier, and then perform forward prediction based on the preprocessed image to generate the prediction label.
The preprocessing operations may include whether the image needs to be flipped, whether a color transformation needs to be performed, and whether scaling needs to be performed. The preprocessing identifier indicates the required preprocessing operations, that is, the preprocessing identifier can be mapped to one group of image preprocessing operations. The specific format of the preprocessing identifier is not limited and can be determined according to actual needs. How the teacher-model client determines which preprocessing operations need to be performed on image a is likewise not limited and can be determined according to actual needs. The teacher-model server can perform the preprocessing operations on image a according to the indication of the preprocessing identifier, such as image flipping and color transformation, and can then feed the preprocessed image a into the teacher network, perform forward prediction, obtain the prediction label and return it to the teacher-model client.
As mentioned above, preprocessing the images strengthens the generalization capability of the model and improves the model training effect. Therefore, the teacher-model client can obtain the preprocessed image a and store the preprocessed image a and the prediction label of image a in the image data queue; that is, what is deposited into the image data queue is not the original image a but the preprocessed image a.
Image a can be preprocessed by the teacher-model client to obtain the preprocessed image a. Alternatively, the teacher-model client can obtain the preprocessed image a returned by the teacher-model server: after the teacher-model server performs the preprocessing operations on image a according to the preprocessing identifier, it can return the preprocessed image a to the teacher-model client, either together with the prediction label or separately. The teacher-model client can also cache the preprocessed image a for direct use when it is needed later.
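The following is a minimal sketch of the per-image routine of the teacher-model client described above. The caches, the in-process queue, the image identifier and the `request_prediction` call on the teacher-model server are illustrative assumptions, not part of the patent.

```python
import queue

label_cache = {}                    # image id -> cached prediction label
image_cache = {}                    # image id -> cached preprocessed image
image_data_queue = queue.Queue()    # consumed by the model trainer

def process_image(image_id, image, pkey, server):
    """Teacher-model client: obtain the prediction label for one training image
    and deposit (preprocessed image, prediction label) into the image data queue."""
    if image_id in label_cache:                        # prediction label already cached
        preprocessed = image_cache[image_id]
        label = label_cache[image_id]
    else:
        # request from the teacher-model server; pkey indicates the preprocessing to apply
        preprocessed, label = server.request_prediction(image, pkey)
        label_cache[image_id] = label                  # cache for later rounds
        image_cache[image_id] = preprocessed
    image_data_queue.put((preprocessed, label))        # used for student-model training
```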
The model trainer can perform student-model training based on the content of the image data queue. For example, the model trainer can obtain the student model output, where the student model output is the output result obtained after the student model pulls an image and the corresponding prediction label from the image data queue and performs forward prediction on the pulled image; calculate the error between the pulled prediction label and the student model output; perform back propagation on the student model according to the error; and update the student model.
It can be seen that in the solution of this embodiment, the processing on the teacher-model side can be decoupled from the training of the student model. After the teacher model completes forward prediction and generates the prediction labels of the images, the student model is trained using the images and corresponding prediction labels stored in the image data queue, which avoids the teacher model seizing training resources, allows the student model to use the training resources exclusively as far as possible, and accelerates the training speed.
In addition, the solution of this embodiment also avoids a large amount of repeated computation and saves computing resources.
Fig. 2 is a schematic diagram of an existing model distillation implementation process. As shown in Fig. 2, it is assumed that there are two student models, student model A and student model B. Such student models usually have the following characteristics: different training tasks, for example student model A is an animal classification model and student model B is a plant classification model; different network structures, for example student model A is a MobileNet model and student model B is a ShuffleNet model; different training configurations, such as different learning rates, different optimization methods and different preprocessing methods. The latter two cases are especially common.
As shown in Fig. 2, the training processes of the different student models are similar and mainly include: reading the images in the image set for model training from the hard disk into memory; preprocessing the images; and storing the preprocessed images in an image data queue for use by the model trainer. The training in the model trainer usually includes multiple rounds, each round uses all the images in the image set, and, limited by the training hardware resources, the training of each round is further divided into multiple iterations, each of which obtains one group of images from the image set for training. Each iteration may include: the teacher model and the student model pull the same group of images; the teacher model and the student model each perform forward prediction on the pulled images to obtain the teacher model output and the student model output; the error between the teacher model output and the student model output is calculated through a loss function; back propagation is performed on the student model according to the error, and the student model is updated. The student model obtained after training is completed is the required model.
The training process usually requires dozens or even hundreds of rounds, and in the approach shown in Fig. 2 the data used in different rounds is often repeated. For example, the teacher model is fixed, yet it is required to perform forward prediction on the images in every round, which leads to a large amount of repeated calculation and wastes computing resources.
With the solution of this embodiment, in every round of training the previously cached prediction labels can be deposited directly into the image data queue (in general, the image data queue is emptied after each round of training), which avoids a large amount of repeated calculation and greatly saves computing resources.
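A minimal sketch of how the cache avoids repeated teacher computation across rounds, reusing the hypothetical `process_image` helper and `image_data_queue` from the sketch above; the round count, the dataset iteration and the trainer interface are illustrative assumptions.

```python
def run_training_rounds(dataset, server, trainer, num_rounds=100):
    """Across rounds, only the first round actually hits the teacher-model server;
    later rounds are served from the prediction-label cache."""
    for round_idx in range(num_rounds):
        for image_id, image, pkey in dataset:
            process_image(image_id, image, pkey, server)   # cache hit after round 0
        trainer.train_one_round(image_data_queue)           # student-model training
        # in general, the image data queue is emptied after each round of training
```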
Fig. 3 is a flowchart of a second embodiment of the model distillation implementation method of the present invention. As shown in Fig. 3, the embodiment includes the following specific implementation.
In 301, the teacher-model server performs forward prediction on the image for model training requested by the teacher-model client, and generates a prediction label.
In 302, the teacher-model server returns the prediction label to the teacher-model client, so that the teacher-model client stores the image and the prediction label in an image data queue, for the model trainer to perform student-model training based on the content of the image data queue.
For ease of description, the requested image is called image a. For image a, when the teacher-model client determines that the prediction label of image a is not cached, it can request the prediction label of image a from the teacher-model server. Correspondingly, the teacher-model server can perform forward prediction on image a, generate the prediction label and return it to the teacher-model client. The teacher-model client can store image a and the prediction label of image a in the image data queue, so that the model trainer performs student-model training based on the content of the image data queue, and can also cache the prediction label of image a.
The teacher-model client can also generate the preprocessing identifier of image a, which indicates the preprocessing operations that need to be performed on image a, and then send image a and the preprocessing identifier to the teacher-model server to request the prediction label of image a. After obtaining image a and the preprocessing identifier from the teacher-model client, the teacher-model server can perform the preprocessing operations on image a according to the preprocessing identifier, feed the preprocessed image a into the teacher network, perform forward prediction, obtain the prediction label and return it to the teacher-model client.
The preprocessing operations may include whether the image needs to be flipped, whether a color transformation needs to be performed, and whether scaling needs to be performed.
The teacher-model server can also return the preprocessed image a to the teacher-model client, either together with the prediction label or separately. The teacher-model client can then store the preprocessed image a and the prediction label of image a in the image data queue.
In addition, a deep learning task is generally divided into two stages: a training stage and a prediction stage. The training stage performs both forward computation and back propagation on the network, whereas the prediction stage only performs forward computation, so it can be optimized, for example by layer fusion, memory reuse and calculation-method selection in the model. In the solution of this embodiment, since forward prediction is decoupled from model training, prediction engines such as TensorRT and Anakin can be used for optimized acceleration. Compared with the existing approach, the forward-computation performance can be improved by a factor of 2-3 or even 4-5, and resource utilization can be improved.
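A minimal sketch of the teacher-model server described above, assuming PyTorch for the teacher network; the dict-style pkey decoding, the particular preprocessing operations and the method signature are illustrative assumptions, and an optimized prediction engine such as TensorRT could replace the plain forward pass since only forward computation is needed here.

```python
import torch
import torchvision.transforms.functional as TF

class TeacherModelServer:
    """Teacher-model server: applies the preprocessing indicated by pkey,
    runs forward prediction only (no back propagation), returns a prediction label."""

    def __init__(self, teacher_net):
        self.teacher_net = teacher_net.eval()   # inference only, so it can be optimized

    def request_prediction(self, image, pkey):
        # pkey maps to one group of image preprocessing operations (format is illustrative)
        if pkey.get("flip"):
            image = TF.hflip(image)
        if pkey.get("grayscale"):               # a stand-in for "color transformation"
            image = TF.rgb_to_grayscale(image, num_output_channels=3)
        if pkey.get("size"):
            image = TF.resize(image, pkey["size"])
        with torch.no_grad():                   # forward prediction only
            logits = self.teacher_net(image.unsqueeze(0))
            label = torch.softmax(logits, dim=1).squeeze(0)
        return image, label                     # preprocessed image + prediction label
```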
Fig. 4 is a flowchart of a third embodiment of the model distillation implementation method of the present invention. As shown in Fig. 4, the embodiment includes the following specific implementation.
In 401, the model trainer obtains the student model output; the student model output is the output result obtained after the student model pulls an image and the corresponding prediction label from the image data queue and performs forward prediction on the pulled image; the image data queue holds images for model training and corresponding prediction labels deposited by the teacher-model client; the prediction label is generated after the teacher-model server performs forward prediction on the image and is sent to the teacher-model client.
In 402, the model trainer calculates the error between the pulled prediction label and the student model output, performs back propagation on the student model according to the error, and updates the student model.
The training in the model trainer usually includes multiple rounds, each round uses the images in the image set, and, limited by the training hardware resources, the training of each round is further divided into multiple iterations, each of which obtains one group of images (including the images and the corresponding prediction labels) from the image set for training.
Each time, the student model can pull one group of images from the image data queue, perform forward prediction on the pulled images to obtain the student model output, then calculate the error between the pulled prediction labels and the student model output through a loss function, perform back propagation on the student model according to the calculated error, and update the student model. How to perform forward prediction, how to calculate the error and how to perform back propagation on the student model are existing techniques.
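A minimal sketch of one such trainer iteration, assuming PyTorch and the (preprocessed image, prediction label) pairs produced by the earlier sketches; the batching helper and the KL-divergence loss are assumptions, since the text only refers to "a loss function".

```python
import torch
import torch.nn.functional as F

def student_training_step(student, optimizer, image_data_queue, batch_size=32):
    """Pull one group of (preprocessed image, teacher prediction label) pairs
    from the image data queue and update the student model once."""
    batch = [image_data_queue.get() for _ in range(batch_size)]
    images = torch.stack([img for img, _ in batch])
    teacher_labels = torch.stack([lbl for _, lbl in batch])

    student_logits = student(images)            # student forward prediction
    loss = F.kl_div(                            # error vs. the pulled prediction labels
        F.log_softmax(student_logits, dim=1),
        teacher_labels,
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()                             # back propagation on the student model
    optimizer.step()                            # update the student model
    return loss.item()
```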
The images in the image data queue can be preprocessed images, which may be obtained after the teacher-model client preprocesses the images for model training, or may be sent to the teacher-model client after the teacher-model server preprocesses the images for model training. The prediction label can be generated by the teacher-model server by performing forward prediction based on the preprocessed image.
Summarizing the above, Fig. 5 is a schematic diagram of the model distillation implementation process of the present invention.
As shown in Fig. 5, image data reading can be performed, that is, the images in the image set for model training are read from the hard disk into memory.
For each image a in the image set, the teacher-model client can perform the following processing: determine whether the prediction label of image a has been cached; if so, store the preprocessed image a and the prediction label of image a in the image data queue; if not, generate the preprocessing identifier of image a, send image a and the preprocessing identifier to the teacher-model server to request the prediction label of image a, obtain the prediction label of image a returned by the teacher-model server, cache the prediction label of image a, and store the preprocessed image a and the prediction label of image a in the image data queue. It is assumed here that the preprocessed image a is obtained after the teacher-model client preprocesses image a.
After receiving image a and the preprocessing identifier sent by the teacher-model client, the teacher-model server can perform the preprocessing operations on image a according to the preprocessing identifier, perform forward prediction based on the preprocessed image a, obtain the prediction label and return it to the teacher-model client.
The training in the model trainer usually includes multiple rounds. In each iteration of each round, the student model can pull images (preprocessed images) and the corresponding prediction labels from the image data queue, perform forward prediction on the pulled images to obtain the student model output, then calculate the error between the pulled prediction labels and the student model output through a loss function, perform back propagation on the student model according to the calculated error, and update the student model.
Usually, after each round of training, the image data queue can be emptied. In the next round of training, the teacher-model client, the teacher-model server and the model trainer repeat the above processing until the training is completed.
It should be noted that, for the sake of brevity, the foregoing method embodiments are all described as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described action sequence, because according to the present invention some steps can be performed in other sequences or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, each embodiment has its own emphasis, and for parts not described in detail in a certain embodiment, reference may be made to the related description of other embodiments.
In short, with the solution of the method embodiments of the present invention, the processing on the teacher-model side is decoupled from the training of the student model, which makes it more convenient to maintain and manage the teacher model, for example through prediction-engine optimization, avoids the teacher model seizing training resources, allows the student model to use the training resources exclusively as far as possible, and accelerates the training speed. Furthermore, through the caching processing, the calculation results can be shared between the training tasks of different student models and between different iterations (different rounds of training) of the same training task, which avoids a large amount of repeated calculation and saves computing resources.
The above is the introduction to the method embodiments. The solution of the present invention is further described below through apparatus embodiments.
Fig. 6 is a schematic structural diagram of a first embodiment of the model distillation implementation apparatus of the present invention. The model distillation implementation apparatus of this embodiment can be applied in the teacher-model client and, as shown in Fig. 6, comprises: a first acquiring unit 601 and a storing unit 602.
The first acquiring unit 601 is configured to, for each image used for model training, obtain the prediction label of the image, the prediction label being generated after the teacher-model server performs forward prediction on the image.
The storing unit 602 is configured to store the image and the prediction label in the image data queue, so that the model trainer performs student-model training based on the content of the image data queue.
When the prediction label of an image needs to be obtained, the first acquiring unit 601 can first determine whether the prediction label has been cached; if so, the cached prediction label can be taken as the obtained prediction label; if not, the prediction label can be requested from the teacher-model server, the prediction label returned by the teacher-model server can be obtained, and it can be cached.
Specifically, the first acquiring unit 601 can generate the preprocessing identifier of the image, which indicates the preprocessing operations that need to be performed on the image, send the image and the preprocessing identifier to the teacher-model server to request the prediction label, and then obtain the prediction label generated by the teacher-model server by performing the preprocessing operations on the image according to the preprocessing identifier and performing forward prediction based on the preprocessed image.
The preprocessing operations may include whether the image needs to be flipped, whether a color transformation needs to be performed, and whether scaling needs to be performed.
Preferably, the image deposited into the image data queue can be a preprocessed image, which may be obtained after the first acquiring unit 601 preprocesses the image, or may be sent to the first acquiring unit 601 after the teacher-model server preprocesses the image.
Fig. 7 is a schematic structural diagram of a second embodiment of the model distillation implementation apparatus of the present invention. The model distillation implementation apparatus of this embodiment can be applied in the teacher-model server and, as shown in Fig. 7, comprises: a predicting unit 701 and a feedback unit 702.
The predicting unit 701 is configured to perform forward prediction on the image for model training requested by the teacher-model client, and generate a prediction label.
The feedback unit 702 is configured to return the prediction label to the teacher-model client, so that the teacher-model client stores the image and the prediction label in the image data queue, for the model trainer to perform student-model training based on the content of the image data queue.
The predicting unit 701 can obtain the image and the preprocessing identifier from the teacher-model client, where the preprocessing identifier indicates the preprocessing operations that need to be performed on the image, perform the preprocessing operations on the image according to the preprocessing identifier, and perform forward prediction based on the preprocessed image. For example, according to the indication of the preprocessing identifier, preprocessing operations such as image flipping and color transformation can be performed on the obtained image, and then the preprocessed image can be fed into the teacher network to perform forward prediction and obtain the prediction label.
The feedback unit 702 can also return the preprocessed image to the teacher-model client, so that the teacher-model client stores the preprocessed image and the prediction label in the image data queue. For example, after the predicting unit 701 performs the preprocessing operations on the image according to the preprocessing identifier, the feedback unit 702 can return the preprocessed image to the teacher-model client, either together with the prediction label or separately.
Fig. 8 is a schematic structural diagram of a third embodiment of the model distillation implementation apparatus of the present invention. The model distillation implementation apparatus of this embodiment can be applied in the model trainer and, as shown in Fig. 8, comprises: a second acquiring unit 801 and an updating unit 802.
The second acquiring unit 801 is configured to obtain the student model output; the student model output is the output result obtained after the student model pulls an image and the corresponding prediction label from the image data queue and performs forward prediction on the pulled image; the image data queue holds images for model training and corresponding prediction labels deposited by the teacher-model client; the prediction label is generated after the teacher-model server performs forward prediction on the image and is sent to the teacher-model client.
The updating unit 802 is configured to calculate the error between the pulled prediction label and the student model output, perform back propagation on the student model according to the error, and update the student model.
The training in the model trainer usually includes multiple rounds, and the training of each round can be further divided into multiple iterations. Each time, the student model can pull one group of images from the image data queue, perform forward prediction on the pulled images to obtain the student model output, then calculate the error between the pulled prediction labels and the student model output through a loss function, perform back propagation on the student model according to the calculated error, and update the student model. How to perform forward prediction, how to calculate the error and how to perform back propagation on the student model are existing techniques.
Preferably, the images in the image data queue are preprocessed images, which may be obtained after the teacher-model client preprocesses the images, or may be sent to the teacher-model client after the teacher-model server preprocesses the images. The prediction label can be generated by the teacher-model server by performing forward prediction based on the preprocessed image.
The present invention also discloses a model distillation implementation system, which may comprise: the model distillation implementation apparatus in the embodiment shown in Fig. 6, the model distillation implementation apparatus in the embodiment shown in Fig. 7, and the model distillation implementation apparatus in the embodiment shown in Fig. 8.
For the specific workflow of the above apparatus and system embodiments, reference is made to the related description in the foregoing method embodiments, which is not repeated here.
In short, with the solution of the apparatus and system embodiments of the present invention, the processing on the teacher-model side is decoupled from the training of the student model, which makes it more convenient to maintain and manage the teacher model, for example through prediction-engine optimization, avoids the teacher model seizing training resources, allows the student model to use the training resources exclusively as far as possible, and accelerates the training speed. Furthermore, through the caching processing, the calculation results can be shared between the training tasks of different student models and between different iterations (different rounds of training) of the same training task, which avoids a large amount of repeated calculation and saves computing resources.
Fig. 9 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention. The computer system/server 12 shown in Fig. 9 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 9, the computer system/server 12 is embodied in the form of a general-purpose computing device. The components of the computer system/server 12 may include, but are not limited to: one or more processors (processing units) 16, a memory 28, and a bus 18 connecting different system components (including the memory 28 and the processor 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus structures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MAC) bus, an enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
The computer system/server 12 typically comprises a variety of computer-system-readable media. These media can be any usable media that can be accessed by the computer system/server 12, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, a storage system 34 can be used for reading from and writing to non-removable, non-volatile magnetic media (not shown in Fig. 9, commonly referred to as a "hard disk drive"). Although not shown in Fig. 9, a disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, a DVD-ROM or other optical media) can be provided. In these cases, each drive can be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a group of (for example, at least one) program modules, and these program modules are configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a group of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods in the embodiments described in the present invention.
The computer system/server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer system/server 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer system/server 12 to communicate with one or more other computing devices. Such communication can be carried out through input/output (I/O) interfaces 22. Moreover, the computer system/server 12 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network, such as the Internet) through a network adapter 20. As shown in Fig. 9, the network adapter 20 communicates with the other modules of the computer system/server 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the computer system/server 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.
The processor 16, by running the programs stored in the memory 28, executes various functional applications and data processing, for example implementing the methods in the embodiments shown in Fig. 1, Fig. 3 or Fig. 4.
The present invention also discloses a computer-readable storage medium on which a computer program is stored, and the program, when executed by a processor, implements the methods in the embodiments shown in Fig. 1, Fig. 3 or Fig. 4.
Any combination of one or more computer-readable media may be used. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus or device.
The computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and the computer-readable medium may send, propagate or transmit the program for use by or in combination with the instruction execution system, apparatus or device.
The program code contained on the computer-readable medium may be transmitted by any suitable medium, including, but not limited to, wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
The computer program code for performing the operations of the present invention may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk and C++, and also include conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, method and the like can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a logical function division, and there may be other division manners in actual implementation.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The above integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The above software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) or a processor to execute some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (21)

1. A model distillation implementation method, characterized by comprising:
a teacher-model client performing, for each image used for model training, the following processing:
obtaining a prediction label of the image, the prediction label being generated after a teacher-model server performs forward prediction on the image;
storing the image and the prediction label in an image data queue, so that a model trainer performs student-model training based on the content of the image data queue.
2. The method according to claim 1, characterized in that
the obtaining the prediction label of the image comprises:
determining whether the prediction label has been cached;
if so, taking the cached prediction label as the obtained prediction label;
if not, requesting the prediction label from the teacher-model server, obtaining the prediction label returned by the teacher-model server, and caching the prediction label.
3. The method according to claim 2, characterized in that
the method further comprises: the teacher-model client generating a preprocessing identifier of the image, the preprocessing identifier indicating preprocessing operations that need to be performed on the image;
the requesting the prediction label from the teacher-model server comprises: the teacher-model client sending the image and the preprocessing identifier to the teacher-model server to request the prediction label from the teacher-model server;
the obtaining the prediction label returned by the teacher-model server comprises: obtaining the prediction label generated by the teacher-model server by performing the preprocessing operations on the image according to the preprocessing identifier and performing forward prediction based on the preprocessed image.
4. The method according to claim 3, characterized in that
the image in the image data queue is a preprocessed image;
the preprocessed image is obtained after the teacher-model client preprocesses the image, or is sent to the teacher-model client after the teacher-model server preprocesses the image.
5. A model distillation implementation method, characterized by comprising:
a teacher-model server performing forward prediction on an image for model training requested by a teacher-model client, to generate a prediction label;
the teacher-model server returning the prediction label to the teacher-model client, so that the teacher-model client stores the image and the prediction label in an image data queue, for a model trainer to perform student-model training based on the content of the image data queue.
6. The method according to claim 5, characterized in that
the method further comprises: the teacher-model server obtaining the image and a preprocessing identifier from the teacher-model client, the preprocessing identifier indicating preprocessing operations that need to be performed on the image;
the performing forward prediction comprises: the teacher-model server performing the preprocessing operations on the image according to the preprocessing identifier, and performing forward prediction based on the preprocessed image.
7. The method according to claim 6, characterized in that
the method further comprises: the teacher-model server returning the preprocessed image to the teacher-model client, so that the teacher-model client stores the preprocessed image and the prediction label in the image data queue.
8. A model distillation implementation method, characterized by comprising:
a model trainer obtaining a student model output; the student model output being an output result obtained after a student model pulls an image and a corresponding prediction label from an image data queue and performs forward prediction on the pulled image; the image data queue holding images for model training and corresponding prediction labels deposited by a teacher-model client; the prediction label being generated after a teacher-model server performs forward prediction on the image and sent to the teacher-model client;
the model trainer calculating an error between the pulled prediction label and the student model output, performing back propagation on the student model according to the error, and updating the student model.
9. The method according to claim 8, characterized in that
the image in the image data queue is a preprocessed image;
the preprocessed image is obtained after the teacher-model client preprocesses the image, or is sent to the teacher-model client after the teacher-model server preprocesses the image;
the prediction label is generated by the teacher-model server by performing forward prediction based on the preprocessed image.
10. a kind of model distills realization device, which is characterized in that the model distillation realization device is applied to tutor model client In end, comprising: first acquisition unit and storage unit;
The first acquisition unit, for obtaining the pre- mark of described image respectively for every image for model training Label, the prediction label are to generate after tutor model server-side carries out forward prediction to described image;
The storage unit, for described image and the prediction label to be stored in image data queue, so as to model trainer Student model training is carried out based on the content in described image data queue.
11. device according to claim 10, which is characterized in that
The first acquisition unit determines whether to be cached with the prediction label, if so, the prediction label of caching is made Teacher's mould is obtained if it is not, then requesting the prediction label to the tutor model server-side for the prediction label got The prediction label that type server-side returns, and the prediction label is cached.
12. device according to claim 11, which is characterized in that
The first acquisition unit is further used for, and generates the pretreatment identifier of described image, and the pretreatment identifier is used The pretreatment operation executed to described image is needed in instruction, and described image and the pretreatment identifier are sent to the religion Teacher's model service end, Xiang Suoshu tutor model server-side request the prediction label, obtain the tutor model server-side according to The pretreatment identifier carries out forward prediction generation to after described image execution pretreatment operation, based on pretreated image Prediction label.
13. device according to claim 12, which is characterized in that
Image in described image data queue is pretreated image;
The pretreated image first acquisition unit obtains after pre-processing to described image, alternatively, being The tutor model server-side is sent to the first acquisition unit after pre-processing to described image.
14. a kind of model distills realization device, which is characterized in that the model distillation realization device is applied to tutor model service In end, comprising: predicting unit and feedback unit;
The predicting unit, for carrying out forward prediction to the requested image for model training of tutor model client, Generate prediction label;
The feedback unit, for the prediction label to be returned to the tutor model client, so as to the tutor model Described image and the prediction label are stored in image data queue by client, are based on described image data for model trainer Content in queue carries out student model training.
15. device according to claim 14, which is characterized in that
The predicting unit is further used for, and obtains described image and pretreatment mark from the tutor model client Symbol, the pretreatment identifier are used to indicate the pretreatment operation for needing to execute to described image, are identified according to the pretreatment Symbol executes pretreatment operation to described image, carries out forward prediction based on pretreated image.
16. The apparatus according to claim 15, characterized in that
the feedback unit is further configured to return the pre-processed image to the tutor model client, so that the tutor model client stores the pre-processed image and the prediction label in the image data queue.
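A corresponding sketch of the tutor model server-side recited in claims 14-16 follows; PREPROCESS_OPS, teacher_model, and serve_prediction are hypothetical names, and a generic callable stands in for the teacher network since the claims do not prescribe any particular framework.

```python
import numpy as np

# Hypothetical registry mapping a pre-processing identifier to an operation (claim 15).
PREPROCESS_OPS = {
    "resize_crop_normalize": lambda img: img.astype("float32") / 255.0,
}


def teacher_model(batch: np.ndarray) -> np.ndarray:
    # Stand-in for the large teacher network's forward pass, which would normally be a
    # framework call producing soft prediction labels for the batch.
    raise NotImplementedError


def serve_prediction(image: np.ndarray, preprocess_id: str):
    preprocessed = PREPROCESS_OPS[preprocess_id](image)      # pre-process per the identifier
    label = teacher_model(preprocessed[np.newaxis, ...])[0]  # forward prediction (claim 14)
    # Claim 16: return the pre-processed image together with the prediction label,
    # so the client can store both in the image data queue.
    return preprocessed, label
```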
17. A model distillation implementation apparatus, characterized in that the apparatus is applied to a model trainer and comprises: a second acquisition unit and an updating unit;
the second acquisition unit is configured to obtain a student model output, the student model output being the result output by a student model after the student model pulls an image and a corresponding prediction label from an image data queue and performs forward prediction using the pulled image; the image data queue stores images for model training and corresponding prediction labels deposited by a tutor model client; the prediction label is generated after a tutor model server-side performs forward prediction on the image and is sent to the tutor model client;
the updating unit is configured to calculate an error between the pulled prediction label and the student model output, to perform back propagation on the student model according to the error, and to update the student model.
18. The apparatus according to claim 17, characterized in that
the image in the image data queue is a pre-processed image;
the pre-processed image is obtained by the tutor model client by pre-processing the image, or is sent to the tutor model client by the tutor model server-side after the tutor model server-side pre-processes the image;
the prediction label is generated by the tutor model server-side by performing forward prediction based on the pre-processed image.
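Finally, a minimal sketch of the model trainer recited in claims 17-18, using PyTorch purely as an illustrative framework (the claims are framework-agnostic); student, optimizer, and image_queue are assumed to exist, and the sketch assumes the teacher's prediction label is stored as logits so that a soft-label distillation loss can be formed.

```python
import torch
import torch.nn.functional as F


def train_step(student: torch.nn.Module, optimizer: torch.optim.Optimizer, image_queue):
    # Pull a pre-processed image and its teacher prediction label from the queue (claim 17).
    image, teacher_label = image_queue.get()
    image = torch.as_tensor(image).unsqueeze(0)              # add a batch dimension
    teacher_label = torch.as_tensor(teacher_label).unsqueeze(0)

    student_output = student(image)                          # student forward prediction

    # Error between the pulled prediction label and the student output; soft-label
    # distillation commonly uses a KL divergence between the two distributions.
    loss = F.kl_div(
        F.log_softmax(student_output, dim=1),
        F.softmax(teacher_label, dim=1),
        reduction="batchmean",
    )

    optimizer.zero_grad()
    loss.backward()                                          # back propagation according to the error
    optimizer.step()                                         # update the student model
    return loss.item()
```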
19. A model distillation implementation system, characterized by comprising: the model distillation implementation apparatus according to any one of claims 10-13, the model distillation implementation apparatus according to any one of claims 14-16, and the model distillation implementation apparatus according to any one of claims 17-18.
20. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 9.
21. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 9.
CN201910463011.6A 2019-05-30 2019-05-30 Model distillation implementation method, device, system, computer equipment and storage medium Active CN110232411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910463011.6A CN110232411B (en) 2019-05-30 2019-05-30 Model distillation implementation method, device, system, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110232411A true CN110232411A (en) 2019-09-13
CN110232411B CN110232411B (en) 2022-08-23

Family

ID=67858204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910463011.6A Active CN110232411B (en) 2019-05-30 2019-05-30 Model distillation implementation method, device, system, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110232411B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170011306A1 (en) * 2015-07-06 2017-01-12 Microsoft Technology Licensing, Llc Transfer Learning Techniques for Disparate Label Sets
CN106709917A (en) * 2017-01-03 2017-05-24 青岛海信医疗设备股份有限公司 Neural network model training method, device and system
CN107330439A (en) * 2017-07-14 2017-11-07 腾讯科技(深圳)有限公司 A kind of determination method, client and the server of objects in images posture
WO2019032202A1 (en) * 2017-08-11 2019-02-14 Microsoft Technology Licensing, Llc Domain adaptation in speech recognition via teacher-student learning
CN108710897A (en) * 2018-04-24 2018-10-26 江苏科海智能系统有限公司 A kind of online general target detecting system in distal end based on SSD-T
CN109087303A (en) * 2018-08-15 2018-12-25 中山大学 The frame of semantic segmentation modelling effect is promoted based on transfer learning
CN109492698A (en) * 2018-11-20 2019-03-19 腾讯科技(深圳)有限公司 A kind of method of model training, the method for object detection and relevant apparatus
CN109711544A (en) * 2018-12-04 2019-05-03 北京市商汤科技开发有限公司 Method, apparatus, electronic equipment and the computer storage medium of model compression
CN109784159A (en) * 2018-12-11 2019-05-21 北京航空航天大学 The processing method of scene image, apparatus and system
CN109637546A (en) * 2018-12-29 2019-04-16 苏州思必驰信息科技有限公司 Knowledge distillating method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LUMIN LIU等: "Client-Edge-Cloud Hierarchical Federated Learning", 《ARXIV NETWORKING AND INTERNET ARCHITECTURE》 *
SHIQIANG WANG等: "Adaptive Federated Learning in Resource Constrained Edge Computing Systems", 《IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS》 *
付玉香 (Fu, Yuxiang) et al.: "Research on Privacy Protection Methods for Multi-Source Data Based on Transfer Learning", Computer Engineering and Science (《计算机工程与科学》) *
林野 (Lin, Ye) et al.: "Analysis of Model Storage Compression Methods for Neural Machine Translation", Journal of Chinese Information Processing (《中文信息学报》) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796242A (en) * 2019-11-01 2020-02-14 广东三维家信息科技有限公司 Neural network model reasoning method and device, electronic equipment and readable medium
CN110991556A (en) * 2019-12-16 2020-04-10 浙江大学 Efficient image classification method, device, equipment and medium based on multi-student cooperative distillation
CN110991556B (en) * 2019-12-16 2023-08-15 浙江大学 Efficient image classification method, device, equipment and medium based on multi-student cooperative distillation
JP2021096813A (en) * 2019-12-18 2021-06-24 ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド Method and apparatus for processing data
CN111104482A (en) * 2019-12-18 2020-05-05 北京百度网讯科技有限公司 Data processing method and device
CN111191722B (en) * 2019-12-30 2022-08-09 支付宝(杭州)信息技术有限公司 Method and device for training prediction model through computer
CN111191722A (en) * 2019-12-30 2020-05-22 支付宝(杭州)信息技术有限公司 Method and device for training prediction model through computer
CN111695698A (en) * 2020-06-12 2020-09-22 北京百度网讯科技有限公司 Method, device, electronic equipment and readable storage medium for model distillation
CN111695699A (en) * 2020-06-12 2020-09-22 北京百度网讯科技有限公司 Method, device, electronic equipment and readable storage medium for model distillation
CN111695699B (en) * 2020-06-12 2023-09-08 北京百度网讯科技有限公司 Method, apparatus, electronic device, and readable storage medium for model distillation
CN111695698B (en) * 2020-06-12 2023-09-12 北京百度网讯科技有限公司 Method, apparatus, electronic device, and readable storage medium for model distillation
CN112101545A (en) * 2020-08-28 2020-12-18 北京百度网讯科技有限公司 Method, device and equipment for training distillation system and storage medium
CN113780252A (en) * 2021-11-11 2021-12-10 深圳思谋信息科技有限公司 Training method of video processing model, video processing method and device

Also Published As

Publication number Publication date
CN110232411B (en) 2022-08-23

Similar Documents

Publication Publication Date Title
CN110232411A (en) Model distills implementation method, device, system, computer equipment and storage medium
CN112559007B (en) Parameter updating method and device of multitask model and electronic equipment
US11210131B2 (en) Method and apparatus for assigning computing task
CN107623729A (en) A kind of caching method, equipment and cache service system
CN109036425A (en) Method and apparatus for operating intelligent terminal
CN110263277B (en) Page data display method, page data updating device, page data equipment and storage medium
WO2019202073A1 (en) Neural networks for scalable continual learning in domains with sequentially learned tasks
US20210097395A1 (en) Neural network model generation and distribution with client feedback
US11748988B1 (en) Shot contras five self-supervised learning of a plurality of machine learning models for video analysis applications
CN109117252A (en) Method, system and the container cluster management system of task processing based on container
KR20220127332A (en) Automatic creation of various texts
CN115331275A (en) Image processing method, computer system, electronic device, and program product
CN111241838A (en) Text entity semantic relation processing method, device and equipment
CN113627536A (en) Model training method, video classification method, device, equipment and storage medium
EP3682379A1 (en) Augmenting neural networks
US20220075960A1 (en) Interactive Communication System with Natural Language Adaptive Components
CN115914148A (en) Conversational agent with two-sided modeling
CN115146038A (en) Conversational AI platform with closed domain and open domain conversation integration
CN113726545B (en) Network traffic generation method and device for generating countermeasure network based on knowledge enhancement
CN113052253A (en) Hyper-parameter determination method, device, deep reinforcement learning framework, medium and equipment
CN110401681A (en) For data transmission, the method for data receiver and electronic equipment
CN112835582A (en) Container engine-based rendering method, device, equipment and storage medium
US11308281B1 (en) Slot type resolution process
US11281857B1 (en) Composite slot type resolution
US20210264112A1 (en) Bot dialog manager

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant