CN112215248A - Deep learning model training method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN112215248A
- Authority
- CN
- China
- Prior art keywords
- image set
- training image
- loss function
- training
- deep learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Abstract
The application provides a deep learning model training method and apparatus, an electronic device, and a storage medium, relating to the field of computer technology. A first training image set comprising a plurality of unlabeled images and a second training image set comprising a plurality of labeled images are used as inputs of a deep learning model, and the model parameters of the deep learning model are updated using a first loss function and a second loss function obtained from the first training image set and the second training image set, respectively. In this way, the model parameters are updated by combining unsupervised and supervised learning, so the training method is not limited to a specific model structure, which improves the generality of model training.
Description
Technical Field
The application relates to the field of computer technology, and in particular to a deep learning model training method and apparatus, an electronic device, and a storage medium.
Background
With the rise of deep learning, supervised learning has become more and more powerful, but its strong performance depends on massive amounts of manually labeled training data, and labeling massive image data consumes a great deal of manpower, material resources, and time. Unsupervised learning, on the other hand, requires no data labeling; but because it trains without real labels, it can rely only on a small amount of prior knowledge to design algorithms, or on a few artificially given labels carrying weak supervision information, so its performance falls significantly short of supervised learning.
Semi-supervised learning trains the model with labeled and unlabeled image data at the same time, which effectively reduces the model's demand for labeled image data. Although unlabeled image data cannot provide real labels for supervised learning, it reflects the data distribution of real images and helps the network learn a good representation of the image data. Mining the features hidden in the data distribution, or otherwise better characterizing the image data, helps the model learn the image recognition task.
However, in current semi-supervised learning schemes for deep learning models, such as schemes based on generative adversarial networks (GANs), the training scheme is highly specific to one model structure and is difficult to apply to the training of other models.
Disclosure of Invention
The application aims to provide a deep learning model training method and apparatus, an electronic device, and a storage medium that improve the generality of model training.
In order to achieve the above purpose, the embodiments of the present application employ the following technical solutions:
in a first aspect, an embodiment of the present application provides a deep learning model training method, where the method includes:
obtaining a first training image set and a second training image set, wherein the first training image set comprises a plurality of unlabeled images, and the second training image set comprises a plurality of labeled images;
respectively taking the first training image set and the second training image set as the input of the deep learning model to obtain a first loss function and a second loss function, wherein the first loss function is a loss function when the first training image set is used as the input for training the deep learning model, and the second loss function is a loss function when the second training image set is used as the input for training the deep learning model;
and updating the model parameters of the deep learning model according to the first loss function and the second loss function.
In a second aspect, an embodiment of the present application provides a deep learning model training apparatus, where the apparatus includes:
a processing module, configured to obtain a first training image set and a second training image set, where the first training image set includes a plurality of unlabeled images, and the second training image set includes a plurality of labeled images;
the processing module is further configured to take the first training image set and the second training image set as inputs of the deep learning model respectively to obtain a first loss function and a second loss function, where the first loss function is a loss function when the first training image set is used as an input to train the deep learning model, and the second loss function is a loss function when the second training image set is used as an input to train the deep learning model;
and the updating module is used for updating the model parameters of the deep learning model according to the first loss function and the second loss function.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory for storing one or more programs, and a processor. The one or more programs, when executed by the processor, implement the deep learning model training method described above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the deep learning model training method described above.
Compared with the prior art, when the deep learning model is trained, the model parameters are updated by combining unsupervised learning and supervised learning rather than by relying on model structure characteristics, so the training method is not limited to a specific model structure, which improves the generality of model training.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting the scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a schematic block diagram of a semi-supervised learning approach based on a generative adversarial network;
fig. 2 is a schematic structural block diagram of an electronic device provided in an embodiment of the present application;
FIG. 3 is a schematic flow chart of a deep learning model training method provided by an embodiment of the present application;
FIG. 4 is a schematic training scenario diagram of a deep learning model provided in an embodiment of the present application;
FIG. 5 is a schematic flow chart of the substeps of S203 in FIG. 3;
FIG. 6 is another schematic flow chart of the substeps of S203 in FIG. 3;
FIG. 7 is a schematic flow chart of the substeps of S205 of FIG. 3;
FIG. 8 is a schematic block diagram of a deep learning model training apparatus according to an embodiment of the present disclosure;
in the figure: 100-an electronic device; 101-a memory; 102-a processor; 103-a communication interface; 300-deep learning model training device; 301-a processing module; 302-update module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In a semi-supervised learning scheme based on a generative adversarial network, for example, the main idea is to use the generative adversarial network to learn the distribution of the real input image data (which includes both labeled and unlabeled images) and encode that distribution information into the weight parameters of the discriminator; because the discriminator shares its weight parameters with the classifier, the distribution information is transferred to the classifier, enabling the classifier to better predict image categories.
Referring to fig. 1, fig. 1 is a schematic block diagram of a semi-supervised learning method based on a generative adversarial network; the framework mainly includes a generator, a discriminator, and a classifier.
The generator consists of a multi-layer deconvolution model and generates a fake image from N-dimensional random noise sampled from a Gaussian distribution. The discriminator is a multi-layer convolution model trained to distinguish real images from the fake images produced by the generator. The classifier shares its weight parameters and network structure with the discriminator; it receives real image data (both labeled and unlabeled) and outputs the category of each real image (assuming the real images fall into K categories).
Through the adversarial training of the generator and the discriminator, the fake images generated by the generator come closer and closer to real images. To distinguish real images from generated fakes more accurately, the discriminator learns the distribution of the real image data more precisely and encodes this distribution information in its weight parameters, so the classifier, by sharing weight parameters with the discriminator, can more accurately predict both the authenticity and the category of an image.
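By way of illustration only — the patent gives no code — a minimal PyTorch sketch of this arrangement might look as follows. All layer sizes, the noise dimension, and K = 10 categories are assumptions, and the weight sharing is realized here as a shared convolutional trunk with separate real/fake and category heads:

```python
import torch
import torch.nn as nn

class SharedTrunk(nn.Module):
    """Convolutional feature extractor whose weights are shared by the
    discriminator and the classifier (hypothetical sizes)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, x):
        return self.features(x)

class Generator(nn.Module):
    """Multi-layer deconvolution model: N-dimensional Gaussian noise in,
    fake image out."""
    def __init__(self, noise_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(noise_dim, 128, 4), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

trunk = SharedTrunk()
disc_head = nn.Linear(128, 1)    # discriminator head: real vs. fake score
cls_head = nn.Linear(128, 10)    # classifier head: K = 10 categories (assumed)

z = torch.randn(8, 100)                   # noise sampled from a Gaussian
fake = Generator()(z)                     # eight fake 16x16 images
real_fake_score = disc_head(trunk(fake))  # discriminator path over the shared trunk
```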
However, in such a training process, the inherent structural characteristics of the generative adversarial network limit the freedom in designing the classification network, making the training scheme highly specific: if the structure or even the type of the model changes, the scheme is difficult to apply to the training of other models.
In view of the above defects, a possible implementation provided by the embodiments of the present application is as follows: a first training image set containing a plurality of unlabeled images and a second training image set containing a plurality of labeled images are used as inputs of the deep learning model, and the model parameters of the deep learning model are updated using the first loss function and the second loss function obtained from the first training image set and the second training image set, respectively.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 2, fig. 2 is a schematic block diagram of an electronic device 100 according to an embodiment of the present disclosure. The electronic device 100 may be used as a device for training a deep learning model to implement the deep learning model training method provided in the embodiment of the present application, such as a mobile phone, a Personal Computer (PC), a tablet computer, a laptop computer, and the like.
The electronic device 100 includes a memory 101, a processor 102, and a communication interface 103, wherein the memory 101, the processor 102, and the communication interface 103 are electrically connected to each other directly or indirectly to enable data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The memory 101 may be used to store software programs and modules, such as program instructions/modules corresponding to the deep learning model training apparatus 300 provided in the embodiments of the present application, and the processor 102 executes the software programs and modules stored in the memory 101, thereby executing various functional applications and data processing. The communication interface 103 may be used for communicating signaling or data with other node devices.
The Memory 101 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor 102 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
It will be appreciated that the configuration shown in FIG. 2 is merely illustrative and that electronic device 100 may include more or fewer components than shown in FIG. 2 or have a different configuration than shown in FIG. 2. The components shown in fig. 2 may be implemented in hardware, software, or a combination thereof.
The deep learning model training method provided in the embodiment of the present application is further described below by taking the electronic device 100 provided in fig. 2 as an exemplary execution subject.
Referring to fig. 3, fig. 3 is a schematic flowchart of a deep learning model training method according to an embodiment of the present application, including the following steps:
S201, a first training image set and a second training image set are obtained.
The first training image set comprises a plurality of unlabeled images, which are used for unsupervised learning of the deep learning model; the second training image set comprises a plurality of labeled images, which are used for supervised learning of the deep learning model.
S203, the first training image set and the second training image set are respectively used as the input of the deep learning model to obtain a first loss function and a second loss function.
The first loss function is a loss function when the first training image set is used as input for training the deep learning model, and the second loss function is a loss function when the second training image set is used as input for training the deep learning model.
S205, the model parameters of the deep learning model are updated according to the first loss function and the second loss function.
In the embodiment of the application, the deep learning model is trained by adopting two training image sets, namely a first training image set and a second training image set.
When the deep learning model is trained, the obtained first training image set and the second training image set are respectively used as the input of the deep learning model to obtain a first loss function and a second loss function.
For example, referring to fig. 4, fig. 4 is a schematic training scene diagram of a deep learning model provided in an embodiment of the present application. The deep learning model may consist of multiple layers of convolution models and can simultaneously receive input from both kinds of training pictures, i.e., the first training image set and the second training image set. Because the first training image set is a set of unlabeled images, when it serves as the input of the deep learning model the model performs unsupervised learning, and correspondingly the first loss function is the model's loss function under unsupervised learning. Because the second training image set is a set of labeled images, when it serves as the input the model performs supervised learning, and correspondingly the second loss function is the model's loss function under supervised learning.
It is worth noting that in some possible application scenarios of the embodiments of the present application, the first training image set and the second training image set may be fed to the deep learning model at the same time, and the first and second loss functions obtained after the two losses are calculated separately. In other possible application scenarios, the first training image set may first be used as the input to obtain the first loss function, and the second training image set then used as the input to obtain the second loss function. The embodiments of the present application do not limit this: the input order of the two image sets and the calculation order of the two loss functions depend on the specific application scenario. As long as the first and second training image sets are each used as input of the deep learning model and their losses are calculated separately, the first loss function and the second loss function can be obtained.
The model parameters of the deep learning model are therefore updated according to the first loss function and the second loss function obtained by training on the first and second training image sets respectively, so that unsupervised learning and supervised learning are combined when updating the model parameters, until the trained deep learning model converges.
Based on the above design, the deep learning model training method provided in the embodiments of the present application uses a first training image set including a plurality of unlabeled images and a second training image set including a plurality of labeled images as inputs of the deep learning model, and updates the model parameters using the first and second loss functions obtained from the respective sets. The model parameters are thus updated by combining unsupervised and supervised learning, so the training method is not limited to a specific model structure, which improves the generality of model training.
It should be noted that labeled image data generally depends on manually labeling a large number of unlabeled images, which consumes a great deal of manpower, material resources, and time.
Therefore, optionally, as a possible implementation, the number of unlabeled images in the first training image set is greater than the number of labeled images in the second training image set. That is, the embodiments of the present application use unsupervised training on a large amount of unlabeled picture data so that the deep learning model learns to extract features from picture data, which facilitates learning the image recognition task, reduces the amount of labeled picture data used, and reduces the manpower and material resources needed to train the deep learning model, making the model easier to train under a semi-supervised learning scheme.
In addition, to achieve the objective of obtaining the first loss function in S203, optionally, referring to fig. 5, fig. 5 is a schematic flow chart of the sub-step of S203 in fig. 3, as a possible implementation manner, S203 includes the following sub-steps:
S203-1, removing an image block from each picture in the first training image set to obtain a third training image set.
S203-2, the third training image set is used as the input of the deep learning model, and the restored training image set is obtained.
Each picture in the restored training image set is restored by the deep learning model from the corresponding picture with the missing image block.
S203-3, calculating loss according to the restored training image set and the first training image set to obtain a first loss function.
When the first training image set is used for unsupervised learning of the deep learning model, an image block is removed from each picture in the first training image set, and all pictures with removed image blocks are gathered to obtain the third training image set.
It should be noted that, in general, the first and third training image sets have the same number of pictures and correspond one to one; the only difference is that each picture in the third training image set is missing an image block compared with its counterpart in the first training image set.
The third training image set is then used as the input of the deep learning model, which restores each picture in it; all restored pictures are gathered to obtain the restored training image set.
Likewise, the restored training image set and the first training image set generally have the same number of pictures and correspond one to one; the difference is that the pictures in the first training image set are the originals, while the corresponding pictures in the restored training image set were restored from pictures missing image blocks.
It can be understood that, since each picture in the restored training image set is obtained by restoring the corresponding picture that is missing an image block, it may differ from the corresponding original picture in the first training image set; the deep learning model can learn to extract picture features from this difference between the restored and original pictures.
Therefore, the loss is calculated from the restored training image set and the first training image set, yielding the first loss function of the deep learning model under unsupervised learning.
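For illustration, a minimal sketch of this first-loss computation is given below. PyTorch, a zero-filled square patch, and a mean-squared-error reconstruction loss are assumptions — the patent fixes neither the masking scheme nor the loss form — and `model` is treated abstractly as a network that outputs a restored image:

```python
import torch
import torch.nn.functional as F

def remove_random_patch(images, patch=16):
    """Return copies of the input pictures, each missing one randomly
    positioned image block of a set size (patch size and zero fill
    value are assumptions)."""
    masked = images.clone()
    _, _, h, w = images.shape
    for i in range(images.size(0)):
        top = torch.randint(0, h - patch + 1, (1,)).item()
        left = torch.randint(0, w - patch + 1, (1,)).item()
        masked[i, :, top:top + patch, left:left + patch] = 0.0
    return masked

def first_loss(model, first_set_batch):
    """Unsupervised first loss: remove blocks, restore, then compare
    the restorations with the originals."""
    third_set_batch = remove_random_patch(first_set_batch)  # S203-1
    restored = model(third_set_batch)                       # S203-2
    return F.mse_loss(restored, first_set_batch)            # S203-3
```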
Based on the above design, in the deep learning model training method provided in the embodiments of the present application, image blocks are removed from the pictures in the training image set, the deep learning model restores the pictures with missing blocks, and the loss function is calculated from the restored pictures and the original pictures. In this way, during self-supervised learning the model learns to extract picture features from the difference between the restored pictures and the originals, and does not need to learn the distribution information of the images in the training image set, thereby avoiding deterioration of model performance due to inaccurate image distribution information.
Optionally, as a possible implementation of S203-1, the third training image set may be obtained as follows:
and randomly missing image blocks with the set size of each picture in the first training image set, and collecting all pictures without the image blocks to obtain a third training image set.
It can be understood that the above implementation of S203-1, randomly removing image blocks of a set size, is only one possibility; in other possible application scenarios of the embodiments of the present application, image blocks may be removed in other manners to obtain the third training image set, for example by removing a fixed image block at set coordinate points, as sketched below.
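Continuing the sketch above, the fixed-coordinate alternative could look like this (the coordinates and patch size are placeholders):

```python
def remove_fixed_patch(images, top=0, left=0, patch=16):
    """Remove a fixed image block at set coordinate points from every
    picture (coordinates and size here are assumptions)."""
    masked = images.clone()
    masked[:, :, top:top + patch, left:left + patch] = 0.0
    return masked
```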
On the other hand, to achieve the purpose of obtaining the second loss function in S203, optionally, referring to fig. 6, fig. 6 is another schematic flow chart of the sub-step of S203 in fig. 3, as another possible implementation manner, S203 includes the following sub-steps:
S203-6, the second training image set is used as the input of the deep learning model to obtain the training label corresponding to each picture in the second training image set.
S203-7, calculating loss according to the training label and the actual label corresponding to each picture in the second training image set to obtain a second loss function.
When the second training image set is used for supervised learning of the deep learning model, the second training image set is used as the input of the deep learning model, which predicts a label for each picture in the set; these predictions are the training labels corresponding to the pictures in the second training image set.
The training label the deep learning model predicts for a picture may differ from that picture's actual label. Therefore, in the embodiments of the present application, the loss is calculated from the training label predicted for each picture in the second training image set and the actual label corresponding to that picture, yielding the second loss function of the deep learning model under supervised learning.
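A corresponding sketch of the second-loss computation might be as follows; that the actual labels are class indices and that a cross-entropy loss is used are assumptions the patent does not mandate:

```python
import torch.nn.functional as F

def second_loss(model, second_set_batch, actual_labels):
    """Supervised second loss: predict a training label for each
    picture, then compare it with the actual label."""
    training_labels = model(second_set_batch)               # S203-6: class logits
    return F.cross_entropy(training_labels, actual_labels)  # S203-7
```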
To implement the above S205, optionally, referring to fig. 7, fig. 7 is a schematic flowchart of the sub-steps of S205 in fig. 3, and as a possible implementation, S205 includes the following sub-steps:
S205-1, the first loss function and the second loss function are each weighted by their corresponding scaling ratio and summed to obtain a training loss sum.
S205-2, the model parameters of the deep learning model are updated according to the gradient of the training loss sum with respect to the model parameters.
When the first loss function and the second loss function are used to update the model parameters of the deep learning model, each is weighted by its corresponding scaling ratio and the two are summed to obtain the training loss sum.
The gradient of the training loss sum with respect to the model parameters of the deep learning model is then computed, and the model parameters are updated according to this gradient.
The training loss sum contains a part from the first loss function and a part from the second loss function; that is, it carries the learning information of the deep learning model under unsupervised learning as well as under supervised learning. When the model parameters are updated and iterated, unsupervised learning and supervised learning therefore optimize the deep learning model simultaneously and the two learning modes promote each other: the deep learning model learns to extract picture features from the first training image set of unlabeled pictures, which assists the model in recognizing picture data and improves its generalization.
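Putting S205-1 and S205-2 together, one possible training step is sketched below, reusing the `first_loss` and `second_loss` sketches from earlier; the scaling ratios `alpha` and `beta` and the optimizer choice are assumptions. Note that `model` is treated abstractly throughout: a concrete implementation would likely give a shared trunk a reconstruction head for the first loss and a classification head for the second.

```python
import torch

def train_step(model, optimizer, first_batch, second_batch, actual_labels,
               alpha=1.0, beta=1.0):
    """One parameter update: weight and sum the two losses (S205-1),
    then take a gradient step on the model parameters (S205-2)."""
    loss_sum = (alpha * first_loss(model, first_batch)
                + beta * second_loss(model, second_batch, actual_labels))
    optimizer.zero_grad()
    loss_sum.backward()  # gradient of the training loss sum w.r.t. the parameters
    optimizer.step()     # update the model parameters
    return loss_sum.item()
```

A training loop would call `train_step` repeatedly over batches drawn from both image sets — for example with `torch.optim.SGD(model.parameters(), lr=0.01)` as the optimizer — until the model converges.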
Optionally, as a possible implementation manner, the scaling ratio of each of the first loss function and the second loss function is 1.
It is worth noting that, in other possible implementations of the embodiments of the present application, other values may be adopted as the scaling ratios of the first and second loss functions, for example 0.5 for the first loss function and 0.8 for the second, or 0.6 and 0.9 respectively; the choice depends on the specific application scenario.
In addition, in some possible implementations of the embodiments of the present application, the scaling ratios corresponding to the first and second loss functions may be pre-stored in the training device before the deep learning model is trained; in other possible implementations they may be received as user input while the deep learning model is being trained.
Referring to fig. 8, based on the same inventive concept as the deep learning model training method provided in the foregoing embodiments, fig. 8 is a schematic structural diagram of a deep learning model training apparatus 300 provided in an embodiment of the present application; the apparatus 300 includes a processing module 301 and an updating module 302.
The processing module 301 is configured to obtain a first training image set and a second training image set, where the first training image set includes a plurality of unlabeled images, and the second training image set includes a plurality of labeled images;
the processing module 301 is further configured to take the first training image set and the second training image set as inputs of the deep learning model, respectively, to obtain a first loss function and a second loss function, where the first loss function is a loss function when the first training image set is used as an input for training the deep learning model, and the second loss function is a loss function when the second training image set is used as an input for training the deep learning model;
the updating module 302 is configured to update the model parameters of the deep learning model according to the first loss function and the second loss function.
Optionally, as a possible implementation manner, when the processing module 301 takes the first training image set as an input of the deep learning model to obtain the first loss function, it is specifically configured to:
removing an image block from each picture in the first training image set to obtain a third training image set;
taking the third training image set as the input of the deep learning model to obtain a restored training image set, where each picture in the restored training image set is restored by the deep learning model from the corresponding picture with the missing image block;
and calculating loss according to the restored training image set and the first training image set to obtain a first loss function.
Optionally, as a possible implementation, when the processing module 301 removes an image block from each picture in the first training image set to obtain the third training image set, it is specifically configured to:
and randomly missing image blocks with the set size of each picture in the first training image set, and collecting all pictures without the image blocks to obtain a third training image set.
Optionally, as a possible implementation manner, the number of unlabeled images included in the first training image set is greater than the number of labeled images included in the second training image set.
Optionally, as a possible implementation manner, when the processing module 301 takes the second training image set as an input of the deep learning model to obtain the second loss function, the processing module is specifically configured to:
taking the second training image set as the input of the deep learning model to obtain a training label corresponding to each picture in the second training image set;
and calculating loss according to the training label and the actual label corresponding to each picture in the second training image set to obtain a second loss function.
Optionally, as a possible implementation, when the updating module 302 updates the model parameters of the deep learning model according to the first loss function and the second loss function, it is specifically configured to:
respectively carrying out weighted summation on the first loss function and the second loss function according to corresponding scaling ratios to obtain a training loss sum;
and updating the model parameters of the deep learning model according to the gradient of the training loss sum with respect to the model parameters.
Optionally, as a possible implementation manner, the scaling ratio of each of the first loss function and the second loss function is 1.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: u disk, removable hard disk, read only memory, random access memory, magnetic or optical disk, etc. for storing program codes.
In summary, the deep learning model training method and apparatus, electronic device, and storage medium provided in the embodiments of the present application use a first training image set including a plurality of unlabeled images and a second training image set including a plurality of labeled images as inputs of the deep learning model, and update the model parameters of the deep learning model using the first and second loss functions obtained from the first and second training image sets, respectively. Model parameters are thus updated by combining unsupervised and supervised learning, so the training method is not limited to a specific model structure, which improves the generality of model training.
Moreover, after image blocks are removed from the pictures in the training image set, the deep learning model restores the pictures with missing blocks, and the loss function is calculated from the restored pictures and the original pictures. During self-supervised learning the model therefore learns to extract picture features from the difference between the restored pictures and the originals, and need not learn the distribution information of the images in the training image set, avoiding deterioration of model performance due to inaccurate image distribution information.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Claims (10)
1. A deep learning model training method, the method comprising:
obtaining a first training image set and a second training image set, wherein the first training image set comprises a plurality of unlabeled images, and the second training image set comprises a plurality of labeled images;
respectively taking the first training image set and the second training image set as the input of the deep learning model to obtain a first loss function and a second loss function, wherein the first loss function is a loss function when the first training image set is used as the input for training the deep learning model, and the second loss function is a loss function when the second training image set is used as the input for training the deep learning model;
and updating the model parameters of the deep learning model according to the first loss function and the second loss function.
2. The method of claim 1, wherein the step of deriving a first loss function using the first set of training images as input to the deep learning model comprises:
removing image blocks from each picture in the first training image set to obtain a third training image set;
taking the third training image set as the input of the deep learning model to obtain a restored training image set, wherein each picture in the restored training image set is restored by the deep learning model from the corresponding picture with the missing image block;
and calculating loss according to the restored training image set and the first training image set to obtain the first loss function.
3. The method of claim 2, wherein the step of removing image blocks from each picture in the first training image set to obtain a third training image set comprises:
randomly removing an image block of a set size from each picture in the first training image set, and gathering all pictures with removed image blocks to obtain the third training image set.
4. The method of claim 1, wherein the first set of training images contains a greater number of unlabeled images than the second set of training images contains labeled images.
5. The method of claim 1, wherein the step of deriving a second loss function using the second set of training images as input to the deep learning model comprises:
taking the second training image set as the input of the deep learning model to obtain a training label corresponding to each picture in the second training image set;
and calculating loss according to the training label and the actual label corresponding to each picture in the second training image set to obtain the second loss function.
6. The method of any one of claims 1-5, wherein updating model parameters of the deep learning model based on the first loss function and the second loss function comprises:
respectively carrying out weighted summation on the first loss function and the second loss function according to corresponding scaling ratios to obtain a training loss sum;
and updating the model parameters of the deep learning model according to the gradient of the training loss sum with respect to the model parameters.
7. The method of claim 6, wherein the scaling ratio for each of the first loss function and the second loss function is 1.
8. An apparatus for deep learning model training, the apparatus comprising:
a processing module, configured to obtain a first training image set and a second training image set, where the first training image set includes a plurality of unlabeled images, and the second training image set includes a plurality of labeled images;
the processing module is further configured to take the first training image set and the second training image set as inputs of the deep learning model respectively to obtain a first loss function and a second loss function, where the first loss function is a loss function when the first training image set is used as an input to train the deep learning model, and the second loss function is a loss function when the second training image set is used as an input to train the deep learning model;
and the updating module is used for updating the model parameters of the deep learning model according to the first loss function and the second loss function.
9. An electronic device, comprising:
a memory for storing one or more programs;
a processor;
the one or more programs, when executed by the processor, implement the method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910625339.3A CN112215248A (en) | 2019-07-11 | 2019-07-11 | Deep learning model training method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910625339.3A CN112215248A (en) | 2019-07-11 | 2019-07-11 | Deep learning model training method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112215248A true CN112215248A (en) | 2021-01-12 |
Family
ID=74048160
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910625339.3A Pending CN112215248A (en) | 2019-07-11 | 2019-07-11 | Deep learning model training method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112215248A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112784749A (en) * | 2021-01-22 | 2021-05-11 | 北京百度网讯科技有限公司 | Target model training method, target object identification method, target model training device, target object identification device and medium |
CN113052025A (en) * | 2021-03-12 | 2021-06-29 | 咪咕文化科技有限公司 | Training method of image fusion model, image fusion method and electronic equipment |
CN113111729A (en) * | 2021-03-23 | 2021-07-13 | 广州大学 | Training method, recognition method, system, device and medium of personnel recognition model |
CN113298135A (en) * | 2021-05-21 | 2021-08-24 | 南京甄视智能科技有限公司 | Model training method and device based on deep learning, storage medium and equipment |
CN113610911A (en) * | 2021-07-27 | 2021-11-05 | Oppo广东移动通信有限公司 | Training method and device of depth prediction model, medium and electronic equipment |
CN114282615A (en) * | 2021-12-27 | 2022-04-05 | 华中科技大学 | Intelligent roadbed compactness identification method and system |
WO2022151591A1 (en) * | 2021-01-18 | 2022-07-21 | 平安科技(深圳)有限公司 | Coupled multitask feature extraction method and apparatus, electronic device, and storage medium |
CN115063813A (en) * | 2022-07-05 | 2022-09-16 | 深圳大学 | Training method and training device of alignment model aiming at character distortion |
CN116152577A (en) * | 2023-04-19 | 2023-05-23 | 深圳须弥云图空间科技有限公司 | Image classification method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108122197A (en) * | 2017-10-27 | 2018-06-05 | 江西高创保安服务技术有限公司 | A kind of image super-resolution rebuilding method based on deep learning |
CN108416370A (en) * | 2018-02-07 | 2018-08-17 | 深圳大学 | Image classification method, device based on semi-supervised deep learning and storage medium |
CN109034205A (en) * | 2018-06-29 | 2018-12-18 | 西安交通大学 | Image classification method based on the semi-supervised deep learning of direct-push |
US20190197368A1 (en) * | 2017-12-21 | 2019-06-27 | International Business Machines Corporation | Adapting a Generative Adversarial Network to New Data Sources for Image Classification |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108122197A (en) * | 2017-10-27 | 2018-06-05 | 江西高创保安服务技术有限公司 | A kind of image super-resolution rebuilding method based on deep learning |
US20190197368A1 (en) * | 2017-12-21 | 2019-06-27 | International Business Machines Corporation | Adapting a Generative Adversarial Network to New Data Sources for Image Classification |
CN108416370A (en) * | 2018-02-07 | 2018-08-17 | 深圳大学 | Image classification method, device based on semi-supervised deep learning and storage medium |
CN109034205A (en) * | 2018-06-29 | 2018-12-18 | 西安交通大学 | Image classification method based on the semi-supervised deep learning of direct-push |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022151591A1 (en) * | 2021-01-18 | 2022-07-21 | 平安科技(深圳)有限公司 | Coupled multitask feature extraction method and apparatus, electronic device, and storage medium |
CN112784749A (en) * | 2021-01-22 | 2021-05-11 | 北京百度网讯科技有限公司 | Target model training method, target object identification method, target model training device, target object identification device and medium |
CN112784749B (en) * | 2021-01-22 | 2023-11-10 | 北京百度网讯科技有限公司 | Training method of target model, recognition method, device and medium of target object |
CN113052025A (en) * | 2021-03-12 | 2021-06-29 | 咪咕文化科技有限公司 | Training method of image fusion model, image fusion method and electronic equipment |
CN113111729A (en) * | 2021-03-23 | 2021-07-13 | 广州大学 | Training method, recognition method, system, device and medium of personnel recognition model |
CN113111729B (en) * | 2021-03-23 | 2023-08-18 | 广州大学 | Training method, recognition method, system, device and medium for personnel recognition model |
CN113298135A (en) * | 2021-05-21 | 2021-08-24 | 南京甄视智能科技有限公司 | Model training method and device based on deep learning, storage medium and equipment |
CN113298135B (en) * | 2021-05-21 | 2023-04-18 | 小视科技(江苏)股份有限公司 | Model training method and device based on deep learning, storage medium and equipment |
CN113610911A (en) * | 2021-07-27 | 2021-11-05 | Oppo广东移动通信有限公司 | Training method and device of depth prediction model, medium and electronic equipment |
CN114282615A (en) * | 2021-12-27 | 2022-04-05 | 华中科技大学 | Intelligent roadbed compactness identification method and system |
CN114282615B (en) * | 2021-12-27 | 2024-09-10 | 华中科技大学 | Intelligent recognition method and system for roadbed compactness |
CN115063813A (en) * | 2022-07-05 | 2022-09-16 | 深圳大学 | Training method and training device of alignment model aiming at character distortion |
CN115063813B (en) * | 2022-07-05 | 2023-03-24 | 深圳大学 | Training method and training device of alignment model aiming at character distortion |
CN116152577A (en) * | 2023-04-19 | 2023-05-23 | 深圳须弥云图空间科技有限公司 | Image classification method and device |
CN116152577B (en) * | 2023-04-19 | 2023-08-29 | 深圳须弥云图空间科技有限公司 | Image classification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |