CN112749800A - Neural network model training method, device and storage medium

Info

Publication number
CN112749800A
Authority
CN
China
Prior art keywords
training
network
neural network
network model
process models
Prior art date
Legal status
Pending
Application number
CN202110004383.XA
Other languages
Chinese (zh)
Inventor
黄高 (Gao Huang)
王朝飞 (Chaofei Wang)
宋士吉 (Shiji Song)
杨琪森 (Qisen Yang)
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110004383.XA
Publication of CN112749800A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10: Services
    • G06Q 50/20: Education
    • G06Q 50/205: Education administration or guidance


Abstract

A neural network model training method, apparatus and storage medium are provided. The neural network model training method comprises the following steps: during the training of a teacher network, saving the teacher network at preselected different time nodes as process models; integrating the saved process models to form a new teacher network; and training a student network with the new teacher network.

Description

Neural network model training method, device and storage medium
Technical Field
The present disclosure relates to the field of neural network model compression, and in particular, to a neural network model training method, apparatus, and storage medium.
Background
Compared with large deep neural network models, lightweight neural network models generally perform worse and have difficulty meeting applications with high performance requirements. Model compression is the most common approach to this problem and generally includes model pruning, parameter quantization, knowledge distillation, and similar methods.
Knowledge distillation is a concept proposed by Hinton in 2015. Its aim is to migrate the knowledge of a pre-trained teacher network (generally a large network with superior performance and high complexity) to a student network (a lightweight network with limited performance and low complexity, to be deployed on the application side) by introducing the teacher's knowledge as part of the training loss function used to train the student network. Over years of development, many researchers have proposed various ways to represent the teacher network's knowledge, including matching the softened classification labels (i.e., soft labels) of the teacher and student networks, intermediate-layer features, attention maps, relationships between instances, or relationships between layers in the network structure. However, in these methods the knowledge learned by the student network is only that of the fully trained teacher network; it does not include the knowledge generated during the teacher network's own training, so the knowledge migration is not complete enough.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The embodiment of the application provides a neural network model training method which can expand the knowledge that a student network can learn.
The embodiment of the application provides a neural network model training method, which comprises the following steps:
during the training of the teacher network, saving the teacher network at preselected different time nodes as process models;
integrating the stored multiple process models to form a new teacher network;
and training a student network by using the new teacher network.
The embodiment of the application also provides a neural network model training device, which comprises a memory and a processor, wherein the memory is used for storing a program for training the neural network model; the processor is used for reading the program for training the neural network model and executing the neural network model training method.
Embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions for performing the neural network model training method described above.
The technical solution of the embodiments of the present application enables the student network to learn not only the knowledge of the teacher network after it has converged (solidified) but also the process experience the teacher network accumulated while learning, so that the knowledge transfer is more complete and the student network learns richer and more generalizable knowledge.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. Other advantages of the present application may be realized and attained by the instrumentalities and combinations particularly pointed out in the specification and the drawings.
Other aspects will be apparent upon reading and understanding the attached drawings and detailed description.
Drawings
The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.
FIG. 1 is a flow chart of a neural network model training method in an embodiment of the present application;
FIG. 2 is a second flowchart of a neural network model training method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an embodiment of a neural network model training apparatus;
FIG. 4 is a second schematic diagram of the neural network model training apparatus in an embodiment of the present application.
Detailed Description
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
As shown in fig. 1, an embodiment of the present application provides a neural network model training method, including:
s100, in the training process of the teacher network, the teacher networks of different time nodes selected in advance are respectively stored as process models;
s101, integrating a plurality of stored process models to form a new teacher network;
s102, the new teacher network is used for training the student network.
In step S100, the teacher network may be any type of neural network model; a high-performance, high-complexity model is generally selected as the teacher network. The different training time nodes may be selected manually or automatically, uniformly or non-uniformly, and the process models may be chosen as needed.
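As a minimal sketch of step S100, the following PyTorch-style training loop saves a copy of the teacher at a fixed epoch interval; the function name, optimizer choice, and save schedule are illustrative assumptions rather than details fixed by the patent.

```python
import copy
import torch

def train_teacher_and_save_process_models(teacher, train_loader, epochs=200,
                                           save_every=20, lr=1e-3, device="cuda"):
    """Train the teacher network and snapshot it at preselected epochs (step S100)."""
    teacher = teacher.to(device)
    optimizer = torch.optim.Adam(teacher.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    process_models = []  # copies of the teacher saved at different time nodes

    for epoch in range(1, epochs + 1):
        teacher.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(teacher(x), y)
            loss.backward()
            optimizer.step()

        if epoch % save_every == 0:  # uniform selection; any schedule can be substituted
            process_models.append(copy.deepcopy(teacher).eval())

    return teacher, process_models
```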
In step S101, the stored process models are integrated into a new teacher network. There are multiple ways of integrating or fusing the process models, and the embodiments of the present application do not limit the choice.
In step S102, the student network may be any type of neural network model; it is generally a model with limited performance and low complexity.
The inventors of the present application found in practice that existing knowledge distillation methods are all based on learning the knowledge of the fully trained teacher network and do not make use of the knowledge generated during the teacher network's own training. By analogy with a real educational scenario, the teacher only passes on to the student the knowledge that has already solidified after the teacher's own learning, and does not pass on the knowledge gained during the learning process; in reality, process experience is often more important than result knowledge.
In the solution of the embodiments of the present application, models from the teacher network's training process are integrated. These process models reflect the learning trajectory of the teacher network and contain its learning experience. They are integrated into a new teacher network, which is then used to train the student network. The student network thus learns not only the knowledge of the converged teacher network but also the process experience the teacher network gained while learning, so the knowledge migration is more complete and the student network learns richer, more generalizable knowledge.
In some applications, deep neural networks have moved from large devices to edge devices such as embedded and mobile devices. Such devices have limited computing capability and storage space, making it difficult to deploy large deep neural networks on them. Knowledge distillation can generally improve the performance of the student network to a certain extent and thereby improve the deployment effect of lightweight networks on edge devices. In such application scenarios, the solution of the embodiments of the present application helps further improve the deployment effect of lightweight networks on edge devices.
In an exemplary embodiment, training a student network with a new teacher network includes:
performing knowledge distillation between the new teacher network and the student network according to the selected knowledge distillation method, and constructing a training loss function corresponding to that knowledge distillation method;
and training the student network by using the loss function.
The knowledge of the new teacher network can be transferred to the student network by knowledge distillation; the embodiments of the present application do not limit which distillation method is used.
In an exemplary embodiment, integrating the stored process models includes:
assigning a weight value ω_j to each stored process model and integrating the process models according to the assigned weight values ω_j;
wherein the weight values ω_j corresponding to the process models sum to 1.
In an exemplary embodiment, after the process models have each been assigned a weight value ω_j, the output of each process model may be multiplied by its weight value and the results summed, the sum being taken as the output of the new teacher network. However, the embodiments of the present application do not limit how the weight values are used.
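One possible realization of this weighted integration (step S101) is the small PyTorch module below; the class name and the assumption that each process model returns logits are illustrative choices, not mandated by the patent.

```python
import torch
import torch.nn as nn

class EnsembleTeacher(nn.Module):
    """New teacher T'_θ: weighted sum of the logits of the saved process models."""

    def __init__(self, process_models, weights=None):
        super().__init__()
        self.process_models = nn.ModuleList(process_models)
        n = len(process_models)
        if weights is None:
            weights = torch.full((n,), 1.0 / n)  # uniform weights that sum to 1
        self.register_buffer("weights", torch.as_tensor(weights, dtype=torch.float32))

    @torch.no_grad()
    def forward(self, x):
        outputs = torch.stack([m(x) for m in self.process_models])  # (n, batch, classes)
        w = self.weights.view(-1, 1, 1)
        return (w * outputs).sum(dim=0)  # T'(x) = sum_j ω_j * T_j(x)
```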
In an exemplary embodiment, assigning a weight value ω_j to each stored process model comprises any one of the following modes:
the weight values are preset for the process models, or the process models obtain their respective weight values through training in a self-learning manner.
The weight values ω_j may be distributed uniformly over the process models: for example, if a total of n process models are selected, each process model gets ω_j = 1/n. The weight value ω_j assigned to each process model may also be tuned to different values according to the actual effect, or the process models may obtain their respective weight values through training in a self-learning manner. The embodiments of the present application do not limit how the weight values are set.
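The three weight-setting modes can be sketched as follows; the softmax parameterization used for the self-learning case is one convenient way to keep the weights summing to 1 and is an assumption of this sketch, not a requirement of the patent.

```python
import torch
import torch.nn as nn

n = 10  # number of process models

# Uniform mode: every process model gets weight 1/n.
uniform_weights = torch.full((n,), 1.0 / n)

# Manual mode: hand-tuned weights (must sum to 1), e.g. emphasizing later process models.
manual_weights = torch.tensor([0.05, 0.05, 0.05, 0.05, 0.10, 0.10, 0.10, 0.15, 0.15, 0.20])

# Self-learning mode: learn unconstrained logits and normalize with softmax,
# so the effective weights always sum to 1 while being trained end to end.
class LearnableWeights(nn.Module):
    def __init__(self, n):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n))

    def forward(self):
        return torch.softmax(self.logits, dim=0)
```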
In an exemplary embodiment, integrating the plurality of process models according to the assigned weight values ω_j comprises:
for an arbitrary input sample x_i,
T'_θ(x_i) = ∑_{j=1}^{n} ω_j · T_θ^(j)(x_i),
where T'_θ represents the new teacher network, T_θ^(j) represents the j-th process model, and n represents the number of process models.
In an exemplary embodiment, the teacher networks at the preselected different time nodes include:
teacher networks at different epoch times selected according to a preset interval.
The network models at n moments in the training process can be selected and stored according to actual needs (such as computational overhead and running speed). For example, if the complete training process contains 200 epochs, the 10 models generated at the 20th, 40th, 60th, 80th, 100th, 120th, 140th, 160th, 180th and 200th epochs can be selected uniformly as the process models T_θ^(1), T_θ^(2), …, T_θ^(10).
The preset interval may be the time interval between epochs, or the interval in the number of epochs, i.e., selecting a model every several epochs. The models can be selected at a single fixed interval, or different intervals can be set; the embodiments of the present application do not limit the meaning of the interval or how it is set.
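A tiny sketch of this interval-based selection follows; the concrete numbers simply mirror the 200-epoch example above, and the non-uniform schedule is an illustrative alternative.

```python
# Uniform selection: every 20th epoch of a 200-epoch run gives 10 process models.
total_epochs, interval = 200, 20
uniform_save_epochs = list(range(interval, total_epochs + 1, interval))
# -> [20, 40, 60, 80, 100, 120, 140, 160, 180, 200]

# Non-uniform selection is equally valid, e.g. denser snapshots late in training.
nonuniform_save_epochs = [50, 100, 150, 170, 180, 190, 195, 200]
```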
In an exemplary embodiment, the knowledge distillation method employed comprises: matching the softened classification labels (i.e., soft labels, KD) of the teacher network and the student network, attention maps (AT), intermediate-layer features (Hints), relationships between instances, or relationships between layers in the network structure, and the like.
In an exemplary embodiment, performing knowledge distillation according to the selected knowledge distillation method and constructing the training loss function corresponding to that method includes:
when the soft-label knowledge distillation method is selected, a soft-label loss term is added on top of the cross-entropy loss function when training the student network, and the training loss function takes the form
L_KD = L_CE(σ(z_s), y) + α · T² · L_CE(σ(z_t/T), σ(z_s/T)),
where L_KD represents the training loss function, L_CE represents the cross-entropy loss function, σ(·) represents the softmax function, y represents the true class label, z_s and z_t represent the logits outputs of the student network and the new teacher network respectively, T represents the temperature parameter, and α represents the loss-term adjustment coefficient.
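A sketch of this soft-label loss in PyTorch is shown below; the function name is assumed, and the softened term is written with a KL divergence, which differs from the cross-entropy form above only by a constant that does not affect the student's gradients.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Hard-label cross-entropy plus a softened-label term weighted by alpha * T^2."""
    hard_loss = F.cross_entropy(student_logits, targets)
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return hard_loss + alpha * soft_loss
```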
As shown in fig. 2, the scheme of the embodiments of the present application is described below, taking the training of a student network by a knowledge distillation method as an example:
First, a teacher network T_θ is trained on a known data set; a larger network such as ResNet50 or DenseNet110 is generally selected. According to actual needs (such as computational overhead and running speed), the network models at n epoch moments during training are selected and stored as process models.
Secondly, the n process models are integrated to obtain a stronger teacher network T'_θ. For example, a weight value ω_j may be assigned to each stored process model and the process models integrated according to the assigned weights, so that for any input sample x_i,
T'_θ(x_i) = ∑_{j=1}^{n} ω_j · T_θ^(j)(x_i),
where ω_j is a weight and ∑_j ω_j = 1.
Thirdly, the student network S to be deployed is selected, typically a small network such as ShuffleNet V2 or MobileNet V2. With T'_θ as the teacher network and S as the student network, knowledge distillation is carried out using a conventional distillation method, the training loss function L_KD of the student network is constructed, and the student network S is trained using the constructed loss function.
Finally, the trained student network S can be tested and deployed.
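Putting the three steps together, the condensed sketch below trains the student under the integrated teacher, reusing the EnsembleTeacher and kd_loss sketches above; model constructors, the data loader, and the hyperparameter defaults are illustrative placeholders.

```python
import torch

def distill_student(student, ensemble_teacher, train_loader, epochs=200,
                    T=4.0, alpha=0.5, lr=1e-3, device="cuda"):
    """Step S102: train the student network under the integrated (new) teacher network."""
    student = student.to(device)
    ensemble_teacher = ensemble_teacher.to(device).eval()
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)

    for _ in range(epochs):
        student.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                teacher_logits = ensemble_teacher(x)  # T'(x): weighted ensemble output
            loss = kd_loss(student(x), teacher_logits, y, T=T, alpha=alpha)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    return student  # ready for testing and deployment
```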
Compared with other knowledge-distillation-based methods for training a student network, in the method provided by the embodiments of the present application the knowledge the student network can learn is not limited to that of the converged teacher network but also includes the experience of the teacher network's learning process, so the knowledge the student network can learn is expanded and the student network generalizes better.
As shown in fig. 3, an embodiment of the present application provides a neural network model training apparatus, including a memory and a processor, where the memory is used to store a program for performing neural network model training; the processor is used for reading a program for training the neural network model and can execute the neural network model training method in the embodiment.
As shown in fig. 4, an embodiment of the present application further provides a neural network model training apparatus, including:
the teacher network training module is internally provided with a linear classifier, a cosine distance classifier and a cross entropy loss function, has a process model storage function, and can respectively store the preselected teacher networks with different time nodes as process models according to setting in the training process of the teacher network;
the process model integration module is internally provided with an integration algorithm and is used for integrating a plurality of stored process models to form a new teacher network;
and the student network training module is internally provided with the setting of training parameters and is set to train the student network by utilizing a new teacher network.
In an exemplary embodiment, the neural network model training apparatus further includes:
and the loss function reconstruction module is internally provided with a plurality of knowledge distillation modes, and can automatically reconstruct the loss function of the student network according to the corresponding knowledge distillation modes.
In an exemplary embodiment, the process model integration module further includes selectable modes for the weight parameters used to integrate the different process models, and is configured to assign a weight value ω_j to each stored process model and to integrate the process models according to the assigned weight values ω_j;
wherein the weight values ω_j corresponding to the process models sum to 1.
In an exemplary embodiment, the selectable modes for the weight parameters used to integrate the different process models may include:
a uniform mode: the weight values are distributed equally over the process models;
a manual mode: a weight value is preset for each process model;
a self-learning mode: the process models obtain their respective weight values through training.
In an exemplary embodiment, the teacher web training module is further configured to: teacher networks at different epoch times are selected at preselected intervals.
In an exemplary embodiment, the knowledge distillation methods built into the loss function reconstruction module include: matching the softened classification labels (i.e., soft labels, KD), attention maps (AT), intermediate-layer features (Hints), instance-to-instance relationships, or layer-to-layer relationships in the network structure of the teacher network and the student network, and the like.
In an exemplary embodiment, the neural network model training apparatus further includes:
the data set preprocessing module is internally provided with a data preprocessing function and can be used for cleaning and standardizing the training data set, increasing samples, inputting a shuffle input sequence, dividing the mini-batch into different sizes and the like.
In an exemplary embodiment, the neural network model training apparatus further includes:
the teacher network selection module is internally provided with a plurality of network structures with higher complexity, and a user can select different teacher networks.
In an exemplary embodiment, the neural network model training apparatus further includes:
the student network selection module is internally provided with various lightweight network structures, and a user can select different student networks to perform learning training.
In an exemplary embodiment, the student network training module is further configured to train the student network in conjunction with the reconstructed loss function.
In an exemplary embodiment, the neural network model training apparatus further includes:
and the testing and deploying module is used for testing the network model obtained by training, storing the model after the test is finished and passed and implementing deployment.
The neural network model training apparatus can implement the neural network model training method of any of the above embodiments; implementation details are not repeated here. Modules of the apparatus can be removed or added according to the actual application scenario, and the embodiments of the present application do not limit the module configuration.
The embodiment of the present application further provides a computer-readable storage medium, which stores computer-executable instructions for executing the neural network model training method in the foregoing embodiment.
The neural network model training method of the embodiments of the present application is described below with Example 1:
example 1
Taking an image classification application scenario as an example: the benchmark data set is CIFAR-100, with the training and test sets split in the standard way into 50,000 and 10,000 pictures respectively; the evaluation criterion is top-1 accuracy; ResNet18, ResNet50 and DenseNet121 are selected as teacher networks; MobileNetV2 and ShuffleNetV2 are selected as student networks; a linear classifier is used as the classifier; and random cropping and horizontal flipping are used as the sample augmentation strategy.
The main parameters are set as follows:
the batch size is selected to be 128, the iteration number is 200, the optimizer selects Adam, the coding tool is Pythrch, and the Titan Xp video card is adopted for model training. The time for selecting and storing the process models is integral multiple epoch of 20, namely 10 process models are stored in one teacher network training, the average value 1/10 is selected by integrating weights, the temperature T is 4, and the loss term coefficient alpha is 0.5.
The results of the experimental comparison on the CIFAR-100 dataset are shown in the following table, where Teacher denotes the teacher network, Student denotes the student network, Baseline denotes the result of training the student network alone without introducing a teacher network (i.e., without knowledge distillation), KD denotes the result of supervising a single student with the soft-label output of a conventional single teacher, and Ours denotes the result obtained with the method of the embodiments of the present application.
[Table: top-1 accuracy on CIFAR-100 for each teacher-student pair under Baseline, KD and Ours]
As can be seen from the table, compared with training the student network alone, the method of the embodiments of the present application improves accuracy by 4.68% on average, and compared with the conventional KD method it improves accuracy by 2.41% on average, which fully demonstrates the effectiveness and generalization of the method of the embodiments of the present application.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media, as is well known to those skilled in the art.

Claims (10)

1. A neural network model training method is characterized by comprising the following steps:
in the training process of a teacher network, saving the teacher network at preselected different time nodes as process models;
integrating the stored multiple process models to form a new teacher network;
and training a student network by using the new teacher network.
2. The neural network model training method of claim 1, wherein said training a student network with the new teacher network comprises:
performing knowledge distillation between the new teacher network and the student network according to the selected knowledge distillation method, and constructing a training loss function corresponding to the knowledge distillation method;
and training the student network by using the loss function.
3. The neural network model training method of claim 1, wherein the integrating the stored plurality of process models comprises:
assigning a weight value ω_j to each of the stored process models, and integrating the plurality of process models according to the assigned weight values ω_j;
wherein the weight values ω_j corresponding to the process models sum to 1.
4. The neural network model training method of claim 3, wherein assigning the weight values ω_j to the stored process models comprises any one of the following modes:
presetting a weight value for each process model, or having the process models obtain their respective weight values through training in a self-learning manner.
5. The neural network model training method of claim 3, wherein integrating the plurality of process models according to the assigned weight values ω_j comprises:
for an arbitrary input sample x_i,
T'_θ(x_i) = ∑_{j=1}^{n} ω_j · T_θ^(j)(x_i),
wherein T'_θ represents the new teacher network, T_θ^(j) represents the j-th process model, and n represents the number of process models.
6. The neural network model training method of claim 1, wherein the teacher networks at the preselected different time nodes comprise:
teacher networks at different epoch times selected according to a preset interval.
7. The neural network model training method of claim 2, wherein the knowledge distillation method comprises: matching softened classification labels, attention maps, intermediate-layer features, instance-to-instance relationships, or layer-to-layer relationships in the network structure of the new teacher network and the student network.
8. The neural network model training method of claim 7, wherein performing the knowledge distillation according to the selected knowledge distillation method and constructing the training loss function corresponding to the knowledge distillation method comprises:
when the soft-label knowledge distillation method is selected, the training loss function takes the form
L_KD = L_CE(σ(z_s), y) + α · T² · L_CE(σ(z_t/T), σ(z_s/T)),
wherein L_KD represents the training loss function, L_CE represents the cross-entropy loss function, σ(·) represents the softmax function, y represents the true class label, z_s and z_t represent the logits outputs of the student network and the new teacher network respectively, T represents the temperature parameter, and α represents the loss-term adjustment coefficient.
9. A neural network model training apparatus, comprising a memory and a processor, wherein the memory is configured to store a program for training a neural network model, and the processor is configured to read the program for training the neural network model and execute the neural network model training method according to any one of claims 1 to 8.
10. A computer-readable storage medium storing computer-executable instructions for performing the neural network model training method of any one of claims 1-8.
CN202110004383.XA, priority date 2021-01-04, filing date 2021-01-04: Neural network model training method, device and storage medium; published as CN112749800A (en), status Pending

Priority Applications (1)

CN202110004383.XA (CN112749800A): priority date 2021-01-04, filing date 2021-01-04, Neural network model training method, device and storage medium


Publications (1)

CN112749800A (en): published 2021-05-04

Family

ID=75649886

Family Applications (1)

CN202110004383.XA, CN112749800A (en), Pending: Neural network model training method, device and storage medium

Country Status (1)

CN: CN112749800A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

CN113920574A * (priority 2021-12-15, published 2022-01-11, 深圳市视美泰技术股份有限公司): Training method and device for picture quality evaluation model, computer equipment and medium
CN114118061A * (priority 2021-11-30, published 2022-03-01, 深圳市北科瑞声科技股份有限公司): Lightweight intention recognition model training method, device, equipment and storage medium



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication

Application publication date: 2021-05-04