CN112101551A - Method, apparatus, device and storage medium for training a model

Method, apparatus, device and storage medium for training a model

Info

Publication number
CN112101551A
CN112101551A
Authority
CN
China
Prior art keywords
model
trained
training data
feature
feature set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011027431.9A
Other languages
Chinese (zh)
Inventor
杨馥魁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011027431.9A
Publication of CN112101551A
Legal status: Pending (Current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method, apparatus, device and storage medium for training a model, and relates to the field of artificial intelligence, in particular to the technical fields of computer vision and deep learning. The specific implementation scheme is as follows: acquiring a trained model, a model to be trained, and a training data set; extracting a first feature of each training data in the training data set using the trained model, and extracting a second feature of each training data in the training data set using the model to be trained; and adjusting the model parameters and the weight of the model to be trained according to the obtained first feature set and second feature set, so as to train the model to be trained. The method improves the efficiency and accuracy of model training: the trained model can accurately process images or classify data, a small and simple model can achieve the same processing effect as a large and complex model, the training process is simplified, and the efficiency of image processing or data processing is improved.

Description

Method, apparatus, device and storage medium for training a model
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to the field of computer vision and deep learning technologies, and more particularly, to a method, an apparatus, a device, and a storage medium for training a model.
Background
Distillation is a very common model compression technique. Existing distillation techniques that use feature distillation usually require the distillation weight to be adjusted manually; this weight has a very large influence on the final effect, and manual adjustment is inefficient and yields poor results.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for training a model.
According to an aspect of the present disclosure, there is provided a method for training a model, comprising: acquiring a trained model, a model to be trained, and a training data set; extracting a first feature of each training data in the training data set using the trained model, and extracting a second feature of each training data in the training data set using the model to be trained; and adjusting the model parameters and the weight of the model to be trained according to the obtained first feature set and second feature set, so as to train the model to be trained.
According to another aspect of the present disclosure, there is provided an apparatus for training a model, comprising: an obtaining unit configured to obtain a trained model, a model to be trained, and a training data set; a feature extraction unit configured to extract a first feature of each training data in the training data set using the trained model and to extract a second feature of each training data in the training data set using the model to be trained; and a model training unit configured to adjust the model parameters and weights of the model to be trained according to the obtained first feature set and second feature set, so as to train the model to be trained.
According to yet another aspect of the present disclosure, there is provided an electronic device for training a model, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for training a model as described above.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method for training a model as described above.
The technology of the present application solves the problems of low efficiency and poor effect that arise when the distillation weight is adjusted manually. Using the first feature set and the second feature set, extracted from each training data in the training data set by the trained model and the model to be trained respectively, the weight and the model parameters can be adjusted adaptively. This improves the training efficiency and accuracy of the model: the trained model can accurately process images or classify data, a small and simple model can achieve the same processing effect as a large and complex model, the training process is simplified, and the efficiency of image processing or data processing is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for training a model according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for training a model according to the present application;
FIG. 4 is a flow diagram of another embodiment of a method for training a model according to the present application;
FIG. 5 is a schematic diagram of an embodiment of an apparatus for training a model according to the present application;
FIG. 6 is a block diagram of an electronic device for implementing a method for training a model according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the present method for training a model or apparatus for training a model may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as model training applications, may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, e-book readers, car computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. This is not particularly limited herein.
The server 105 may be a server providing various services, such as a background server that trains the model to be trained using the training data sets collected on the terminal devices 101, 102, 103. The background server can obtain a training data set, respectively extract the characteristics of each training data in the training data set by using the trained model and the model to be trained, obtain a characteristic set, and adjust the model parameters and the weight of the model to be trained according to the obtained characteristic set so as to train the model to be trained.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as a plurality of software or software modules, or as a single software or software module. This is not particularly limited herein.
It should be noted that the method for training the model provided in the embodiment of the present application may be executed by the terminal device 101, 102, 103 or the server 105. Accordingly, the means for training the model is typically provided in the terminal device 101, 102, 103 or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for training a model according to the present application is shown. The method for training the model of the embodiment comprises the following steps:
step 201, a trained model, a model to be trained, and a training data set are obtained.
In this embodiment, the execution subject of the method for training the model (for example, the server 105 in fig. 1) may obtain the training data set collected by the terminal device through a wired or wireless connection, and invoke the trained model and the model to be trained, which may be local to the server or stored on the terminal device. The trained model can be a trained neural network model that can accurately identify or classify the image to be identified or the data to be classified, and thus supervise the training of the model to be trained. The model to be trained may be an initial neural network model without any training, or a partially trained intermediate-state neural network model; this application does not specifically limit it. The training data set may be a certain number of pictures to be recognized; for example, it may be a certain number of face pictures for face recognition. It may also be a certain amount of data to be classified; for example, it may be a certain number of documents containing Chinese and English text, where the Chinese and English in each document are to be classified.
Step 202, extracting a first feature of each training data in the training data set by using the trained model and extracting a second feature of each training data in the training data set by using the model to be trained.
After obtaining the trained model, the model to be trained, and the training data set, the execution subject may extract a first feature of each training data in the training data set using the trained model and extract a second feature of each training data in the training data set using the model to be trained. For example, when the training data set is a set of face pictures to be recognized, each piece of training data may be one face picture, the first feature may be the feature points of each face picture extracted by the trained model, and the second feature may be the feature points of each face picture extracted by the model to be trained. The first features extracted by the trained model form a first feature set, and the second features extracted by the model to be trained form a second feature set. The first feature and the second feature may be values converted from the extracted feature points, or feature vectors converted from the extracted feature points; the specific representation of the features is not limited in the present application.
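As a concrete illustration, the feature-extraction step might look like the following sketch. This is a minimal sketch under assumptions of this description, not the patent's reference implementation: the names teacher (the trained model), student (the model to be trained), and loader are hypothetical, and each model is assumed to map a batch of inputs to a batch of feature vectors.

```python
import torch

def extract_feature_sets(teacher, student, loader, device="cpu"):
    # Extract the first feature set (trained model) and the second
    # feature set (model to be trained) over the whole training data set.
    teacher.eval()
    first_features, second_features = [], []
    for batch in loader:                        # each batch: training data tensors
        batch = batch.to(device)
        with torch.no_grad():                   # the trained model only supervises
            first_features.append(teacher(batch))
        second_features.append(student(batch))  # keeps gradients for training
    return torch.cat(first_features), torch.cat(second_features)
```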
Step 203, adjusting model parameters and weights of the model to be trained according to the obtained first feature set and second feature set so as to train the model to be trained.
After the execution subject obtains the first feature set and the second feature set, the model parameters and weights of the model to be trained can be adjusted according to them so as to train the model to be trained. Specifically, the first feature set may be the feature set that the trained model accurately extracts from each training data in the training data set by identifying feature points. For example, when the training data set is a set of face pictures to be recognized, each piece of training data may be one face picture, and the first feature set may be the set of face feature points of each face picture accurately extracted by the trained model. The second feature set may be the feature set that the model to be trained extracts from each training data in the training data set by identifying feature points. For example, when the training data set is a set of face pictures to be recognized, the second feature set may be the set of face feature points of each picture extracted by the model to be trained. The execution subject may calculate the similarity of the first feature and the second feature corresponding to the same training data in the first feature set and the second feature set, then average the obtained similarities and use the resulting mean similarity as the initial weight. The execution subject can then back-propagate gradients and iteratively adjust the model parameters and weights of the model to be trained according to the initial weight, the first feature set, and the second feature set; iteration stops once the number of iterations exceeds a preset threshold, yielding a trained target model.
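A minimal end-to-end sketch of this step is shown below, reusing the extract_feature_sets helper assumed earlier. Cosine similarity as the similarity measure, SGD as the optimizer, and a fixed iteration budget max_iters as the preset threshold are illustrative assumptions, not the only forms this embodiment admits.

```python
import torch
import torch.nn.functional as F

def train_student(teacher, student, loader, max_iters=1000, lr=0.01, device="cpu"):
    optimizer = torch.optim.SGD(student.parameters(), lr=lr)
    for _ in range(max_iters):
        first, second = extract_feature_sets(teacher, student, loader, device)
        # Similarity of the first and second feature of the same training data
        sims = F.cosine_similarity(first, second, dim=1)
        # Mean similarity used as the (initial) weight; detached so it acts
        # as a constant weighting term of the loss within this iteration
        weight = sims.mean().detach()
        # Weighted feature-distillation loss: weight * (X - Y)^2
        loss = (weight * (first - second).pow(2).sum(dim=1)).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return student  # the trained target model
```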
With continued reference to FIG. 3, a schematic diagram of one application scenario of a method for training a model according to the present application is shown. In the application scenario of fig. 3, a server 301 obtains a trained model 303, a model to be trained 304, and a training data set 302. Server 301 extracts a first feature of each training data in training data set 302 using trained model 303 and a second feature of each training data in training data set 302 using model to be trained 304. The server 301 adjusts the model parameters 307 and the weights 308 of the model 304 to be trained according to the obtained first feature set 305 and the second feature set 306, so as to train the model 304 to be trained.
In this embodiment, the weight and the model parameters can be adjusted adaptively through the first feature set and the second feature set, extracted from each training data in the training data set by the trained model and the model to be trained respectively. This improves the training efficiency and accuracy of the model: the trained model can accurately process images or classify data, a small and simple model can achieve the same processing effect as a large and complex model, the training process is simplified, and the efficiency of image processing or data processing is improved.
With continued reference to FIG. 4, a flow 400 of another embodiment of a method for training a model according to the present application is shown. As shown in fig. 4, the method for training a model of the present embodiment may include the following steps:
step 401, obtaining a trained model, a model to be trained, and a training data set.
Step 402, extracting a first feature of each training data in the training data set by using the trained model and extracting a second feature of each training data in the training data set by using the model to be trained.
And 403, adjusting model parameters and weights of the model to be trained according to the obtained first feature set and second feature set so as to train the model to be trained.
The principle of step 401 to step 403 is similar to that of step 201 to step 203, and is not described herein again.
Specifically, step 403 can be implemented by steps 4031 to 4032:
step 4031, for each training data in the training data set, calculate the similarity between the first feature and the second feature of the training data.
After the execution subject obtains the training data set, the first feature set, and the second feature set, it calculates, for each training data in the training data set, the similarity between the first feature and the second feature of that training data. Specifically, the execution subject may calculate the cosine similarity between the first feature extracted by the trained model and the second feature extracted by the model to be trained for each piece of training data. For example, suppose first and second features are extracted from picture 1, picture 2, and picture 3, with the first features denoted feature 1, feature 2, and feature 3, and the second features denoted feature 1', feature 2', and feature 3'. The execution subject may then calculate the similarity of feature 1 and feature 1' corresponding to picture 1, the similarity of feature 2 and feature 2' corresponding to picture 2, and the similarity of feature 3 and feature 3' corresponding to picture 3. It is understood that, to perform the similarity calculation, each feature needs to be converted into a feature vector first; specifically, each feature may be converted into a feature vector by a pre-trained conversion model. The pre-trained conversion model may be, for example, a Bidirectional Encoder Representations from Transformers (BERT) model; the pre-trained conversion model is not particularly limited in this application.
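For instance, the pairwise similarity computation of step 4031 might be written as below. The numbering follows the picture/feature example above; the feature vectors are made-up values purely for illustration.

```python
import torch
import torch.nn.functional as F

# Feature vectors for pictures 1-3 from the trained model (feature 1, 2, 3)
first = torch.tensor([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
# Corresponding vectors from the model to be trained (feature 1', 2', 3')
second = torch.tensor([[0.8, 0.2], [0.1, 0.9], [0.4, 0.6]])

# Row-wise cosine similarity pairs feature i with feature i'
sims = F.cosine_similarity(first, second, dim=1)
print(sims)  # one similarity value per training picture
```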
Step 4032, according to the similarity corresponding to each training data, the first feature set and the second feature set, the model parameters and the weights of the model to be trained are adjusted.
After the execution subject obtains the similarity between the first feature and the second feature of each training data in the training data set, it can adjust the model parameters and weights of the model to be trained according to the similarity, the first feature set, and the second feature set corresponding to each training data. Specifically, the execution subject may adjust the corresponding model parameters and weights by driving the similarity corresponding to each training data toward a preset value, such as 1. After adjusting the model parameters and weights, the execution subject can update the second feature set extracted from the training data set by the model to be trained, and iteratively adjust the model parameters and weights of the model to be trained according to the first feature set, the updated second feature set, and the similarity gradient determined by them. The weight may be a weighting term in the loss function that supervises the training of the model to be trained, and may be the similarity corresponding to each training data. During training, the weight first increases and then stops changing; once it no longer changes, it is used as a constant weighting term of the loss function that supervises the training of the model to be trained, and can continue to supervise that training.
According to the embodiment, the model parameters and the weights of the model to be trained are iteratively trained and adaptively adjusted according to the similarity, the first feature set and the second feature set corresponding to each training data, so that the training efficiency and the training effect of the model can be improved, and the image or data can be efficiently processed by using the trained model.
Specifically, step 4032 can be implemented by steps 40321 to 40324:
step 40321, determine initial weights according to the similarity corresponding to each training data.
After obtaining the similarity corresponding to each training data, the execution subject may determine the initial weight according to the similarity corresponding to each training data. Specifically, the executive agent may respectively use the similarity corresponding to each training data as a weight of a loss function based on the first feature and the second feature corresponding to each training data, and determine the similarity corresponding to each training data acquired for the first time as an initial weight. It is understood that there may be more than one initial weight. Each training data corresponds to an initial weight. The initial weights corresponding to the training data may be the same or different, and this is not specifically limited in this application.
The following iterative steps are performed a plurality of times:
step 40322, according to the initial weight, the first feature set and the second feature set, adjust the model parameters of the model to be trained.
After the execution subject obtains the initial weight, the model parameters of the model to be trained can be adjusted according to the initial weight, the first feature set, and the second feature set. In particular, the initial weight may characterize the cosine similarity or Mahalanobis distance of the first and second features corresponding to the same training data in the first and second feature sets. A cosine similarity approaching 1 indicates that the first feature is more similar to the second feature; when the cosine similarity is 1, the first feature and the second feature are identical. By adjusting the initial weight to approach a preset value, such as 1, the model parameters are adjusted manually or adaptively according to the weight, so that the first feature and the second feature corresponding to the same training data in the first feature set and the second feature set become as similar as possible, which achieves the effect of training the model to be trained. The training objective for the model to be trained is that it completely learns the input and corresponding output of the trained model, i.e. that the output results of the model to be trained and the trained model are consistent for the same training data.
Specifically, step 40322 may be implemented by steps 403221 to 403223:
step 403221, determining an evaluation value according to the initial weight, the first feature set and the second feature set.
After obtaining the initial weight, the execution subject may determine an evaluation value according to the initial weight, the first feature set, and the second feature set. Specifically, the evaluation value may be the value of the loss function that supervises the training of the model to be trained. The loss function may be the product of the weight and the square of the difference between the first feature and the second feature corresponding to the same training data, or the product of the weight and the absolute value of that difference; the specific form of the loss function is not specifically limited in the present application. Accordingly, the evaluation value may be determined as the product of the initial weight and the square of the difference between the first feature and the second feature corresponding to the same training data in the first and second feature sets, or as the product of the initial weight and the absolute value of that difference. For example, for training data 1 with first feature X1, second feature Y1, and initial weight W1, the evaluation value may be the value of W1(X1 - Y1)^2; for training data 2 with first feature X2, second feature Y2, and initial weight W2, the evaluation value may be the value of W2(X2 - Y2)^2. It is understood that the number of evaluation values determined from the initial weight, the first feature set, and the second feature set is consistent with the number of training data in the training data set.
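A minimal sketch of this evaluation value, following the W(X - Y)^2 form above (the squared-difference variant; the absolute-difference variant would replace pow(2) with abs()):

```python
import torch

def evaluation_values(first, second, weights):
    # One evaluation value per training sample: W_i * (X_i - Y_i)^2,
    # summed over the feature dimension when features are vectors
    return weights * (first - second).pow(2).sum(dim=1)
```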
Step 403222, in response to determining that the difference between the evaluation value and the preset value is greater than the preset threshold, adjusting the model parameters of the model to be trained.
After determining each evaluation value, the execution subject may compare the difference between each evaluation value and the preset value against a preset threshold; in response to determining that this difference is greater than the preset threshold, the execution subject may adjust the model parameters of the model to be trained. Specifically, the preset value may be the loss function value of a model whose input and output completely coincide with those of the trained model, i.e. the value of the loss function when the first feature and the second feature corresponding to the same training data are identical; it may be 0, for example. The preset threshold may be a tolerance on the loss function value for the first and second features of the same training data, i.e. the largest difference that may exist between the evaluation value of a qualified trained model's loss function and the preset value. When the execution subject determines that the difference between the evaluation value and the preset value is greater than the preset threshold, the model to be trained is not yet a qualified model and still differs substantially from the trained model; the model parameters of the model to be trained are then adaptively adjusted, according to the current weight, the first feature set, and the currently updated second feature set, so that the loss function approaches 0.
In this embodiment, the evaluation value of the loss function of the current model to be trained is determined according to the initial weight, the first feature set and the second feature set, and the difference between the evaluation value and the preset value is compared with the preset threshold, so that the difference between the current model to be trained and the trained model can be accurately determined, and when the difference is large, the model parameters of the model to be trained are continuously adaptively adjusted, so as to obtain a relatively accurate qualified training model.
And 403223, in response to determining that the difference between the evaluation value and the preset value is less than or equal to the preset threshold, determining the target model according to the model parameters and the weight obtained by the iteration step.
When the execution subject determines that the difference between the evaluation value and the preset value is less than or equal to the preset threshold, the trained model to be trained is a qualified model, and its difference from the trained model is within the expected range. The target model may then be determined according to the model parameters and weights obtained in the current iteration step. The target model is a qualified model, obtained by iteratively training the model to be trained, whose difference from the trained model is within a tolerable range.
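Steps 403222 and 403223 thus amount to a simple stopping test, sketched below. Here preset_value (e.g. 0) and preset_threshold are the quantities named in the text; comparing the mean evaluation value is an assumption of this sketch.

```python
def is_qualified(eval_values, preset_value=0.0, preset_threshold=1e-3):
    # Qualified when the evaluation value lies within the tolerance of the
    # preset value; otherwise the model parameters keep being adjusted
    return abs(eval_values.mean().item() - preset_value) <= preset_threshold
```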
By setting an evaluation value, a preset value, and a preset threshold to evaluate whether the model is qualified, the degree of training of the model can be judged flexibly and in time. This avoids unnecessarily long training after the model has already become qualified, improves the training efficiency and effect of the model, and enables efficient processing of images or data with the trained model.
Step 40323, according to the updated model parameters, second features of each training data are extracted again to obtain an updated second feature set.
After the execution subject adjusts the model parameters of the model to be trained, the model parameters may be updated, and then the second features of the training data may be extracted again according to the updated model parameters to obtain an updated second feature set. Specifically, the executing agent may update the model parameters in the model to be trained, and extract the second features of each training data in the training data set again by using the model to be trained after updating the model parameters, so as to obtain an updated second feature set.
40324, update the initial weight according to the first feature set and the updated second feature set.
After obtaining the updated second feature set, the execution subject may update the initial weight according to the first feature set and the updated second feature set. Specifically, the executing agent may calculate similarity between the first feature and the updated second feature corresponding to each training data in the first feature set and the updated second feature set, update the similarity, and determine the updated similarity as the updated initial weight.
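Steps 40323 and 40324 (re-extracting the second features with the updated parameters and refreshing the weight) could be sketched as a single helper, reusing the extract_feature_sets function assumed earlier:

```python
import torch.nn.functional as F

def refresh_weights(teacher, student, loader, device="cpu"):
    # Step 40323: re-extract the second features with updated model parameters
    first, second = extract_feature_sets(teacher, student, loader, device)
    # Step 40324: the updated similarities become the updated initial weights
    # (one weight per training data, as noted in step 40321)
    new_weights = F.cosine_similarity(first, second, dim=1).detach()
    return new_weights, first, second
```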
In the embodiment, through iterative training and adaptive adjustment of the model parameters and weights of the model to be trained, the training efficiency of the model to be trained can be improved, and the training effect of the model to be trained can be ensured, so that images or data can be efficiently processed by using the trained model.
In the present application, the execution subject may use the trained target model to perform face recognition in pictures or to classify data. Specifically, a face picture to be recognized is input into the trained model, which recognizes the face feature points of the input picture and outputs the recognized feature points. Data to be classified may likewise be input into the trained model, which classifies the input data and outputs a classification label corresponding to each piece of data. Images or data can thus be processed quickly and accurately using the trained model.
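At inference time, using the trained target model might look like the following sketch; the preprocessing and the meaning of the output depend on the task, and the names here are hypothetical.

```python
import torch

def recognize(target_model, face_picture):
    # face_picture: a preprocessed image tensor of shape (C, H, W)
    target_model.eval()
    with torch.no_grad():
        # For face recognition the output holds the recognized feature points;
        # for data classification it would be a class score/label per input
        return target_model(face_picture.unsqueeze(0)).squeeze(0)
```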
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for training a model, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 5, the apparatus 500 for training a model of the present embodiment includes: an acquisition unit 501, a feature extraction unit 502, and a model training unit 503.
An obtaining unit 501 configured to obtain a trained model, a model to be trained, and a training data set.
A feature extraction unit 502 configured to extract a first feature of each training data in the training data set by using the trained model and a second feature of each training data in the training data set by using the model to be trained.
And a model training unit 503 configured to adjust model parameters and weights of the model to be trained according to the obtained first feature set and second feature set, so as to train the model to be trained.
In some optional implementations of this embodiment, the model training unit 503 is further configured to: for each training data in the training data set, calculating the similarity of a first feature and a second feature of the training data; and adjusting the model parameters and the weight of the model to be trained according to the similarity corresponding to each training data, the first characteristic set and the second characteristic set.
In some optional implementations of this embodiment, the model training unit 503 is further configured to: the following iterative steps are performed a plurality of times: determining an initial weight according to the similarity corresponding to each training data; adjusting model parameters of the model to be trained according to the initial weight, the first feature set and the second feature set; extracting the second features of the training data again according to the updated model parameters to obtain an updated second feature set; updating the initial weight according to the first feature set and the updated second feature set.
In some optional implementations of this embodiment, the model training unit 503 is further configured to: determining an evaluation value according to the initial weight, the first feature set and the second feature set; and adjusting the model parameters of the model to be trained in response to the fact that the difference value between the evaluation value and the preset value is larger than the preset threshold value.
In some optional implementations of this embodiment, the model training unit 503 is further configured to: and determining the target model according to the model parameters and the weight obtained by executing the iteration step in response to the fact that the difference value between the evaluation value and the preset value is smaller than or equal to the preset threshold value.
It should be understood that units 501 to 503, which are recited in the apparatus 500 for training a model, respectively, correspond to the respective steps in the method described with reference to fig. 2. Thus, the operations and features described above for the method for training a model are equally applicable to the apparatus 500 and the units included therein and will not be described in detail here.
An electronic device and a readable storage medium for training a model are also provided according to embodiments of the present application.
FIG. 6 is a block diagram of an electronic device for training a model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses 605 and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses 605 may be used with multiple memories, if desired. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for training a model provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for training a model provided herein.
The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and units, such as program instructions/units corresponding to the method for training a model in the embodiments of the present application (e.g., the obtaining unit 501, the feature extraction unit 502, and the model training unit 503 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 602, namely, implements the method for training the model in the above method embodiment.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the electronic device for training the model, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 602 optionally includes memory located remotely from processor 601, and these remote memories may be connected over a network to an electronic device for training the model. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the method of training a model may further comprise: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603, and the output device 604 may be connected by a bus 605 or other means, and are exemplified by the bus 605 in fig. 6.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic apparatus used to train the model, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, the weight and the model parameters can be adjusted in a self-adaptive manner through the first characteristic set and the second characteristic set of each training data in the training data set extracted by the trained model and the model to be trained respectively, so that the training of the model to be trained is realized, the training efficiency of the model to be trained is improved, and the training accuracy of the model to be trained is improved.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method for training a model, comprising:
acquiring a trained model, a model to be trained and a training data set;
extracting first features of each training data in the training data set by using the trained model and extracting second features of each training data in the training data set by using the model to be trained;
and adjusting the model parameters and the weight of the model to be trained according to the obtained first feature set and second feature set, so as to train the model to be trained.
2. The method according to claim 1, wherein the adjusting the model parameters and weights of the model to be trained according to the obtained first feature set and second feature set comprises:
for each training data in the training data set, calculating the similarity of a first feature and a second feature of the training data;
and adjusting the model parameters and the weight of the model to be trained according to the similarity corresponding to each training data, the first feature set and the second feature set.
3. The method according to claim 2, wherein the adjusting the model parameters and the weights of the model to be trained according to the similarity corresponding to each training data, the first feature set and the second feature set comprises:
the following iterative steps are performed a plurality of times:
determining an initial weight according to the similarity corresponding to each training data;
adjusting model parameters of the model to be trained according to the initial weight, the first feature set and the second feature set;
extracting the second features of the training data again according to the updated model parameters to obtain an updated second feature set;
updating the initial weight according to the first feature set and the updated second feature set.
4. The method of claim 3, wherein the adjusting the model parameters of the model to be trained according to the initial weights, the first feature set, and the second feature set comprises:
determining an evaluation value according to the initial weight, the first feature set and the second feature set;
and adjusting the model parameters of the model to be trained in response to the fact that the difference value between the evaluation value and the preset value is larger than the preset threshold value.
5. The method of claim 4, wherein the adjusting the model parameters of the model to be trained according to the initial weights, the first feature set, and the second feature set comprises:
and determining a target model according to the model parameters and the weight obtained by the iteration step in response to the fact that the difference value between the evaluation value and the preset value is smaller than or equal to the preset threshold value.
6. An apparatus for training a model, comprising:
an obtaining unit configured to obtain a trained model, a model to be trained, and a training data set;
a feature extraction unit configured to extract a first feature of each training data in the training data set by using the trained model and extract a second feature of each training data in the training data set by using the model to be trained;
and the model training unit is configured to adjust the model parameters and the weights of the model to be trained according to the obtained first feature set and the second feature set so as to train the model to be trained.
7. The apparatus of claim 6, wherein the model training unit is further configured to:
for each training data in the training data set, calculating the similarity of a first feature and a second feature of the training data;
and adjusting the model parameters and the weight of the model to be trained according to the similarity corresponding to each training data, the first feature set and the second feature set.
8. The apparatus of claim 7, wherein the model training unit is further configured to:
the following iterative steps are performed a plurality of times:
determining an initial weight according to the similarity corresponding to each training data;
adjusting model parameters of the model to be trained according to the initial weight, the first feature set and the second feature set;
extracting the second features of the training data again according to the updated model parameters to obtain an updated second feature set;
updating the initial weight according to the first feature set and the updated second feature set.
9. The apparatus of claim 8, wherein the model training unit is further configured to:
determining an evaluation value according to the initial weight, the first feature set and the second feature set;
and adjusting the model parameters of the model to be trained in response to the fact that the difference value between the evaluation value and the preset value is larger than the preset threshold value.
10. The apparatus of claim 9, wherein the model training unit is further configured to:
and determining a target model according to the model parameters and the weight obtained by the iteration step in response to the fact that the difference value between the evaluation value and the preset value is smaller than or equal to the preset threshold value.
11. An electronic device for training a model, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202011027431.9A 2020-09-25 2020-09-25 Method, apparatus, device and storage medium for training a model Pending CN112101551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011027431.9A CN112101551A (en) 2020-09-25 2020-09-25 Method, apparatus, device and storage medium for training a model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011027431.9A CN112101551A (en) 2020-09-25 2020-09-25 Method, apparatus, device and storage medium for training a model

Publications (1)

Publication Number Publication Date
CN112101551A 2020-12-18

Family

ID=73755638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011027431.9A Pending CN112101551A (en) 2020-09-25 2020-09-25 Method, apparatus, device and storage medium for training a model

Country Status (1)

Country Link
CN (1) CN112101551A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598840A (en) * 2018-06-13 2019-12-20 富士通株式会社 Knowledge migration method, information processing apparatus, and storage medium
CN109934300A (en) * 2019-03-21 2019-06-25 腾讯科技(深圳)有限公司 Model compression method, apparatus, computer equipment and storage medium
CN110009052A (en) * 2019-04-11 2019-07-12 腾讯科技(深圳)有限公司 A kind of method of image recognition, the method and device of image recognition model training
CN110516804A (en) * 2019-08-22 2019-11-29 广东浪潮大数据研究有限公司 Model training method and device
CN111079833A (en) * 2019-12-16 2020-04-28 腾讯科技(深圳)有限公司 Image recognition method, image recognition device and computer-readable storage medium
CN111275120A (en) * 2020-01-22 2020-06-12 支付宝(杭州)信息技术有限公司 Training method and device of image recognition model, and image recognition method and device
CN111554268A (en) * 2020-07-13 2020-08-18 腾讯科技(深圳)有限公司 Language identification method based on language model, text classification method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023098860A1 (en) * 2021-12-02 2023-06-08 华为技术有限公司 Communication method and communication apparatus

Similar Documents

Publication Publication Date Title
CN112036509A (en) Method and apparatus for training image recognition models
CN114549935A (en) Information generation method and device
CN112101552A (en) Method, apparatus, device and storage medium for training a model
CN111507111B (en) Pre-training method and device of semantic representation model, electronic equipment and storage medium
CN111783605A (en) Face image recognition method, device, equipment and storage medium
CN111709875B (en) Image processing method, device, electronic equipment and storage medium
CN112084366A (en) Method, apparatus, device and storage medium for retrieving image
CN111028226A (en) Method and device for algorithm transplantation
CN111738419A (en) Quantification method and device of neural network model
CN111667056A (en) Method and apparatus for searching model structure
CN111611990A (en) Method and device for identifying table in image
CN112001366A (en) Model training method, face recognition device, face recognition equipment and medium
CN112508004A (en) Character recognition method and device, electronic equipment and storage medium
CN112149741A (en) Training method and device of image recognition model, electronic equipment and storage medium
CN111523467B (en) Face tracking method and device
CN112507833A (en) Face recognition and model training method, device, equipment and storage medium
CN111783949A (en) Deep neural network training method and device based on transfer learning
CN113627361B (en) Training method and device for face recognition model and computer program product
CN112561059B (en) Method and apparatus for model distillation
CN112016523B (en) Cross-modal face recognition method, device, equipment and storage medium
CN112101551A (en) Method, apparatus, device and storage medium for training a model
CN111582452B (en) Method and device for generating neural network model
CN116579407B (en) Compression method, training method, processing method and device of neural network model
CN110889392B (en) Method and device for processing face image
CN112529180A (en) Method and apparatus for model distillation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination