CN112101552A - Method, apparatus, device and storage medium for training a model

Info

Publication number: CN112101552A
Application number: CN202011027448.4A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: model, feature, trained, training data, similarity
Inventor: 杨馥魁
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed; likewise the priority date is an assumption and is not a legal conclusion)
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011027448.4A
Publication of CN112101552A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The application discloses a method, an apparatus, a device and a storage medium for training a model, and relates to the field of artificial intelligence, in particular to the technical fields of computer vision and deep learning. The implementation scheme is as follows: acquiring a trained model, a model to be trained and a training data set; extracting a first feature of each training data in the training data set by using the trained model, and extracting a second feature of each training data in the training data set by using the model to be trained; determining a target feature pair set according to the obtained first feature set and second feature set; and adjusting the model parameters of the model to be trained according to the target feature pair set so as to train the model to be trained. In this implementation, the model to be trained is iteratively trained on the feature pairs with the larger differences, which improves training efficiency and training effect, enables the trained model to accurately process images or classify data, simplifies the training process, and improves the efficiency of image processing or data processing.

Description

Method, apparatus, device and storage medium for training a model
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to the field of computer vision and deep learning technologies, and more particularly, to a method, an apparatus, a device, and a storage medium for training a model.
Background
Distillation is a very common model compression technique. Model distillation uses the features output by a trained large model as standard features to supervise the training of a smaller model, so that the smaller model approaches the performance of the large model.
Existing distillation methods supervise with all input training data equally, and therefore cannot effectively address the poor performance on hard samples.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for training a model.
According to an aspect of the present disclosure, there is provided a method for training a model, comprising: acquiring a trained model, a model to be trained and a training data set; extracting first features of each training data in the training data set by using the trained model and extracting second features of each training data in the training data set by using the model to be trained; determining a target feature pair set according to the obtained first feature set and the second feature set; and adjusting the model parameters of the model to be trained according to the target characteristic pair set so as to train the model to be trained.
According to another aspect of the present disclosure, there is provided an apparatus for training a model, comprising: an obtaining unit configured to obtain a trained model, a model to be trained, and a training data set; a feature extraction unit configured to extract a first feature of each training data in the training data set by using the trained model and extract a second feature of each training data in the training data set by using the model to be trained; a target feature pair set determination unit configured to determine a target feature pair set according to the obtained first feature set and the second feature set; and the model training unit is configured to adjust the model parameters of the model to be trained according to the target feature pair set so as to train the model to be trained.
According to yet another aspect of the present disclosure, there is provided an electronic device for training a model, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for training a model as described above.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method for training a model as described above.
The technology of the application solves the problem that existing distillation methods, which supervise with all input training data, perform poorly on hard samples. A target feature pair set is determined from the first feature set extracted by the trained model and the second feature set extracted by the model to be trained, thereby obtaining the feature pairs with the larger differences. The model to be trained is then iteratively trained on these feature pairs, which improves the model training efficiency and training effect, enables the trained model to accurately process images or classify data, simplifies the training process, and improves the efficiency of image processing or data processing.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for training a model according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for training a model according to the present application;
FIG. 4 is a flow diagram of another embodiment of a method for training a model according to the present application;
FIG. 5 is a schematic diagram of an embodiment of an apparatus for training a model according to the present application;
FIG. 6 is a block diagram of an electronic device for implementing a method for training a model according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of those embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the present method for training a model or apparatus for training a model may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as model training applications, may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, e-book readers, car computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server providing various services, such as a background server that trains the model to be trained using the training data sets collected on the terminal devices 101, 102, 103. The background server can obtain a training data set, respectively extract a feature set of each training data in the training data set by using the trained model and the model to be trained, and determine a target feature pair set according to the extracted feature set so as to train the model to be trained.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as a plurality of software or software modules, or as a single software or software module. And is not particularly limited herein.
It should be noted that the method for training the model provided in the embodiment of the present application may be executed by the terminal device 101, 102, 103 or the server 105. Accordingly, the means for training the model is typically provided in the terminal device 101, 102, 103 or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for training a model according to the present application is shown. The method for training the model of the embodiment comprises the following steps:
step 201, a trained model, a model to be trained, and a training data set are obtained.
In this embodiment, the execution subject of the method for training the model (for example, the server 105 in fig. 1) may obtain the training data set collected by the terminal devices through a wired or wireless connection, and invoke the trained model and the model to be trained stored locally on the server or on the terminal devices. The trained model can be a trained neural network model that accurately identifies or classifies the images to be identified or the data to be classified, and it supervises the training of the model to be trained. The model to be trained may be an initial neural network model without any training, or a partially trained, intermediate-state neural network model; this is not specifically limited in this application. The training data set may be a certain number of pictures to be recognized, for example a certain number of face pictures for face recognition, or a certain amount of data to be classified, for example a certain number of documents in which the Chinese and English text is to be classified.
Step 202, extracting a first feature of each training data in the training data set by using the trained model and extracting a second feature of each training data in the training data set by using the model to be trained.
After obtaining the trained model, the model to be trained, and the training data set, the execution subject may extract a first feature of each training data in the training data set by using the trained model, and a second feature of each training data by using the model to be trained. For example, when the training data set is a set of face pictures to be recognized, each training data is a face picture, the first feature may be the feature points of each face picture extracted by the trained model, and the second feature may be the feature points of each face picture extracted by the model to be trained. The first features extracted by the trained model form a first feature set, and the second features extracted by the model to be trained form a second feature set. The first feature and the second feature may also be numerical values converted from the extracted feature points, or feature vectors converted from them; the specific representation of the features is not limited in the present application.
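As a minimal sketch of this step, the following PyTorch-style code extracts the two feature sets. The names `teacher`, `student`, and `batch` are illustrative assumptions, not taken from the patent; any feature-extracting networks could stand in for them:

```python
import torch

@torch.no_grad()
def extract_first_features(teacher: torch.nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """First features: output of the trained (teacher) model; no gradients needed."""
    teacher.eval()
    return teacher(batch)

def extract_second_features(student: torch.nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Second features: output of the model to be trained (student); gradients flow."""
    return student(batch)
```

The teacher runs under no_grad because its first features only serve as fixed supervision targets.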
Step 203, determining a target feature pair set according to the obtained first feature set and the second feature set.
After obtaining the first feature set and the second feature set, the execution subject may determine a target feature pair set from them. Specifically, the execution subject may calculate the similarity between the first feature and the second feature corresponding to the same training data in the first feature set and the second feature set; the smaller the similarity, the greater the difference between the first feature and the second feature for that training data. Therefore, from the obtained similarities, a preset number of the smallest similarities are selected, the first features and second features corresponding to the selected similarities are determined, and each determined first feature and its corresponding second feature form a target feature pair, yielding the target feature pair set. The difference between the two features in each target feature pair in the set is greater than a preset threshold; the value of the preset threshold is not specifically limited in the present application.
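A hedged sketch of this selection, assuming the features are (N, D) tensors, that the similarity is cosine similarity, and that the "preset number" is a top-k over the smallest similarities; `k` is an illustrative parameter:

```python
import torch
import torch.nn.functional as F

def select_target_pairs(first: torch.Tensor, second: torch.Tensor, k: int):
    """first, second: (N, D) features of the same N training data."""
    sim = F.cosine_similarity(first, second, dim=1)   # similarity per training data
    _, idx = torch.topk(sim, k, largest=False)        # the k smallest similarities
    return first[idx], second[idx]                    # the target feature pairs
```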
And 204, adjusting model parameters of the model to be trained according to the target feature pair set so as to train the model to be trained.
After the execution subject obtains the target feature pair set, the model parameters of the model to be trained can be adjusted according to the target feature pair set so as to train it. Specifically, the execution subject may iteratively adjust the model parameters of the model to be trained, using the first feature extracted by the trained model in each target feature pair as the standard, update the second feature re-extracted by the model to be trained for the same training data, and back-propagate the difference between the second feature and the first feature of the same training data as a gradient so as to update the model, until the difference is determined to be smaller than a preset threshold, thereby obtaining a trained target model.
With continued reference to FIG. 3, a schematic diagram of one application scenario of the method for training a model according to the present application is shown. In the application scenario of fig. 3, a server 301 obtains a trained model 303, a model to be trained 304, and a training data set 302. The server 301 extracts a first feature of each training data in the training data set 302 using the trained model 303, and a second feature of each training data in the training data set 302 using the model to be trained 304. The server 301 then determines a target feature pair set 307 from the obtained first feature set 305 and second feature set 306, and adjusts the model parameters 308 of the model to be trained 304 according to the target feature pair set 307 so as to train it.
In this embodiment, the target feature pair set is determined according to the first feature set extracted by the trained model and the second feature set extracted by the model to be trained, thereby obtaining the feature pairs with the larger differences.
With continued reference to FIG. 4, a flow 400 of another embodiment of a method for training a model according to the present application is shown. As shown in fig. 4, the method for training a model of the present embodiment may include the following steps:
step 401, obtaining a trained model, a model to be trained, and a training data set.
Step 402, extracting a first feature of each training data in the training data set by using the trained model and extracting a second feature of each training data in the training data set by using the model to be trained.
And 403, determining a target feature pair set according to the obtained first feature set and the second feature set.
The principle of step 401 to step 403 is similar to that of step 201 to step 203, and is not described herein again.
Specifically, step 403 can be implemented by steps 4031 to 4034:
step 4031, a first feature pair of the first feature set is determined.
After obtaining the first feature set and the second feature set, the execution subject may determine the first feature pairs of the first feature set. Specifically, the execution subject may combine the features in the first feature set pairwise to obtain the first feature pairs. As another implementation, the execution subject may arrange the training data of the training data set as the row headers and column headers of a matrix, and fill in the matrix with the first features corresponding to the training data, thereby obtaining feature pairs composed of the same first feature (corresponding to the same training data) or of two different first features (corresponding to two different training data). For example, suppose the training data set consists of picture 1 and picture 2; of course, a real training data set contains a large amount of training data, and two pictures are used here only as an example. The first feature set corresponding to picture 1 and picture 2 may then be {1, 2}, where picture 1 corresponds to first feature 1 and picture 2 corresponds to first feature 2. The first feature pairs of the first feature set may be the pair (1, 1) corresponding to the same training data (picture 1), the pair (1, 2) corresponding to the two different training data picture 1 and picture 2, the pair (2, 1) corresponding to picture 2 and picture 1, and the pair (2, 2) corresponding to the same training data (picture 2).
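The pairwise combination can be illustrated with the toy example above. The sketch below assumes that "combining pairwise" means the Cartesian product over samples, which reproduces the four pairs just listed:

```python
import itertools

first_features = {"picture 1": 1, "picture 2": 2}   # toy first feature set {1, 2}
first_pairs = list(itertools.product(first_features.values(), repeat=2))
print(first_pairs)   # [(1, 1), (1, 2), (2, 1), (2, 2)], matching the example
```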
Step 4032, determine each training data pair corresponding to each first feature pair.
Upon determining the first feature pairs of the first feature set, the execution subject may determine the training data pair corresponding to each first feature pair. For example, the feature pair (1, 1) corresponds to the training data pair (picture 1, picture 1); the feature pair (1, 2) corresponds to the training data pair (picture 1, picture 2) composed of two different training data; the feature pair (2, 1) corresponds to the training data pair (picture 2, picture 1); and the feature pair (2, 2) corresponds to the training data pair (picture 2, picture 2) composed of the same training data.
Step 4033, determine a second feature pair of the second feature set according to each training data pair.
After determining the training data pairs corresponding to the first feature pairs, the execution subject may determine the second feature pairs of the second feature set according to those training data pairs. Continuing the example, the training data set consists of picture 1 and picture 2 (again, only two pictures are used for illustration), and the task is to recognize the faces in the pictures. The second feature set corresponding to picture 1 and picture 2 may be {1', 2'}, where picture 1 corresponds to second feature 1' and picture 2 corresponds to second feature 2'. The corresponding second feature pairs are then determined from the obtained training data pairs: for the training data pairs (picture 1, picture 1), (picture 1, picture 2), (picture 2, picture 1) and (picture 2, picture 2), the corresponding second feature pairs are (1', 1'), (1', 2'), (2', 1') and (2', 2'), respectively.
Step 4034, a target feature pair set is determined according to each first feature pair and each second feature pair.
After determining the first feature pairs and the second feature pairs, the execution subject may determine the target feature pair set from them. Specifically, the execution subject may subtract, element by element, the first feature pair and the second feature pair corresponding to the same training data pair, and determine the target feature pairs according to the result. When the difference in the result is greater than a preset threshold, the feature pair composed of the first feature and the second feature corresponding to the same training data in that first feature pair and second feature pair is determined as a target feature pair, thereby obtaining the target feature pair set. For example, suppose the second feature pair (1', 2') is subtracted from the first feature pair (1, 2) corresponding to the same training data pair, and the result is greater than a preset threshold, say 0 (the preset threshold is not specifically limited in the present application). The feature pair (1, 1') composed of first feature 1 and second feature 1' corresponding to picture 1, and the feature pair (2, 2') composed of first feature 2 and second feature 2' corresponding to picture 2, are then both determined as target feature pairs, yielding the target feature pair set {(1, 1'), (2, 2')}. Of course, the target feature pair set need not contain only these two feature pairs; this is merely an example.
In this embodiment, the feature pairs for which the feature extracted by the trained model and the feature extracted by the model to be trained from the same training data differ greatly are determined, and the model to be trained is trained on this feature pair set. This improves the training efficiency and training effect of the model to be trained, enables a small, simple model to achieve the same processing effect on pictures or data as a large, complex model, simplifies the training process, and improves the efficiency of image processing or data processing.
Specifically, step 4034 may be implemented by steps 40341 to 40342:
step 40341, calculate a first similarity of each first feature pair and a second similarity of each second feature pair.
The execution subject, after determining the first feature pairs and the second feature pairs, may calculate a first similarity for each first feature pair and a second similarity for each second feature pair. Specifically, the execution subject may calculate the similarity between the two first features in each first feature pair: for example, each first feature may be converted into a first feature vector through a preset correspondence between features and vectors, and the cosine similarity between the first feature vectors is then calculated as the first similarity. Likewise, the execution subject may calculate the similarity between the two second features in each second feature pair: each second feature may be converted into a second feature vector through the preset correspondence, and the cosine similarity between the second feature vectors is calculated as the second similarity.
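Assuming the features have already been converted into vectors, all first similarities (and, identically, all second similarities) can be computed at once. The (N, N) results of this sketch also serve as the similarity matrices used in the following steps:

```python
import torch
import torch.nn.functional as F

def pairwise_cosine(features: torch.Tensor) -> torch.Tensor:
    """features: (N, D). Entry (i, j) of the result is the cosine similarity
    of the feature pair formed by training data i and j."""
    normed = F.normalize(features, dim=1)   # L2-normalize each row
    return normed @ normed.t()              # (N, N) similarity matrix

# first_sims  = pairwise_cosine(first_features)    # the first similarities
# second_sims = pairwise_cosine(second_features)   # the second similarities
```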
Step 40342, according to each first similarity and each second similarity, determine a set of target feature pairs.
After the execution subject obtains the first similarities and the second similarities, the target feature pair set may be determined from them. Specifically, the execution subject may subtract the first similarity and the second similarity corresponding to the same training data pair, take the absolute value of the result, and determine the target feature pair set according to the obtained absolute values. The execution subject may sort the absolute values in ascending or descending order. Taking ascending order as an example, for the training data pairs corresponding to the top-N ranked absolute values, the first feature in the first feature set and the second feature in the second feature set corresponding to the same training data form a target feature pair. It may be understood that one absolute value may determine one or two target feature pairs.
For example, the similarity of the first feature pair (1, 2) is calculated to be, say, 0.1, and the similarity of the second feature pair (1', 2') corresponding to the same training data pair to be, say, 0.3; the similarity of the first feature pair (1, 1) is calculated to be 1, and the similarity of the second feature pair (1', 1') corresponding to the same training data pair is also 1. Subtracting the first and second similarities corresponding to the same training data pair gives 0.1 - 0.3 = -0.2, whose absolute value is 0.2, and 1 - 1 = 0. The resulting absolute values sorted in ascending order are 0, 0.2. Suppose a target feature pair is formed from the first feature and second feature corresponding to the training data pair ranked first: the training data pair with absolute value 0 is (picture 1, picture 1), and the first feature and second feature corresponding to this training data are 1 and 1', so the resulting target feature pair is (1, 1'). The obtained target feature pairs form the target feature pair set.
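The worked example can be restated as a short sketch. It follows the text's ascending-sort example, in which the top-ranked (smallest) absolute value selects the target feature pair:

```python
import torch

first_sims  = torch.tensor([1.0, 0.1])   # pairs (picture 1, picture 1), (picture 1, picture 2)
second_sims = torch.tensor([1.0, 0.3])
abs_diff = (first_sims - second_sims).abs()   # tensor([0.0, 0.2])
order = torch.argsort(abs_diff)               # ascending order, as in the text
top_1 = order[:1]                             # index 0 -> training data pair (picture 1, picture 1)
# the corresponding target feature pair is (1, 1'), as in the worked example
```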
In this embodiment, the target feature pair set is determined through the first similarities of the first feature pairs and the second similarities of the second feature pairs, so that the feature pairs for which the trained model and the model to be trained extract markedly different features from the same training data can be accurately determined, realizing effective distillation of the training data. This improves the training efficiency of the model to be trained and the model training effect, enables a small, simple model to achieve the same processing effect on pictures or data as a large, complex model, simplifies the training process, and improves the efficiency of image processing or data processing.
Specifically, step 40342 may be implemented by steps 403421 to 403422:
step 403421, determining a first similarity matrix and a second similarity matrix according to the first similarities and the second similarities, respectively.
After the execution subject obtains the first similarities and the second similarities, it may determine a first similarity matrix and a second similarity matrix from them. Specifically, the execution subject may arrange the first similarities into a first similarity matrix and record the position of each first similarity in it; then, according to those positions, determine the positions in the second similarity matrix of the second similarities corresponding to the same training data pairs; and obtain the second similarity matrix from the determined positions and the corresponding second similarities. The first similarity and the second similarity at corresponding positions in the two matrices thus correspond to the same training data pair.
And 403422, determining a target feature pair set based on the first similarity matrix and the second similarity matrix.
After obtaining the first similarity matrix and the second similarity matrix, the execution subject may determine the target feature pair set based on them. Specifically, the execution subject may compute the ratio of each first similarity in the first similarity matrix to the second similarity at the corresponding position in the second similarity matrix, and compare the ratio with 1. When the absolute value of the difference between the ratio and 1 is greater than a preset threshold (the preset threshold may be set to 0.5; it is not specifically limited in this application), the difference between the corresponding first similarity and second similarity is relatively large. The feature pair composed of the first feature and the second feature corresponding to the same training data in the associated training data pair is then determined as a target feature pair, and the determined target feature pairs form the target feature pair set.
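A sketch of the ratio test, assuming the two similarity matrices are same-shaped tensors; the epsilon guard against division by zero is an added assumption, not part of the patent:

```python
import torch

def ratio_test(first_sims: torch.Tensor, second_sims: torch.Tensor,
               threshold: float = 0.5, eps: float = 1e-8) -> torch.Tensor:
    """Boolean mask over training data pairs whose similarity ratio deviates from 1."""
    ratio = first_sims / (second_sims + eps)   # eps avoids division by zero (assumption)
    return (ratio - 1.0).abs() > threshold     # True where the difference is large
```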
In this embodiment, by obtaining the first similarity matrix and the second similarity matrix, the similarity corresponding to the same training data pair may be accurately calculated, so as to accurately obtain a target feature pair set composed of target feature pairs.
Specifically, step 403422 may be implemented by steps 4034221-4034223:
and 4034221, determining similarity difference values between each first feature pair and each corresponding second feature pair according to the first similarity matrix and the second similarity matrix.
After the execution subject obtains the first similarity matrix and the second similarity matrix, the similarity difference value between each first feature pair and the corresponding second feature pair can be determined from them. Specifically, the execution subject may subtract the two matrices elementwise, that is, subtract from each first similarity in the first similarity matrix the second similarity at the corresponding position in the second similarity matrix, and then take the absolute value of the result to obtain the similarity difference value between each first feature pair and the corresponding second feature pair. Each similarity difference value is 0 or a positive number.
And step 4034222, determining a target similarity difference value according to the similarity difference value and a preset threshold value.
After the execution subject obtains the similarity difference values, each value can be compared with a preset threshold. For each similarity difference value, in response to determining that it is greater than the preset threshold, it is determined to be a target similarity difference value, that is, a similarity difference value between a first similarity and the corresponding second similarity of the same training data whose difference exceeds the preset threshold. The purpose of obtaining the target similarity difference values is to find the similarities with the larger differences, so that the first features and second features of the same training data that differ greatly can be determined and used to train the model to be trained. Here, a "large difference" means that the difference exceeds the preset threshold.
And step 4034223, determining a target feature pair set according to the target similarity difference value.
After obtaining the target similarity difference values, the execution subject may determine, as a target feature pair, the feature pair composed of the first feature and the second feature corresponding to the same training data in the training data pair associated with each target similarity difference value. The determined target feature pairs form the target feature pair set.
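Steps 4034221 to 4034223 can be summarized in one sketch, again assuming the two similarity matrices are same-shaped tensors; the returned (row, column) positions name the training data pairs whose features form target feature pairs:

```python
import torch

def target_pair_positions(s1: torch.Tensor, s2: torch.Tensor, threshold: float):
    diff = (s1 - s2).abs()                                  # similarity difference values, >= 0
    rows, cols = torch.nonzero(diff > threshold, as_tuple=True)
    return list(zip(rows.tolist(), cols.tolist()))          # positions of the target pairs
```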
By obtaining the target similarity difference values, the present application finds the similarities with the larger differences, so that the first features and the corresponding second features of the same training data that differ greatly can be accurately determined. This improves the training efficiency and training precision of the model to be trained, enables a small, simple model to achieve the same processing effect on pictures or data as a large, complex model, simplifies the training process, and improves the efficiency of image processing or data processing.
And step 404, adjusting model parameters of the model to be trained according to the target feature pair set so as to train the model to be trained.
The principle of step 404 is similar to that of step 204, and is not described here again.
Specifically, step 404 can be implemented by steps 4041 to 4045:
step 4041, according to the loss function of the target feature pair set and the model to be trained, a loss function value is obtained.
After the execution subject obtains the target feature pair set, a loss function value can be obtained from the target feature pair set and the loss function of the model to be trained. For example, suppose the target feature pair set is {(1, 1'), (2, 2')} and the loss function of the model to be trained is (X - Y)^2, where X and Y represent a first feature and a second feature, respectively. Then (X - Y)^2 = (1 - 1')^2 can be taken as the loss function value, as can (2 - 2')^2, and of course (1 - 1')^2 + (2 - 2')^2 may also be used.
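A sketch of this computation, assuming the features are tensors and using the summed variant of the loss named above:

```python
import torch

def loss_value(target_pairs):
    """target_pairs: iterable of (first_feature, second_feature) tensors,
    e.g. the set {(1, 1'), (2, 2')} from the example."""
    return sum(((x - y) ** 2).sum() for x, y in target_pairs)
```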
Step 4042, according to the loss function value, adjusting and updating the model parameters of the model to be trained.
After the execution subject obtains the loss function value, the model parameters of the model to be trained can be adjusted and updated according to it. In particular, the execution subject may adjust the model parameters of the model to be trained so that the loss function value, for example (1 - 1')^2 + (2 - 2')^2, approaches 0. During this adjustment the model parameters are updated: as the loss function approaches 0, the second features extracted by the model to be trained change, and they change precisely because the model parameters are adaptively adjusted and updated in response to the change in the loss function.
Step 4043, again extracting the second feature of each training data according to the updated model parameter, to obtain an updated second feature set.
After the execution subject updates the model parameters, the second features of the training data can be extracted again with the updated parameters, obtaining an updated second feature set. Specifically, updating the model parameters changes the weights with which the model to be trained computes the second features, so the newly extracted second features differ from those extracted previously. After the model to be trained re-extracts the second features of the training data, the second feature set is updated, and the execution subject continuously refines the training of the model to be trained with the continuously updated second feature set.
Step 4044, an updated set of target feature pairs is determined from the first feature set and the updated second feature set.
After the execution subject obtains the updated second feature set, an updated target feature pair set may be determined according to the first feature set and the updated second feature set. Specifically, the execution subject iteratively updates the target feature pair set with the newly obtained second feature set, following the method for determining the target feature pair set described above, and continuously refines the training of the model to be trained according to the iteratively updated target feature pair set.
Step 4045, adjusting the model parameters of the model to be trained according to the updated target feature pair set.
The execution subject may input the updated target feature pair set into the loss function, adjust the model parameters of the model to be trained, and iteratively train the model until no target feature pair set is output, at which point training ends. Alternatively, when the number of iterations exceeds a threshold, training of the model to be trained may be ended even though a target feature pair set could still be output.
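Putting steps 4041 to 4045 together, a hedged end-to-end sketch might look as follows. SGD, the learning rate, `k`, and the assumption that the selector returns empty tensors when no pair exceeds the difference criterion are all illustrative choices, not prescribed by the patent:

```python
import torch

def train_to_convergence(teacher, student, data, select_target_pairs,
                         k=8, max_iters=1000, lr=0.01):
    optimizer = torch.optim.SGD(student.parameters(), lr=lr)
    with torch.no_grad():
        first = teacher(data)                  # first feature set, fixed
    for _ in range(max_iters):                 # iteration threshold, as in the text
        second = student(data)                 # updated second feature set
        f, s = select_target_pairs(first, second, k)
        if f.numel() == 0:                     # no target feature pairs output: done
            break
        loss = ((f - s) ** 2).sum()            # loss over the target feature pair set
        optimizer.zero_grad()
        loss.backward()                        # back-propagate the feature differences
        optimizer.step()                       # adjust and update the model parameters
    return student
```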
In this embodiment, the model parameters of the model to be trained are adjusted according to the iteratively updated target feature pair set so as to train the model. This improves the training efficiency and training effect of the model to be trained, enables a small, simple model to achieve the same processing effect on pictures or data as a large, complex model, simplifies the training process, and improves the efficiency of image processing or data processing.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for training a model, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 5, the apparatus 500 for training a model of the present embodiment includes: an acquisition unit 501, a feature extraction unit 502, a target feature pair set determination unit 503, and a model training unit 504.
An obtaining unit 501 configured to obtain a trained model, a model to be trained, and a training data set.
A feature extraction unit 502 configured to extract a first feature of each training data in the training data set by using the trained model and a second feature of each training data in the training data set by using the model to be trained.
A target feature pair set determining unit 503 configured to determine a target feature pair set according to the obtained first feature set and the second feature set.
A model training unit 504 configured to adjust model parameters of the model to be trained according to the target feature pair set to train the model to be trained.
In some optional implementations of this embodiment, the target feature pair set determining unit 503 is further configured to: determining a first feature pair of a first feature set; determining each training data pair corresponding to each first feature pair; determining a second feature pair of a second feature set according to each training data pair; and determining a target feature pair set according to the first feature pairs and the second feature pairs.
In some optional implementations of this embodiment, the target feature pair set determining unit 503 is further configured to: calculating a first similarity of each first feature pair and a second similarity of each second feature pair; and determining a target feature pair set according to the first similarity and the second similarity.
In some optional implementations of this embodiment, the target feature pair set determining unit 503 is further configured to: determining a first similarity matrix and a second similarity matrix according to the first similarities and the second similarities respectively; and determining a target feature pair set based on the first similarity matrix and the second similarity matrix.
In some optional implementations of this embodiment, the target feature pair set determining unit 503 is further configured to: determining similarity difference values of each first feature pair and each corresponding second feature pair according to the first similarity matrix and the second similarity matrix; determining a target similarity difference value according to the similarity difference value and a preset threshold value; and determining a target feature pair set according to the target similarity difference value.
In some optional implementations of this embodiment, the model training unit 504 is further configured to: the following iterative steps are performed a plurality of times: obtaining a loss function value according to the loss functions of the target feature pair set and the model to be trained; adjusting and updating model parameters of the model to be trained according to the loss function values; extracting the second features of the training data again according to the updated model parameters to obtain an updated second feature set; determining an updated target feature pair set according to the first feature set and the updated second feature set; and adjusting the model parameters of the model to be trained according to the updated target feature pair set.
It should be understood that units 501 to 504, which are described in the apparatus 500 for training a model, correspond to the respective steps in the method described with reference to fig. 2, respectively. Thus, the operations and features described above for the method for training a model are equally applicable to the apparatus 500 and the units included therein and will not be described in detail here.
An electronic device and a readable storage medium for training a model are also provided according to embodiments of the present application.
As shown in fig. 6, a block diagram of an electronic device for the method of training a model according to an embodiment of the present application is shown. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant as examples only, and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected by different buses 605 and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses 605 may be used, along with multiple memories, if desired. Also, multiple electronic devices may be connected, with each device providing a portion of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for training a model provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for training a model provided herein.
The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and units, such as program instructions/units corresponding to the method for training a model in the embodiment of the present application (for example, the obtaining unit 501, the feature extraction unit 502, the target feature pair set determination unit 503, and the model training unit 504 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 602, namely, implements the method for training the model in the above method embodiment.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the electronic device for training the model, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 602 optionally includes memory located remotely from processor 601, and these remote memories may be connected over a network to an electronic device for training the model. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the method of training a model may further comprise: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603, and the output device 604 may be connected by a bus 605 or other means, and are exemplified by the bus 605 in fig. 6.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic apparatus used to train the model, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiments of the application, the target feature pair set is determined according to the first feature set obtained by the trained model and the second feature set obtained by the model to be trained, the feature pairs with the larger differences are obtained from it, and the model to be trained is iteratively trained according to those feature pairs, which can improve the model training efficiency and the model training effect.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; no limitation is imposed here as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A method for training a model, comprising:
acquiring a trained model, a model to be trained and a training data set;
extracting first features of each training data in the training data set by using the trained model and extracting second features of each training data in the training data set by using the model to be trained;
determining a target feature pair set according to the obtained first feature set and the second feature set;
and adjusting the model parameters of the model to be trained according to the target feature pair set so as to train the model to be trained.
2. The method of claim 1, wherein determining a set of target feature pairs from the first set of features and the second set of features comprises:
determining a first feature pair of the first feature set;
determining each training data pair corresponding to each first feature pair;
determining a second feature pair of the second feature set according to each training data pair;
and determining a target feature pair set according to each first feature pair and each second feature pair.
3. The method of claim 2, wherein said determining a set of target feature pairs from each of said first feature pairs and each of said second feature pairs comprises:
calculating a first similarity of each first feature pair and a second similarity of each second feature pair;
and determining a target feature pair set according to the first similarity and the second similarity.
4. The method of claim 3, wherein said determining a set of target feature pairs based on each of said first similarities and each of said second similarities comprises:
determining a first similarity matrix and a second similarity matrix according to the first similarities and the second similarities respectively;
and determining a target feature pair set based on the first similarity matrix and the second similarity matrix.
5. The method of claim 4, wherein the determining the target feature pair set based on the first similarity matrix and the second similarity matrix comprises:
determining similarity difference values between each first feature pair and each corresponding second feature pair according to the first similarity matrix and the second similarity matrix;
determining target similarity difference values according to the similarity difference values and a preset threshold;
and determining the target feature pair set according to the target similarity difference values.
6. The method of claim 1, wherein the adjusting model parameters of the model to be trained according to the set of target feature pairs comprises:
performing the following iterative steps a plurality of times:
obtaining a loss function value according to the target feature pair set and the loss function of the model to be trained;
adjusting and updating the model parameters of the model to be trained according to the loss function value;
extracting the second features of each training data again according to the updated model parameters to obtain an updated second feature set;
determining an updated target feature pair set according to the first feature set and the updated second feature set;
and adjusting the model parameters of the model to be trained according to the updated target feature pair set.
7. An apparatus for training a model, comprising:
an obtaining unit configured to obtain a trained model, a model to be trained, and a training data set;
a feature extraction unit configured to extract a first feature of each training data in the training data set by using the trained model and extract a second feature of each training data in the training data set by using the model to be trained;
a target feature pair set determination unit configured to determine a target feature pair set according to the obtained first feature set and the second feature set;
and a model training unit configured to adjust the model parameters of the model to be trained according to the target feature pair set so as to train the model to be trained.
8. The apparatus of claim 7, wherein the target feature pair set determination unit is further configured to:
determine first feature pairs of the first feature set;
determine each training data pair corresponding to each first feature pair;
determine second feature pairs of the second feature set according to each training data pair;
and determine the target feature pair set according to each first feature pair and each second feature pair.
9. The apparatus of claim 8, wherein the target feature pair set determination unit is further configured to:
calculate a first similarity of each first feature pair and a second similarity of each second feature pair;
and determine the target feature pair set according to each first similarity and each second similarity.
10. The apparatus of claim 9, wherein the target feature pair set determination unit is further configured to:
determine a first similarity matrix and a second similarity matrix according to the first similarities and the second similarities, respectively;
and determine the target feature pair set based on the first similarity matrix and the second similarity matrix.
11. The apparatus of claim 10, wherein the target feature pair set determination unit is further configured to:
determine similarity difference values between each first feature pair and each corresponding second feature pair according to the first similarity matrix and the second similarity matrix;
determine target similarity difference values according to the similarity difference values and a preset threshold;
and determine the target feature pair set according to the target similarity difference values.
12. The apparatus of claim 7, wherein the model training unit is further configured to:
perform the following iterative steps a plurality of times:
obtaining a loss function value according to the target feature pair set and the loss function of the model to be trained;
adjusting and updating the model parameters of the model to be trained according to the loss function value;
extracting the second features of each training data again according to the updated model parameters to obtain an updated second feature set;
determining an updated target feature pair set according to the first feature set and the updated second feature set;
and adjusting the model parameters of the model to be trained according to the updated target feature pair set.
13. An electronic device for training a model, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
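The iterative adjustment of claims 1 and 6 can be pictured with the following hedged sketch, which reuses the pair-selection idea illustrated in the description above. The optimiser, the squared-error loss over the selected pairs, the step count, and all names are illustrative assumptions rather than the claimed method itself.

import torch
import torch.nn.functional as F

def train_model_to_be_trained(trained_model, model_to_train, data,
                              threshold=0.1, steps=100, lr=1e-3):
    opt = torch.optim.Adam(model_to_train.parameters(), lr=lr)

    # First features come from the trained model and stay fixed.
    with torch.no_grad():
        f1 = F.normalize(trained_model(data), dim=1)
    sim1 = f1 @ f1.t()

    for _ in range(steps):
        # Re-extract second features with the updated model parameters.
        f2 = F.normalize(model_to_train(data), dim=1)
        sim2 = f2 @ f2.t()

        # Updated target feature pair set: pairs whose similarity
        # difference still exceeds the preset threshold.
        mask = torch.triu((sim1 - sim2).abs(), diagonal=1) > threshold
        if not mask.any():
            break  # no pair differs enough, so training can stop early

        # Loss function value computed over the target feature pairs only.
        loss = ((sim1 - sim2)[mask] ** 2).mean()

        opt.zero_grad()
        loss.backward()
        opt.step()

    return model_to_train

Each pass both updates the model parameters and refreshes the target feature pair set, mirroring the loop of claim 6.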
CN202011027448.4A 2020-09-25 2020-09-25 Method, apparatus, device and storage medium for training a model Pending CN112101552A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011027448.4A CN112101552A (en) 2020-09-25 2020-09-25 Method, apparatus, device and storage medium for training a model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011027448.4A CN112101552A (en) 2020-09-25 2020-09-25 Method, apparatus, device and storage medium for training a model

Publications (1)

Publication Number Publication Date
CN112101552A (en) 2020-12-18

Family

ID=73756555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011027448.4A Pending CN112101552A (en) 2020-09-25 2020-09-25 Method, apparatus, device and storage medium for training a model

Country Status (1)

Country Link
CN (1) CN112101552A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113746899A (en) * 2021-07-29 2021-12-03 济南浪潮数据技术有限公司 Cloud platform access method and device
WO2022140440A1 (en) * 2020-12-22 2022-06-30 Nuance Communications, Inc. Ai platform system and method
US11956315B2 (en) 2020-11-03 2024-04-09 Microsoft Technology Licensing, Llc Communication system and method

Similar Documents

Publication Publication Date Title
CN111523596B (en) Target recognition model training method, device, equipment and storage medium
CN111523597B (en) Target recognition model training method, device, equipment and storage medium
CN111428008A (en) Method, apparatus, device and storage medium for training a model
CN112101552A (en) Method, apparatus, device and storage medium for training a model
CN111708922A (en) Model generation method and device for representing heterogeneous graph nodes
CN112036509A (en) Method and apparatus for training image recognition models
CN112084366A (en) Method, apparatus, device and storage medium for retrieving image
CN111767729B (en) Text classification method, device, equipment and storage medium
CN112001366A (en) Model training method, face recognition device, face recognition equipment and medium
CN112270711B (en) Model training and posture prediction method, device, equipment and storage medium
CN111507111B (en) Pre-training method and device of semantic representation model, electronic equipment and storage medium
CN112149741B (en) Training method and device for image recognition model, electronic equipment and storage medium
CN111611990A (en) Method and device for identifying table in image
CN111709875B (en) Image processing method, device, electronic equipment and storage medium
CN112508004A (en) Character recognition method and device, electronic equipment and storage medium
CN112561059B (en) Method and apparatus for model distillation
CN111582477A (en) Training method and device of neural network model
US20210312264A1 (en) Method and apparatus for model distillation
CN112016523B (en) Cross-modal face recognition method, device, equipment and storage medium
CN110889392B (en) Method and device for processing face image
CN111461306B (en) Feature evaluation method and device
CN112016524A (en) Model training method, face recognition device, face recognition equipment and medium
CN111523467A (en) Face tracking method and device
CN112464009A (en) Method and device for generating pairing image, electronic equipment and storage medium
CN112561053A (en) Image processing method, training method and device of pre-training model and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination