CN114692866A - Method, apparatus and computer program product for assisting model training

Method, apparatus and computer program product for assisting model training

Info

Publication number
CN114692866A
Authority
CN
China
Prior art keywords
stage
result
model
interpretation
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210290331.8A
Other languages
Chinese (zh)
Inventor
黄悦
钱正宇
胡鸣人
袁正雄
李金麒
褚振方
罗阳
王国彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210290331.8A
Publication of CN114692866A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The present disclosure provides a method, an apparatus, an electronic device, a storage medium, and a computer program product for assisting model training, which relate to the field of computer technology, in particular to artificial intelligence and deep learning, and can be used in model training and application scenarios. The specific implementation scheme is as follows: acquiring a training sample set, where each training sample comprises training data and label information; in the process of training an initial model with the training sample set, determining intermediate data produced in the forward propagation process by which the initial model obtains an output result from the input training data, as well as loss information between the output result and the corresponding label information; obtaining an interpretation result from the intermediate data and the output result through a model interpreter; and adjusting the initial model according to the loss information and the interpretation result to obtain a trained target model. This approach not only makes the model learning stage interpretable, but also improves training speed and model accuracy by training the model in combination with the interpretation result.

Description

Method, apparatus and computer program product for assisting model training
Technical Field
The present disclosure relates to the field of computer technology, in particular to artificial intelligence and deep learning, and more specifically to a method, an apparatus, an electronic device, a storage medium, and a computer program product for assisting model training.
Background
At present, machine learning techniques based on deep learning have achieved success in many fields such as computer vision, natural language processing, and speech recognition, and machine learning models are widely applied to important real-world tasks such as object detection, image classification, and speech recognition. However, because machine learning models lack interpretability, users cannot know why a model makes a given decision, which makes it difficult to improve the model directly.
Disclosure of Invention
The disclosure provides a method and an apparatus for assisting model training, a model adjusting method and apparatus, an electronic device, a storage medium and a computer program product.
According to a first aspect, there is provided a method for assisting model training, comprising: acquiring a training sample set, wherein training samples in the training sample set comprise training data and label information; in the process of training an initial model through the training sample set, determining intermediate data in the forward propagation process by which the initial model obtains an output result from the input training data, and loss information between the output result and the corresponding label information; obtaining an interpretation result according to the intermediate data and the output result, wherein the interpretation result is used for representing the logical basis on which the initial model obtains the output result from the intermediate data; and adjusting the initial model according to the loss information and the interpretation result to obtain a trained target model.
According to a second aspect, there is provided a model adjustment method, comprising: acquiring data to be processed; processing the data to be processed through a target model, and determining intermediate data in the forward propagation process by which the target model obtains a processing result from the data to be processed; obtaining an interpretation result according to the intermediate data and the processing result, wherein the interpretation result is used for representing the logical basis on which the target model obtains the processing result from the intermediate data; and adjusting the target model according to the interpretation result.
According to a third aspect, there is provided an apparatus for assisting model training, comprising: a first obtaining unit configured to obtain a training sample set, wherein training samples in the training sample set include training data and label information; a first determining unit configured to determine, in training an initial model by a training sample set, intermediate data in a forward propagation process in which the initial model obtains an output result from input training data, and loss information between the output result and corresponding label information; the first interpretation unit is configured to obtain an interpretation result according to the intermediate data and the output result, wherein the interpretation result is used for representing a logical basis of the initial model for obtaining the output result based on the intermediate data; and a first adjusting unit configured to adjust the initial model according to the loss information and the interpretation result to obtain a trained target model.
According to a fourth aspect, there is provided an adjusting apparatus in a model application process, comprising: a second acquisition unit configured to acquire data to be processed; the second determining unit is configured to process the data to be processed through the target model and determine intermediate data in a forward propagation process of the target model for obtaining a processing result according to the data to be processed; the second interpretation unit is configured to obtain an interpretation result according to the intermediate data and the processing result, wherein the interpretation result is used for representing the logical basis of the target model obtaining the processing result based on the intermediate data; and a second adjusting unit configured to adjust the target model according to the interpretation result.
According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first and second aspects.
According to a sixth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method as described in any one of the implementations of the first and second aspects.
According to a seventh aspect, there is provided a computer program product comprising: computer program which, when being executed by a processor, implements a method as described in any of the implementation manners of the first and second aspects.
The technology of the present disclosure provides a model training method that incorporates the interpretation result of a model interpreter, so that the model learning stage is interpretable; moreover, training the model in combination with the interpretation result improves both the training speed and the accuracy of the trained model.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment according to the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for assisting model training according to the present disclosure;
FIG. 3 is a data flow diagram of a method for assisting model training according to the present disclosure;
FIG. 4 is a schematic diagram of an application scenario of the method for assisting model training according to the present embodiment;
FIG. 5 is a flow diagram for one embodiment of a model adjustment method according to the present disclosure;
FIG. 6 is a data flow diagram of a model adaptation method according to the present disclosure;
FIG. 7 is a block diagram of one embodiment of an apparatus for aiding model training in accordance with the present disclosure;
FIG. 8 is a block diagram of one embodiment of an adjustment device in a model application process according to the present disclosure;
FIG. 9 is a schematic block diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of users' personal information comply with the relevant laws and regulations and do not violate public order and good morals.
Fig. 1 illustrates an exemplary architecture 100 to which the disclosed method and apparatus for assisting model training, as well as the model adjustment method and apparatus, may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The communication connections between the terminal devices 101, 102, 103 form a topological network, and the network 104 serves to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The terminal devices 101, 102, 103 may be hardware devices or software that support network connections for data interaction and data processing. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices supporting network connection, information acquisition, interaction, display, processing, and the like, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above. They may be implemented, for example, as multiple pieces of software or software modules to provide distributed services, or as a single piece of software or software module. This is not specifically limited here.
The server 105 may be a server providing various services, such as a background processing server training an initial model in conjunction with the interpretation result of the model interpreter according to the operation instruction of the terminal devices 101, 102, 103, and a background processing server adjusting a target model in conjunction with the interpretation result of the model interpreter according to the operation instruction of the terminal devices 101, 102, 103. As an example, the server 105 may be a cloud server.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. This is not specifically limited here.
It should be further noted that the method for assisting model training and the model adjusting method provided by the embodiments of the present disclosure may be executed by a server, may also be executed by a terminal device, and may also be executed by the server and the terminal device in cooperation with each other. Accordingly, the device for assisting model training and each part (for example, each unit) included in the adjusting device in the model application process may be all disposed in the server, may be all disposed in the terminal device, and may be disposed in the server and the terminal device, respectively.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation. When the electronic device on which the method for assisting model training or the model adjustment method runs does not require data transmission with other electronic devices, the system architecture may include only that electronic device (e.g., a server or a terminal device).
Referring to fig. 2, fig. 2 is a flowchart of a method for assisting model training according to an embodiment of the present disclosure, wherein the process 200 includes the following steps:
step 201, a training sample set is obtained.
In this embodiment, an executing subject (e.g., a terminal device or a server in fig. 1) of the method for assisting model training may obtain the training sample set from a remote location or from a local location based on a wired network connection manner or a wireless network connection manner. The training samples in the training sample set comprise training data and label information.
The training data in the training sample set may come from any application domain. As an example, in the field of image classification, the training data included in a training sample is a sample image and the label information is an image classification label; in the field of target object identification, the training data included in a training sample is a sample image and the label information is a target object label.
In this embodiment, the execution subject may divide the training sample set into a training subset and a verification subset. The executing body can train the initial model through the training subset, and the accuracy of the trained initial model is verified through the verification subset.
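Purely as an illustration (not part of the disclosure), a training sample set of this kind and its split into a training subset and a validation subset might be organized as follows in PyTorch; the tensor shapes, the class count of 10, and the 80/20 split are assumptions made only for this sketch.

```python
import torch
from torch.utils.data import Dataset, random_split

class LabeledImageSet(Dataset):
    """Each training sample pairs training data (an image tensor) with
    label information (a class index)."""
    def __init__(self, images, labels):
        self.images = images            # (N, 3, H, W) float tensor
        self.labels = labels            # (N,) long tensor of class indices

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]

# Hypothetical data purely for illustration.
samples = LabeledImageSet(torch.randn(100, 3, 224, 224),
                          torch.randint(0, 10, (100,)))
train_subset, val_subset = random_split(samples, [80, 20])
```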
Step 202, in the process of training the initial model through the training sample set, determining intermediate data in the forward propagation process by which the initial model obtains an output result from the input training data, and loss information between the output result and the corresponding label information.
In this embodiment, in the process of training the initial model with the training sample set, the executing body may determine the intermediate data produced in the forward propagation process by which the initial model obtains an output result from the input training data, as well as the loss information between the output result and the corresponding label information.
The model training stage comprises a forward propagation process and a backward propagation process. In the forward propagation process, the initial model performs information processing such as feature extraction and classification on the input training data to obtain an output result; in the backward propagation process, gradients are propagated according to the loss information between the output result and the corresponding label information, and the parameters of the initial model are updated.
During the forward propagation of the initial model, the executing body may obtain intermediate data representative of the information processing process based on the input training data. For example, in the classification model, the feature information extracted by the last feature extraction layer may be used as intermediate data.
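As an illustrative sketch only (the disclosure does not prescribe a framework), intermediate data such as the output of the last feature extraction layer can be captured with a forward hook in PyTorch; the ResNet-18 backbone, the layer name layer4, and the class count are assumptions of this example.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Hypothetical stand-in for the "initial model"; layer4 plays the role of the
# last feature extraction layer mentioned above.
model = models.resnet18(num_classes=10)
intermediate = {}

def save_activation(name):
    def hook(_module, _inputs, output):
        intermediate[name] = output          # intermediate data of this layer
    return hook

model.layer4.register_forward_hook(save_activation("last_feature_layer"))

images = torch.randn(4, 3, 224, 224)           # a batch of training data
labels = torch.randint(0, 10, (4,))            # corresponding label information
output = model(images)                         # forward propagation -> output result
loss = F.cross_entropy(output, labels)         # loss information
features = intermediate["last_feature_layer"]  # (4, 512, 7, 7) intermediate data
```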
And step 203, obtaining an interpretation result according to the intermediate data and the output result.
In this embodiment, the executing body may obtain the interpretation result according to the intermediate data and the output result. The interpretation result is used for representing the logical basis on which the initial model obtains the output result from the intermediate data.
The executing body adds a model interpretation stage (a model interpreter) to the model training process to assist in training the initial model. As an example, the model interpreter may be a neural network interpretation model with a model interpretation function; the intermediate data and the output result are input into the neural network interpretation model to obtain the interpretation result.
Specifically, the model interpreter may determine a processing manner of the initial model on the input training data according to the intermediate data, thereby determining a basis for obtaining the output result, that is, an interpretation result. Taking the initial model as an image classification model as an example, according to the intermediate data, the model interpreter can determine the region of interest of the initial model in the sample image and the characteristic information referred by the output result, and further determine the data on which the output result is based to obtain the interpretation result.
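The disclosure does not fix the internal workings of the model interpreter. One hedged sketch of how an interpretation result could be computed from the intermediate data and the output result is a class-activation-map style weighting, in which the classifier weights of the predicted class highlight the regions the model relied on; all shapes and names below are assumptions of the example.

```python
import torch
import torch.nn.functional as F

def interpret(features, fc_weight, predicted_class):
    """CAM-style sketch of a model interpreter.
    features: intermediate data (N, C, H, W);
    fc_weight: weight of the final linear layer (num_classes, C);
    predicted_class: (N,) class indices taken from the output result."""
    class_weights = fc_weight[predicted_class]                 # (N, C)
    cam = torch.einsum("nc,nchw->nhw", class_weights, features)
    cam = F.relu(cam)
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)   # normalise to [0, 1]
    return cam                                                  # per-sample interpretation map
```

For the hooked ResNet sketch above, interpret(features, model.fc.weight, output.argmax(dim=1)) would yield one interpretation map per training sample.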
In some optional implementations of this embodiment, the executing entity may determine intermediate data in a forward propagation process in which the initial model obtains an output result according to the input training data by: and determining intermediate data corresponding to each stage in the forward propagation process. And each stage is obtained by dividing a forward propagation process based on a preset division mode.
It will be appreciated that the forward propagation process typically includes multiple stages of information processing. How the stages are divided can be set according to actual needs. For example, the division may follow the structure of the model; specifically, stages may be divided along the hierarchy of the feature extraction layers. As another example, the stages may be defined by the model itself; a residual network (ResNet), for instance, is naturally divided into five stages, from the first stage to the fifth stage.
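As a hedged illustration of such a stage division, a torchvision ResNet can be broken into a stem plus its four residual blocks and the intermediate data of each stage collected; treating these five pieces as the "stages" is an assumption of this sketch, not a requirement of the disclosure.

```python
import torch
import torch.nn as nn
import torchvision.models as models

resnet = models.resnet50()
stages = {
    "stage1": nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool),
    "stage2": resnet.layer1,
    "stage3": resnet.layer2,
    "stage4": resnet.layer3,
    "stage5": resnet.layer4,
}

stage_outputs = {}
x = torch.randn(1, 3, 224, 224)
for name, stage in stages.items():
    x = stage(x)                      # forward propagation through this stage
    stage_outputs[name] = x           # intermediate data of the stage
```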
In this implementation, the executing main body may execute the step 203 as follows: and obtaining an interpretation result corresponding to each stage according to the intermediate data and the output result corresponding to each stage in the forward propagation process.
Specifically, for each stage in the forward propagation process, the execution body obtains an interpretation result corresponding to the stage through the model interpreter according to the intermediate data and the output result corresponding to the stage. The explanation process of the model interpreter for each stage may refer to the obtaining process of the explanation result, which is not described herein.
In this implementation, the executing body obtains the intermediate data of each stage of the forward propagation process of the initial model, and then obtains the interpretation result corresponding to each stage, which improves the level of detail and the reference value of the interpretation results.
In some optional implementation manners of this embodiment, the execution main body may obtain the interpretation result corresponding to each stage by using a deconvolution method according to the intermediate data and the output result corresponding to each stage in the forward propagation process.
It can be understood that in a typical neural network model, feature extraction is performed by a convolutional network, and the extracted features are then processed to obtain an output result. In this implementation, the executing body can trace the convolution operations in the initial model through a deconvolution method; furthermore, the executing body can display the interpretation result corresponding to each stage with visualization techniques, making it easier for the user to view and understand the training process of the model. Visualization methods include, but are not limited to, heat maps, vector diagrams, and the like.
In the implementation mode, the accuracy of the interpretation result is improved through a deconvolution method.
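A simplified sketch of the visualization step mentioned above: it upsamples a per-stage interpretation map and overlays it on the input as a heat map. This stands in for the display step only, not for the deconvolution method itself, and the use of matplotlib is an assumption of the example.

```python
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

def show_heatmap(image, cam):
    """image: (3, H, W) tensor with values in [0, 1]; cam: (h, w) interpretation map."""
    cam_up = F.interpolate(cam[None, None], size=image.shape[-2:],
                           mode="bilinear", align_corners=False)[0, 0]
    plt.imshow(image.permute(1, 2, 0).cpu().numpy())
    plt.imshow(cam_up.detach().cpu().numpy(), cmap="jet", alpha=0.5)  # heat-map overlay
    plt.axis("off")
    plt.show()
```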
And step 204, adjusting the initial model according to the loss information and the interpretation result to obtain the trained target model.
In this embodiment, the execution subject may adjust the initial model according to the loss information and the interpretation result, so as to obtain the trained target model.
Specifically, the executing body may use the gradient determined from the loss information as the main basis for adjusting the parameters of the initial model and, on that basis, determine some of the adjustments to the initial model according to the interpretation result. The parameter adjustment process is executed in a loop until a preset ending condition is met, yielding the trained target model.
As an example, the interpretation result identifies the data on which the initial model based its output result; if that data cannot support the correct output result, the parameters of the part of the model structure that produced the data can be adjusted.
The preset ending condition includes, but is not limited to: the training time exceeding a preset duration threshold, the number of training iterations exceeding a preset count threshold, or the loss information converging.
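The disclosure does not prescribe how the loss information and the interpretation result are combined when adjusting the initial model. One hedged sketch adds a penalty on attention that falls outside a mask derived from the label information; the mask, the weighting factor, and the helper interpret_fn (which must return an interpretation map that is still attached to the autograd graph, e.g. built as in the CAM sketch above) are assumptions of this example.

```python
import torch
import torch.nn.functional as F

def training_step(model, images, labels, masks, optimizer, interpret_fn,
                  lambda_interp=0.1):
    """One combined update: classification loss plus an interpretation-based
    penalty on attention outside the label-derived mask `masks` (N, H, W)."""
    optimizer.zero_grad()
    logits = model(images)
    cls_loss = F.cross_entropy(logits, labels)              # loss information
    cam = interpret_fn(logits.argmax(dim=1))                # interpretation result, (N, h, w)
    cam = F.interpolate(cam.unsqueeze(1), size=masks.shape[-2:],
                        mode="bilinear", align_corners=False).squeeze(1)
    interp_loss = (cam * (1.0 - masks)).mean()              # attention outside the mask
    total = cls_loss + lambda_interp * interp_loss
    total.backward()                                         # back propagation
    optimizer.step()
    return total.item()
```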
Referring to FIG. 3, a data flow 300 of the method for assisting model training is shown. The data comprise a training sample set 301 and a validation sample set 302, which are used to train and validate the initial model 303, respectively. During the training and validation of the initial model 303, the information processing process by which the initial model 303 obtains its output is interpreted to obtain an interpretation result 304. Further, the interpretation result can be fed back into the training and validation process of the initial model 303.
In some optional implementations of this embodiment, the executing main body may execute the step 204 by:
firstly, obtaining an analysis result corresponding to each stage of interpretation result according to the label information and the corresponding interpretation result of each stage in the forward propagation process.
The analysis result is used for indicating the rationality of the information processing mode of the stage corresponding to the interpretation result.
Specifically, for each stage in the forward propagation process, the execution main body obtains an analysis result corresponding to the interpretation result of the stage according to the tag information and the interpretation result corresponding to the stage.
Continuing with the image classification example, suppose the sample image contains several objects, such as an animal and a piece of furniture, and the label information of the sample image indicates that the image shows a dog, which means the initial model should pay attention to the region corresponding to the dog in the sample image. If the interpretation result indicates that the region of interest of the initial model differs from the region corresponding to the dog, it can be determined that the information processing manner of the stage corresponding to the interpretation result is not reasonable.
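A hedged sketch of such a per-stage analysis result: binarize the stage's interpretation map and measure how well it overlaps the region implied by the label (a hypothetical mask here). The threshold and the IoU cut-off are illustrative assumptions, not values taken from the disclosure.

```python
import torch

def analyse_stage(cam, label_mask, threshold=0.5, min_iou=0.3):
    """cam: (h, w) interpretation map in [0, 1]; label_mask: (h, w) binary mask of
    the region the label information implies the model should attend to."""
    attended = cam > threshold
    target = label_mask.bool()
    inter = (attended & target).float().sum()
    union = (attended | target).float().sum().clamp(min=1.0)
    iou = (inter / union).item()
    return {"iou": iou, "reasonable": iou >= min_iou}        # analysis result
```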
Secondly, for each stage in the forward propagation process, adjusting parameters corresponding to the stage of the initial model according to the loss information and the analysis result corresponding to the interpretation result of the stage to obtain the trained target model.
As an example, when the analysis result indicates that the information processing manner at this stage is not reasonable, the executing entity may adjust the model parameters at a stage where the analysis result is not reasonable, based on the gradient information obtained from the loss information, and finally obtain the target model by executing the above process in a loop.
In the implementation mode, a specific mode for adjusting the initial model according to the interpretation result and the loss information is provided, and the accuracy of the finally obtained target model is improved.
In some optional implementations of this embodiment, the executing body may execute the first step by:
for each stage in the forward propagation process, the following operations are performed:
first, according to the interpretation result corresponding to the stage, the attention degree of the initial model to each part of data in the input training data at the stage is determined. The attention degree is used for representing the degree of dependence on each part of data in the training data in the process of obtaining an output result by the initial model. And then, obtaining an analysis result corresponding to the interpretation result of the stage according to the label information and the attention degree corresponding to the stage.
As an example, the intermediate data corresponding to each stage is characterized by a form of a weight matrix, and through the weight matrix, the execution subject can perform weight analysis to determine an explanation result characterizing the attention area and the attention degree of the initial model at the stage. And further, obtaining an analysis result corresponding to the explanation result of the stage according to the label information and the attention degree corresponding to the stage.
In this implementation, by determining the attention of the initial model to each part of the input training data at this stage, the analysis result corresponding to the interpretation result can be further determined, and the accuracy of the obtained analysis result is improved.
In some optional implementation manners of this embodiment, the executing entity may obtain an analysis result corresponding to the interpretation result of the stage according to the tag information and the attention degree corresponding to the stage by performing the following operations:
firstly, the target attention corresponding to the initial model at this stage is determined according to the label information. The target attention represents the degree to which the initial model should depend on each part of the training data at this stage in order to obtain the target output result corresponding to the label information. Then, an analysis result corresponding to the interpretation result of this stage is obtained from the target attention and the actual attention.
Specifically, the executing agent may determine, according to the label information, a degree of attention of the initial model to each region of the input training data at this stage. As an example, when the initial model is an image classification model, the attention information of the initial model characterized by the target attention should be concentrated on the region where each type of object is located in the training data.
In this implementation manner, the execution subject may present the target attention in different colors and different scale values. Furthermore, by comparing the target attention and the attention, the execution subject can determine whether there is a deviation of the attention region and/or a deviation of the attention corresponding to the attention region, and obtain an analysis result corresponding to the interpretation result at this stage.
In the implementation mode, a more accurate analysis result is obtained based on the target attention and the attention.
With continued reference to fig. 4, fig. 4 is a schematic diagram 400 of an application scenario of the method for assisting model training according to the present embodiment. In the application scenario of fig. 4, the server 401 first obtains a training sample set 403 from the database 402. Wherein the training samples in the training sample set 403 include training data and label information. Specifically, the training sample set 403 is divided into a training subset and a validation subset. Then, the initial model 404 is trained by the training sample set 403 in a machine learning manner. In the process of training the initial model 404 through the training sample set 403, determining intermediate data 406 in the forward propagation process of the initial model obtaining an output result 405 according to input training data, and loss information 408 between the output result 405 and corresponding label information 407; obtaining an interpretation result 410 according to the intermediate data 406 and the output result 405 through a model interpreter 409; based on the loss information 408 and the interpretation result 410, the initial model 404 is adjusted to obtain a trained target model.
In this embodiment, a method for training a model by combining an interpretation result of a model interpreter is provided, so that not only is the model learning phase interpretable, but also the model is trained by combining the interpretation result, and the model training speed and the accuracy of the trained model are improved.
With continuing reference to FIG. 5, an exemplary flow 500 of one embodiment of a model adjustment method according to the present disclosure is shown, including the following steps:
step 501, obtaining data to be processed.
In this example, the execution subject of the model adjustment method (e.g., the terminal device or the server in fig. 1) may obtain the data to be processed from a remote location or from a local location based on a wired network connection manner or a wireless network connection manner.
The data to be processed may be data representing any content in any form. As an example, in the field of image classification, the data to be processed is an image to be classified; in the field of target object identification, data to be processed is an image to be identified; in the field of speech recognition, the data to be processed is speech to be recognized.
Step 502, processing the data to be processed through the target model, and determining intermediate data in the forward propagation process by which the target model obtains a processing result from the data to be processed.
In this embodiment, the execution body may process the data to be processed through the target model, and determine intermediate data in a forward propagation process of a processing result obtained by the target model according to the data to be processed.
As an example, the target model may be a model trained with a machine learning method, taking the training data in the training samples as the input of a neural network model and the label information corresponding to the input training data as the expected output.
As yet another example, the target model may be a target model trained using the above embodiment 200. The initial model is trained by combining the interpretation result of the model interpreter, so that the model learning stage has interpretability, and the model is trained by combining the interpretation result, so that the accuracy of the target model is improved.
In the application of the target model, the input data to be processed is generally processed through a forward propagation process similar to that of the training stage, so as to obtain a processing result. Taking an image classification model as an example, the target model extracts feature information from the data to be processed through a feature extraction network, and then classifies the feature information through a classification layer to obtain a classification result.
During the forward propagation of the target model, the executing body may obtain intermediate data representative of the information processing process based on the input data to be processed. For example, in an image classification model, the feature information extracted by the last feature extraction layer may be used as the intermediate data.
And step 503, obtaining an interpretation result according to the intermediate data and the processing result.
In this embodiment, the executing body may obtain the interpretation result according to the intermediate data and the processing result. The interpretation result is used for representing the logical basis on which the target model obtains the processing result from the intermediate data. The interpretation result may be obtained, for example, by a model interpreter.
As an example, the model interpreter can be a neural network interpretation model with a model interpretation function, and intermediate data and a processing result are input into the neural network interpretation model to obtain an interpretation result.
Specifically, the model interpreter may determine how the target model processed the input data to be processed according to the intermediate data, thereby determining the basis for the processing result, that is, the interpretation result. Taking the target model as an image classification model as an example, from the intermediate data the model interpreter can determine the region of the data to be processed that the target model focused on and the feature information that the processing result relied upon, and thus determine the data on which the processing result is based to obtain the interpretation result.
Referring to FIG. 6, a data flow 600 of the model adjustment method is shown. First, the target model 601 is loaded as a processing model 602 that processes the data to be processed. The processing model 602 processes the data to be processed to obtain a corresponding output result 603. The output result 603 and the intermediate data 604 produced in obtaining it are then stored, and an interpretation result 605 is obtained from the output result 603 and the intermediate data 604. Further, the interpretation result 605 can be fed back into the adjustment process of the target model 601.
In some optional implementations of this embodiment, the executing entity may determine the intermediate data in the forward propagation process by: and determining intermediate data corresponding to each stage in the forward propagation process. And each stage is obtained by dividing a forward propagation process based on a preset division mode.
It will be appreciated that the forward propagation process typically includes multiple stages of information processing. How the stages are divided can be set according to actual needs. For example, the division may follow the structure of the model; specifically, stages may be divided along the hierarchy of the feature extraction layers. As another example, the stages may be defined by the model itself; a residual network (ResNet), for instance, is naturally divided into five stages, from the first stage to the fifth stage.
In this implementation, the executing entity may execute the step 503 as follows: and obtaining an interpretation result corresponding to each stage according to the intermediate data and the processing result corresponding to each stage in the forward propagation process.
Specifically, for each stage in the forward propagation process, the execution body obtains an interpretation result corresponding to the stage through the model interpreter according to the intermediate data and the output result corresponding to the stage. The explanation process of the model interpreter for each stage may refer to the obtaining process of the explanation result, which is not described herein.
In this implementation, the executing body obtains the intermediate data of each stage of the forward propagation process of the target model, and then obtains the interpretation result corresponding to each stage, which improves the level of detail and the reference value of the interpretation results.
In some optional implementation manners of this embodiment, the execution main body may obtain an interpretation result corresponding to each stage by using a deconvolution method according to intermediate data and a processing result corresponding to each stage in the forward propagation process.
It can be understood that in a typical neural network model, feature extraction is performed by a convolutional network, and the extracted features are then processed to obtain an output result. In this implementation, the executing body can trace the convolution operations in the target model through a deconvolution method; furthermore, the executing body can display the interpretation result corresponding to each stage with visualization techniques, making it easier for the user to view and understand how the model processes the data. Visualization methods include, but are not limited to, heat maps, vector diagrams, and the like.
In the implementation mode, the accuracy of the interpretation result is improved through a deconvolution method.
And step 504, adjusting the target model according to the interpretation result.
In this embodiment, the execution subject may adjust the target model according to the interpretation result.
Specifically, when there is a deviation or an error in the processing result of the data to be processed, the reason for the deviation or the error may be determined according to the interpretation result, and the target model may be adjusted according to the determined reason.
Taking image classification as an example, suppose the user's review of the processing result determines that the data to be processed should be classified as a dog, which means the target model should pay attention to the region corresponding to the dog in the data to be processed. If the interpretation result indicates that the region of interest of the target model differs from the region corresponding to the dog, it can be determined that the information processing manner of the target model corresponding to the interpretation result is not reasonable. In that case, the executing body may adjust the parameters of the target model to adjust its region of interest.
When an interpretation result is available for each stage, the executing body can determine from the per-stage interpretation results which stage's parameters need to be adjusted, so as to adjust the target model in a targeted manner.
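A hedged sketch of such targeted adjustment: freeze all parameters except those of the stage whose interpretation result looked unreasonable and fine-tune only that stage. Matching stages by parameter-name prefix (e.g. "layer3" in a torchvision ResNet) is an assumption of this example.

```python
import torch

def finetune_stage(model, stage_prefix, lr=1e-4):
    """Return an optimizer over only the parameters of the stage to be adjusted."""
    params = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(stage_prefix)
        if param.requires_grad:
            params.append(param)
    return torch.optim.SGD(params, lr=lr)

# Example: if the stage-wise interpretation results single out "layer3",
# optimizer = finetune_stage(target_model, "layer3") prepares a targeted update.
```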
In this embodiment, a method for adjusting a model by combining an interpretation result of a model interpreter is provided, so that not only the model application stage has interpretability, but also the model is adjusted by combining the interpretation result, thereby improving the accuracy of the model.
With continuing reference to FIG. 7, as an implementation of the method illustrated in the above figures, the present disclosure provides an embodiment of an apparatus for assisting model training, which corresponds to the embodiment of the method illustrated in FIG. 2, and which may be applied in various electronic devices.
As shown in fig. 7, the apparatus for assisting model training includes: a first obtaining unit 701 configured to obtain a training sample set, where training samples in the training sample set include training data and label information; a first determining unit 702 configured to, in training an initial model by a training sample set, determine intermediate data in a forward propagation process in which the initial model obtains an output result from input training data, and loss information between the output result and corresponding label information; a first interpretation unit 703 configured to obtain an interpretation result according to the intermediate data and the output result, wherein the interpretation result is used for representing a logical basis of the initial model obtaining the output result based on the intermediate data; a first adjusting unit 704 configured to adjust the initial model according to the loss information and the interpretation result to obtain a trained target model.
In some optional implementations of the present embodiment, the first determining unit 702 is further configured to: determining intermediate data corresponding to each stage in the forward propagation process, wherein each stage is obtained by dividing the forward propagation process based on a preset dividing mode; and a first interpretation unit 703, further configured to: and obtaining an interpretation result corresponding to each stage according to the intermediate data and the output result corresponding to each stage in the forward propagation process.
In some optional implementations of this embodiment, the first interpreting unit 703 is further configured to: and obtaining an interpretation result corresponding to each stage by a deconvolution method adopted by the model interpreter according to the intermediate data and the output result corresponding to each stage in the forward propagation process.
In some optional implementations of this embodiment, the first adjusting unit 704 is further configured to: obtaining an analysis result corresponding to the interpretation result of each stage according to the label information and the interpretation result corresponding to each stage in the forward propagation process, wherein the analysis result is used for indicating the rationality of the information processing mode of the stage corresponding to the interpretation result; and for each stage in the forward propagation process, adjusting parameters corresponding to the stage of the initial model according to the loss information and the analysis result corresponding to the interpretation result of the stage to obtain the trained target model.
In some optional implementations of this embodiment, the first adjusting unit 704 is further configured to: for each stage in the forward propagation process, the following operations are performed: determining attention degrees of the initial model to each part of data in the input training data in the stage according to the corresponding interpretation result of the stage, wherein the attention degrees are used for representing the degree of dependence of the initial model on each part of data in the training data in the process of obtaining the output result; and obtaining an analysis result corresponding to the interpretation result of the stage according to the label information and the attention degree corresponding to the stage.
In some optional implementations of this embodiment, the first adjusting unit 704 is further configured to: determining a target attention degree corresponding to the initial model at the stage according to the label information, wherein the target attention degree is used for representing the degree of dependence of the initial model on each part of data in the training data at the stage when a target output result corresponding to the label information is obtained; and obtaining an analysis result corresponding to the interpretation result of the stage according to the target attention and the attention.
In this embodiment, a device for training a model by combining an interpretation result of a model interpreter is provided, so that not only the model learning stage has interpretability, but also the model is trained by combining the interpretation result, thereby improving the model training speed and the accuracy of the trained model.
With continuing reference to fig. 8, as an implementation of the method shown in the above-mentioned figures, the present disclosure provides an embodiment of an adjusting apparatus in a model application process, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 5, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 8, the adjusting apparatus in the model application process includes: a second acquisition unit 801 configured to acquire data to be processed; a second determining unit 802, configured to process the data to be processed through the target model, and determine intermediate data in a forward propagation process in which the target model obtains a processing result according to the data to be processed; a second interpretation unit 803 configured to obtain an interpretation result according to the intermediate data and the processing result, wherein the interpretation result is used for representing a logical basis of the target model obtaining the processing result based on the intermediate data; a second adjusting unit 804 configured to adjust the target model according to the interpretation result.
In some optional implementations of the present embodiment, the second determining unit 802 is further configured to: determining intermediate data corresponding to each stage in the forward propagation process, wherein each stage is obtained by dividing the forward propagation process based on a preset division mode; and a second interpretation unit 803, further configured to: and obtaining an explanation result corresponding to each stage according to the intermediate data and the processing result corresponding to each stage in the forward propagation process.
In some optional implementations of this embodiment, the second interpreting unit 803 is further configured to: and obtaining an interpretation result corresponding to each stage by adopting a deconvolution method according to the intermediate data and the processing result corresponding to each stage in the forward propagation process.
In this embodiment, a device for adjusting a model by combining an interpretation result of a model interpreter is provided, so that not only the model application stage has interpretability, but also the model is adjusted by combining the interpretation result, thereby improving the accuracy of the model.
According to an embodiment of the present disclosure, the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for assisting model training, the method for model adaptation described in any of the above embodiments when executed by the at least one processor.
According to an embodiment of the present disclosure, the present disclosure further provides a readable storage medium storing computer instructions for enabling a computer to implement the method for assisting model training, the method for model adjustment, and the like described in any of the above embodiments when executed.
The embodiments of the present disclosure provide a computer program product, which when being executed by a processor can implement the method for assisting model training, the model adjusting method described in any of the embodiments above.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 902 or a computer program loaded from a storage unit 908 into a random access memory (RAM) 903. The RAM 903 can also store various programs and data required for the operation of the device 900. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the various methods and processes described above, such as the method for assisting model training and the model adjustment method. For example, in some embodiments, the method for assisting model training and the model adjustment method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the method for assisting model training or the model adjustment method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured by any other suitable means (e.g., by means of firmware) to perform the method for assisting model training and the model adjustment method.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuits, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in the cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
The technical solution of the embodiments of the present disclosure provides a method for training a model in combination with the interpretation result of a model interpreter, so that the model learning stage is interpretable, and both the training speed and the accuracy of the trained model are improved by incorporating the interpretation result into training.
It should be understood that steps may be reordered, added, or deleted in the various flows shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution provided by the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A method for assisting model training, comprising:
acquiring a training sample set, wherein training samples in the training sample set comprise training data and label information;
in the process of training an initial model through the training sample set, determining intermediate data in a forward propagation process in which the initial model obtains an output result from input training data, and loss information between the output result and the corresponding label information;
obtaining an interpretation result according to the intermediate data and the output result, wherein the interpretation result is used for characterizing the logical basis on which the initial model obtains the output result based on the intermediate data;
and adjusting the initial model according to the loss information and the interpretation result to obtain a trained target model.
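For illustration only, the following is a minimal PyTorch-style sketch of the training loop described in claim 1: the forward pass keeps intermediate data, an interpretation result is derived from the output, and the model is adjusted according to both the loss information and the interpretation result. The small CNN, the input-gradient interpreter, the region-of-interest mask standing in for label information, and the 0.1 penalty weight are assumptions made for the example, not details taken from the present disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(16 * 28 * 28, num_classes)

    def forward(self, x):
        feats = self.features(x)              # intermediate data of the forward pass
        logits = self.head(feats.flatten(1))  # output result
        return logits, feats

def interpret(x, logits):
    # Toy interpreter: gradient of the predicted-class score w.r.t. the input,
    # read as the model's attention over the input data.
    score = logits.gather(1, logits.argmax(dim=1, keepdim=True)).sum()
    grad, = torch.autograd.grad(score, x, create_graph=True)
    return grad.abs()

def interpretation_penalty(saliency, mask):
    # Penalize attention that falls outside a label-derived region of interest.
    return (saliency * (1 - mask)).mean()

model = SimpleCNN()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

# Toy training sample set: images, class labels, and a hypothetical
# region-of-interest mask standing in for the label information.
x = torch.rand(4, 1, 28, 28, requires_grad=True)
y = torch.randint(0, 10, (4,))
mask = torch.zeros(4, 1, 28, 28)
mask[..., 8:20, 8:20] = 1.0

for step in range(3):
    logits, feats = model(x)                # forward pass keeps intermediate data
    loss_cls = F.cross_entropy(logits, y)   # loss between output result and labels
    saliency = interpret(x, logits)         # interpretation result
    loss = loss_cls + 0.1 * interpretation_penalty(saliency, mask)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(step, float(loss))
```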
2. The method of claim 1, wherein the determining of intermediate data in the forward propagation process in which the initial model obtains the output result from the input training data comprises:
determining intermediate data corresponding to each stage in the forward propagation process, wherein each stage is obtained by dividing the forward propagation process based on a preset dividing mode; and
the obtaining of the interpretation result according to the intermediate data and the output result includes:
and obtaining an interpretation result corresponding to each stage according to the intermediate data corresponding to each stage in the forward propagation process and the output result.
3. The method according to claim 1 or 2, wherein obtaining the interpretation result corresponding to each stage according to the intermediate data corresponding to each stage in the forward propagation process and the output result comprises:
and obtaining an interpretation result corresponding to each stage by adopting a deconvolution method according to the intermediate data corresponding to each stage in the forward propagation process and the output result.
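As a hedged illustration of the deconvolution-based interpretation in claim 3, the sketch below splits a forward pass into two convolutional stages and projects each stage's intermediate data back to the input resolution with transposed convolutions, yielding one interpretation map per stage. The layer shapes and the choice of one decoder per stage are assumptions made for the example.

```python
import torch
import torch.nn as nn

# Two "stages" obtained by dividing the forward propagation process.
stage1 = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU())
stage2 = nn.Sequential(nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())

# One transposed-convolution decoder per stage, mapping intermediate data
# back to the input resolution so it can be read as an interpretation map.
deconv1 = nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1)
deconv2 = nn.Sequential(
    nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1),
)

x = torch.rand(1, 1, 32, 32)
f1 = stage1(x)                # intermediate data for stage 1: (1, 8, 16, 16)
f2 = stage2(f1)               # intermediate data for stage 2: (1, 16, 8, 8)

interp_stage1 = deconv1(f1)   # (1, 1, 32, 32) interpretation map for stage 1
interp_stage2 = deconv2(f2)   # (1, 1, 32, 32) interpretation map for stage 2
print(interp_stage1.shape, interp_stage2.shape)
```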
4. The method of claim 1 or 2, wherein the adjusting of the initial model according to the loss information and the interpretation result to obtain a trained target model comprises:
obtaining an analysis result corresponding to the interpretation result of each stage according to the label information and the interpretation result corresponding to each stage in the forward propagation process, wherein the analysis result is used for indicating the rationality of the information processing mode of the stage corresponding to the interpretation result;
and for each stage in the forward propagation process, adjusting parameters corresponding to the stage of the initial model according to the loss information and the analysis result corresponding to the interpretation result of the stage to obtain a trained target model.
5. The method according to claim 4, wherein obtaining the analysis result corresponding to the interpretation result of each stage in the forward propagation process according to the label information and the interpretation result corresponding to each stage comprises:
for each stage in the forward propagation process, performing the following:
determining attention degrees of the initial model to each part of data in the input training data in the stage according to the corresponding interpretation result of the stage, wherein the attention degrees are used for representing the dependence degree of the initial model on each part of data in the training data in the process of obtaining the output result;
and obtaining an analysis result corresponding to the interpretation result of the stage according to the label information and the attention degree corresponding to the stage.
6. The method according to claim 5, wherein obtaining the analysis result corresponding to the interpretation result of the stage according to the label information and the attention degree corresponding to the stage comprises:
determining a target attention degree corresponding to the initial model at the stage according to the label information, wherein the target attention degree is used for representing the degree of dependence of the initial model on each part of data in training data at the stage when a target output result corresponding to the label information is obtained;
and obtaining an analysis result corresponding to the interpretation result of the stage according to the target attention degree and the attention degree.
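A rough sketch of the analysis in claims 5 and 6 follows: a stage's interpretation result is normalized into an attention degree over input positions, a target attention degree is derived from the label information (here assumed to be a bounding-box region), and the two are compared to produce the analysis result. The box mask and the KL-divergence score are illustrative assumptions, not requirements of the claims.

```python
import torch
import torch.nn.functional as F

def attention_degree(interpretation_map):
    # Normalize a stage's interpretation map into a distribution over input positions.
    flat = interpretation_map.abs().flatten()
    return flat / flat.sum().clamp_min(1e-8)

def target_attention_from_label(shape, box):
    # Target attention degree: uniform over the labeled region, near-zero elsewhere.
    target = torch.full(shape, 1e-6)
    y0, y1, x0, x1 = box
    target[y0:y1, x0:x1] = 1.0
    flat = target.flatten()
    return flat / flat.sum()

def analysis_result(attention, target):
    # Divergence between target attention and actual attention; lower means the
    # stage's information processing looks more reasonable.
    return F.kl_div(attention.clamp_min(1e-8).log(), target, reduction="sum")

interp_map = torch.rand(32, 32)                              # stage interpretation result
att = attention_degree(interp_map)                           # attention degree
tgt = target_attention_from_label((32, 32), (8, 24, 8, 24))  # from label information
print(float(analysis_result(att, tgt)))                      # analysis result for this stage
```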
7. A method of model adjustment, comprising:
acquiring data to be processed;
processing the data to be processed through a target model, and determining intermediate data in a forward propagation process of a processing result obtained by the target model according to the data to be processed;
obtaining an interpretation result according to the intermediate data and the processing result, wherein the interpretation result is used for characterizing the logical basis on which the target model obtains the processing result based on the intermediate data;
and adjusting the target model according to the interpretation result.
8. The method of claim 7, wherein the determining of intermediate data in the forward propagation process in which the target model obtains the processing result from the data to be processed comprises:
determining intermediate data corresponding to each stage in the forward propagation process, wherein each stage is obtained by dividing the forward propagation process based on a preset dividing mode; and
the obtaining of the interpretation result according to the intermediate data and the processing result includes:
and obtaining an interpretation result corresponding to each stage according to the intermediate data corresponding to each stage in the forward propagation process and the processing result.
9. The method according to claim 7 or 8, wherein the obtaining an interpretation result corresponding to each stage according to the intermediate data corresponding to each stage in the forward propagation process and the processing result comprises:
and obtaining an interpretation result corresponding to each stage by adopting a deconvolution method according to the intermediate data corresponding to each stage in the forward propagation process and the processing result.
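For claims 7 to 9, one possible (assumed) form of the inference-time adjustment is sketched below: the target model processes unlabeled data, an interpretation result is computed from the activations, and the model is nudged when the interpretation is judged too diffuse. The entropy criterion, the threshold, and the single gradient step are assumptions made for the example, not the method prescribed by the disclosure.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1))
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.rand(1, 1, 32, 32)             # data to be processed (no labels)
out = model(x)                           # processing result

# Interpretation result: channel-mean activation of the output, normalized
# into a distribution over spatial positions.
attn = out.abs().mean(dim=1).flatten()
attn = attn / attn.sum().clamp_min(1e-8)
entropy = -(attn * attn.clamp_min(1e-8).log()).sum()

if float(entropy) > 6.0:                 # interpretation too diffuse: adjust the model
    opt.zero_grad()
    entropy.backward()
    opt.step()
print("attention entropy:", float(entropy))
```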
10. An apparatus for assisting model training, comprising:
a first obtaining unit configured to obtain a training sample set, wherein training samples in the training sample set include training data and label information;
a first determining unit configured to determine, in a process of training an initial model by the training sample set, intermediate data in a forward propagation process in which the initial model obtains an output result from input training data, and loss information between the output result and corresponding label information;
a first interpretation unit configured to obtain an interpretation result according to the intermediate data and the output result, wherein the interpretation result is used for characterizing a logical basis of the initial model for obtaining the output result based on the intermediate data;
a first adjusting unit configured to adjust the initial model according to the loss information and the interpretation result to obtain a trained target model.
11. The apparatus of claim 10, wherein the first determining unit is further configured to:
determining intermediate data corresponding to each stage in the forward propagation process, wherein each stage is obtained by dividing the forward propagation process based on a preset dividing mode; and
the first interpretation unit is further configured to:
and obtaining an interpretation result corresponding to each stage according to the intermediate data corresponding to each stage in the forward propagation process and the output result.
12. The apparatus of claim 10 or 11, wherein the first interpretation unit is further configured to:
and obtaining an interpretation result corresponding to each stage by adopting a deconvolution method according to the intermediate data corresponding to each stage in the forward propagation process and the output result.
13. The apparatus of claim 10 or 11, wherein the first adjusting unit is further configured to:
obtaining an analysis result corresponding to the interpretation result of each stage according to the label information and the interpretation result corresponding to each stage in the forward propagation process, wherein the analysis result is used for indicating the rationality of the information processing mode of the stage corresponding to the interpretation result; and for each stage in the forward propagation process, adjusting parameters corresponding to the stage of the initial model according to the loss information and the analysis result corresponding to the interpretation result of the stage to obtain a trained target model.
14. The apparatus of claim 13, wherein the first adjusting unit is further configured to:
for each stage in the forward propagation process, performing the following operations:
determining attention degrees of the initial model to each part of data in the input training data in the stage according to the corresponding interpretation result of the stage, wherein the attention degrees are used for representing the dependence degree of the initial model on each part of data in the training data in the process of obtaining the output result; and obtaining an analysis result corresponding to the interpretation result of the stage according to the label information and the attention degree corresponding to the stage.
15. The apparatus of claim 14, wherein the first adjusting unit is further configured to:
determining a target attention degree corresponding to the initial model at the stage according to the label information, wherein the target attention degree is used for representing the degree of dependence of the initial model on each part of data in training data at the stage when a target output result corresponding to the label information is obtained; and obtaining an analysis result corresponding to the interpretation result of the stage according to the target attention degree and the attention degree.
16. A model adjustment apparatus comprising:
a second acquisition unit configured to acquire data to be processed;
a second determining unit configured to process the data to be processed through a target model, and to determine intermediate data in a forward propagation process in which the target model obtains a processing result from the data to be processed;
a second interpretation unit configured to obtain an interpretation result according to the intermediate data and the processing result, wherein the interpretation result is used for characterizing a logical basis of the target model obtaining the processing result based on the intermediate data;
a second adjusting unit configured to adjust the target model according to the interpretation result.
17. The apparatus of claim 16, wherein the second determining unit is further configured to:
determining intermediate data corresponding to each stage in the forward propagation process, wherein each stage is obtained by dividing the forward propagation process based on a preset dividing mode; and
the second interpretation unit is further configured to:
and obtaining an interpretation result corresponding to each stage according to the intermediate data corresponding to each stage in the forward propagation process and the processing result.
18. The apparatus of claim 16 or 17, wherein the second interpretation unit is further configured to:
and obtaining an interpretation result corresponding to each stage by adopting a deconvolution method according to the intermediate data corresponding to each stage in the forward propagation process and the processing result.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9.
21. A computer program product, comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.
CN202210290331.8A 2022-03-23 2022-03-23 Method, apparatus and computer program product for aided model training Pending CN114692866A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210290331.8A CN114692866A (en) 2022-03-23 2022-03-23 Method, apparatus and computer program product for aided model training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210290331.8A CN114692866A (en) 2022-03-23 2022-03-23 Method, apparatus and computer program product for aided model training

Publications (1)

Publication Number Publication Date
CN114692866A 2022-07-01

Family

ID=82139570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210290331.8A Pending CN114692866A (en) 2022-03-23 2022-03-23 Method, apparatus and computer program product for aided model training

Country Status (1)

Country Link
CN (1) CN114692866A (en)

Similar Documents

Publication Publication Date Title
CN113674421A (en) 3D target detection method, model training method, related device and electronic equipment
CN113963110B (en) Texture map generation method and device, electronic equipment and storage medium
CN115861462B (en) Training method and device for image generation model, electronic equipment and storage medium
CN113326852A (en) Model training method, device, equipment, storage medium and program product
CN113657483A (en) Model training method, target detection method, device, equipment and storage medium
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN114881129A (en) Model training method and device, electronic equipment and storage medium
CN113627361B (en) Training method and device for face recognition model and computer program product
CN114462598A (en) Deep learning model training method, and method and device for determining data category
CN114399513B (en) Method and device for training image segmentation model and image segmentation
CN112905743B (en) Text object detection method, device, electronic equipment and storage medium
CN114417822A (en) Method, apparatus, device, medium and product for generating model interpretation information
CN114120410A (en) Method, apparatus, device, medium and product for generating label information
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN113963197A (en) Image recognition method and device, electronic equipment and readable storage medium
CN114692866A (en) Method, apparatus and computer program product for aided model training
CN113989899A (en) Method, device and storage medium for determining feature extraction layer in face recognition model
CN113408632A (en) Method and device for improving image classification accuracy, electronic equipment and storage medium
CN115471717B (en) Semi-supervised training and classifying method device, equipment, medium and product of model
CN114494818B (en) Image processing method, model training method, related device and electronic equipment
CN113867634B (en) Data reading method and device, electronic equipment and storage medium
CN113313049A (en) Method, device, equipment, storage medium and computer program product for determining hyper-parameters
CN115937639A (en) Labeling method of training sample, model training method, device, equipment and medium
CN113836418A (en) Data pushing method and device, electronic equipment and storage medium
CN115063651A (en) Training method and device for target object detection model and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination