CN111582371B - Training method, device, equipment and storage medium of image classification network - Google Patents


Info

Publication number
CN111582371B
Authority
CN
China
Prior art keywords
network
data
updating
image classification
evaluation
Prior art date
Legal status
Active
Application number
CN202010384467.6A
Other languages
Chinese (zh)
Other versions
CN111582371A
Inventor
曹桂平
Current Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN202010384467.6A
Publication of CN111582371A
Application granted
Publication of CN111582371B


Classifications

    • G06F18/214: Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Pattern recognition; analysing; classification techniques
    • G06N3/045: Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N3/08: Neural networks; learning methods


Abstract

The invention discloses a training method, apparatus, device, and storage medium for an image classification network. The training method comprises the following steps: generating an image classification network based on a labeled sample set; selecting target data from unlabeled data for labeling, based on a policy network and an evaluation network; updating the labeled sample set according to the labeling result of the target data, and updating the image classification network according to the updated labeled sample set; updating the policy network and the evaluation network according to the labeling result; and cyclically executing the selection and update operations until a preset condition is met, thereby obtaining the trained image classification network. With this technical scheme, only part of the unlabeled data is selected for labeling and used to update the image classification network in each cycle, which reduces the data cost and labeling workload of the training process while ensuring classification accuracy.

Description

Training method, device, equipment and storage medium of image classification network
Technical Field
Embodiments of the invention relate to the technical field of image recognition, and in particular to a training method, apparatus, device, and storage medium for an image classification network.
Background
With the rapid development of medical imaging equipment and artificial intelligence technology, medical data has grown at a large scale, and deep-learning-based analysis of medical images is gradually being applied. From a large amount of labeled data, an image classification network can be trained and then used to recognize specific features, thereby identifying and classifying images; this greatly reduces the workload of doctors and improves diagnostic efficiency and accuracy.
However, training an image classification network requires a large amount of labeled data, which in turn requires professional doctors to perform extensive manual labeling; moreover, in the medical field, acquiring large quantities of medical image data is difficult, which makes the training process complicated and expensive. In addition, although a large amount of training data is necessary for training a deep neural network, not all of that data is useful: if the training data contains noisy or mislabeled samples, the classification accuracy of the network decreases. Training on large amounts of redundant or unnecessary data therefore harms both the training efficiency and the classification accuracy of the image classification network, at high cost and with complex implementation.
Disclosure of Invention
The invention provides a training method, apparatus, device, and storage medium for an image classification network, which reduce the data cost and labeling workload of the training process while ensuring classification accuracy.
In a first aspect, an embodiment of the present invention provides a training method for an image classification network, including:
generating an image classification network based on a labeled sample set;
selecting target data from unlabeled data for labeling, based on a policy network and an evaluation network;
updating the labeled sample set according to the labeling result of the target data, and updating the image classification network according to the updated labeled sample set;
updating the policy network and the evaluation network according to the labeling result;
and cyclically executing the selection and update operations until a preset condition is met, thereby obtaining the trained image classification network.
In a second aspect, an embodiment of the present invention provides a training apparatus for an image classification network, including:
an initial training module, configured to generate an image classification network based on a labeled sample set;
a data selection module, configured to select target data from unlabeled data for labeling, based on a policy network and an evaluation network;
a first updating module, configured to update the labeled sample set according to the labeling result of the target data and to update the image classification network according to the updated labeled sample set;
a second updating module, configured to update the policy network and the evaluation network according to the labeling result;
and a cyclic execution module, configured to cyclically execute the selection and update operations until a preset condition is met, thereby obtaining the trained image classification network.
In a third aspect, an embodiment of the present invention provides a server, including:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of training an image classification network as described in the first aspect.
In a fourth aspect, embodiments of the present invention also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the training method of the image classification network according to the first aspect.
Embodiments of the invention provide a training method, apparatus, device, and storage medium for an image classification network, wherein the method comprises: generating an image classification network based on a labeled sample set; selecting target data from unlabeled data for labeling, based on a policy network and an evaluation network; updating the labeled sample set according to the labeling result of the target data, and updating the image classification network according to the updated labeled sample set; updating the policy network and the evaluation network according to the labeling result; and cyclically executing the selection and update operations until a preset condition is met, thereby obtaining the trained image classification network. With this technical scheme, only part of the unlabeled data is selected for labeling and used to update the image classification network in each cycle, which reduces the data cost and labeling workload of the training process while ensuring classification accuracy.
Drawings
Fig. 1 is a flowchart of a training method of an image classification network according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an application scenario of a training method of an image classification network according to a first embodiment of the present invention;
fig. 3 is a flowchart of a training method of an image classification network according to a second embodiment of the present invention;
fig. 4 is a schematic diagram illustrating an implementation of a training method of an image classification network in a second embodiment of the present invention;
fig. 5 is a flowchart of a training method of an image classification network according to a third embodiment of the present invention;
fig. 6 is a schematic diagram of experimental results of a training method of an image classification network in the third embodiment of the present invention;
fig. 7 is a schematic structural diagram of a training device of an image classification network according to a fourth embodiment of the present invention;
fig. 8 is a schematic hardware structure of a device according to a fifth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Before discussing exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently, or at the same time. Furthermore, the order of the operations may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like. Furthermore, embodiments of the invention and features of the embodiments may be combined with each other without conflict.
The term "comprising" and variants thereof as used herein is intended to be open ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment".
Example 1
Fig. 1 is a flowchart of a training method of an image classification network according to an embodiment of the present invention. This embodiment is applicable to situations in which part of a large pool of data is selected for labeling in order to train an image classification network. Specifically, the training method may be performed by a training apparatus for the image classification network, which may be implemented in software and/or hardware and integrated in a device including, but not limited to, a desktop computer, a notebook computer, a cloud server, and the like. The specific content of the labeled sample set and the unlabeled data is not limited here; for example, when identifying and classifying lung images in the medical imaging field, labeled image data may serve as the labeled sample set, while the unlabeled data refers to unlabeled lung image data awaiting annotation.
As shown in fig. 1, the method specifically includes the following steps:
s110, generating an image classification network based on the marked sample set.
In this embodiment, the labeled sample set refers to labeled image data, which may be a labeled sample set stored in a database, or may be obtained by having an annotator select and label part of a large pool of unlabeled data. An image classification network can be generated from the labeled sample set; it is essentially a trained neural network. After learning from the data features of the labeled sample set and the corresponding labels, the image classification network can correctly recognize the data in the labeled sample set: by extracting features from that data, it can classify it correctly and obtain the corresponding labels. After this learning, the classification results of the image classification network on the labeled sample set are consistent with the labels provided by the annotator.
Starting from the image classification network generated from the labeled sample set, this embodiment selects a small amount of target data from the unlabeled data for labeling in each cycle, so that progressively more labeled samples become available for updating and training the image classification network, until a preset condition is met.
S120, selecting target data from unlabeled data for labeling, based on the policy network and the evaluation network.
In this embodiment, a reinforcement learning mechanism based on the policy network and the evaluation network selects the most representative target data from the unlabeled data. In this mechanism, the policy network and the evaluation network form an agent, which performs selection operations over the data according to certain rules, choosing one or more optimal objects from many candidates, and thereby selecting the most representative target data.
In this embodiment, the policy network is a neural network that, after learning, performs a specific selection action and outputs a corresponding state quantity; the evaluation network evaluates that state quantity, for example by averaging it or taking its expectation, and determines the reward corresponding to it. By maximizing the expected reward, the policy network learns more reliable selection actions and selects more representative target data so as to obtain higher rewards. Within a single selection step, the policy network and the evaluation network are computed iteratively, the two networks being trained cyclically and alternately, which improves the reliability of the selected target data.
Because the image classification network is generated from the original labeled sample set, its classification accuracy on other, unlabeled data cannot be guaranteed. Using the reinforcement learning mechanism of the policy network and the evaluation network, a portion of target data can be selected from the unlabeled data for labeling; the selected target data are the most representative, feature-rich samples in the unlabeled pool and the most beneficial for the learning and classification of the image classification network. This provides a reliable basis for training, lets the image classification network learn more meaningful features during training, and allows it to recognize different data features more accurately in practical applications.
S130, updating the labeled sample set according to the labeling result of the target data, and updating the image classification network according to the updated labeled sample set.
Specifically, once the target data has been labeled it carries a label, and adding the labeled target data to the labeled sample set provides a basis for further training of the image classification network, which can then continue to learn more meaningful features during training and improve its classification accuracy.
S140, updating the policy network and the evaluation network according to the labeling result.
Specifically, the state quantity and the corresponding reward for the current selection action can be determined from the labeling result, so that after training, the policy network and the evaluation network can determine a better selection action and, in the next cycle, select a better portion of target data from the unlabeled data for labeling. On this basis, not all of the unlabeled data needs to be used to train the image classification network; only a small amount of high-quality target data is used, which greatly reduces the manual labeling workload. Moreover, because the selection action of the policy network is continuously learned and optimized, the quality of the target data is ensured and the labeled sample set in each cycle remains representative of the feature space, so the image classification network learns rich features from a small amount of data, reducing both the acquisition difficulty and the cost of labeled data while maintaining classification accuracy.
S150, determining whether a preset condition is met; if yes, executing S160; if not, returning to S120, and cyclically executing the above selection and update operations until the preset condition is satisfied.
In this implementation, the preset condition being met means that the image classification network, the policy network, and the evaluation network have gone through enough cycles to be effectively trained: the target data selected by the policy network contributes more and more to the training of the image classification network, and the network can recognize more data features with high precision. For example, the preset condition may be that the cycle of selection and update operations has run a certain number of times, that the labeled sample set has reached a certain size, or that the proportion of unlabeled data already selected as target data exceeds a certain threshold.
S160, obtaining the trained image classification network.
In this embodiment, when the preset condition is detected to be satisfied, training of the image classification network is complete; the resulting network has high classification accuracy and can be applied to image classification and recognition, for example lung image classification in the medical field.
In an embodiment, the preset condition includes: the number of samples in the labeled sample set reaching a set threshold.
Specifically, each time the labeled sample set is updated with the labeling result of the target data, the number of samples in it changes; when this number is detected to reach the set threshold, the selection and update operations can be stopped, and the image classification network at that moment is taken as the final result.
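As a sketch, the threshold check described above might look like the following; the function name and the optional cycle-count criterion are illustrative assumptions based on the alternative conditions listed in this embodiment, not part of the patent's claims.

```python
def preset_condition_met(num_labeled, sample_threshold, cycle=None, max_cycles=None):
    """Stop when the labeled sample set reaches a set threshold.

    An optional cycle-count criterion is included as one of the
    alternative preset conditions mentioned in the embodiment.
    """
    if num_labeled >= sample_threshold:
        return True
    if cycle is not None and max_cycles is not None and cycle >= max_cycles:
        return True
    return False
```

In practice this check would run once per cycle, after the labeled sample set has been updated.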
Fig. 2 is a schematic diagram of an application scenario of the training method of an image classification network according to the first embodiment of the present invention. As shown in Fig. 2, the policy network and the evaluation network can be understood as an intelligent annotation system that selects the most representative target data from the unlabeled data for an annotator to label; the labeled data is then used to update the labeled sample set, so that the image classification network generated from that set is updated as well. The selection actions of the policy network and the evaluation network, and the quality of the selected target data, directly affect the training efficiency and classification accuracy. With the training method of this embodiment, the reinforcement learning mechanism of the policy network and the evaluation network ensures that the selected target data benefits the training of the image classification network, thereby ensuring classification accuracy. Because only part of the unlabeled data is selected for labeling in each cycle, the labeling workload is reduced, a large amount of sample data is unnecessary, and the complexity and cost of training are lowered.
In the training method of an image classification network provided by this embodiment of the invention, target data is selected for training the image classification network through the reinforcement learning mechanism of the policy network and the evaluation network, and the image classification network, the policy network, and the evaluation network are all updated accordingly. This ensures that the target data most beneficial to network training is selected, so that only part of the unlabeled data needs to be labeled and used to update the image classification network in each cycle, reducing the data cost and labeling workload of the training process while ensuring classification accuracy.
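The S110 to S160 cycle of this embodiment can be sketched as a minimal runnable loop. In this sketch the function name is assumed, a random vector stands in for the actor-critic action vector, a trivial `oracle` callable stands in for the human annotator, and the classifier/agent updates are left as comments; it illustrates the control flow only, not the patent's networks.

```python
import numpy as np

def train_with_active_selection(labeled, unlabeled, oracle, n_s=2, max_labeled=5, seed=0):
    """Cycle: select n_s samples, have the oracle label them, absorb them,
    and stop once the labeled set reaches max_labeled samples.

    labeled: list of (sample, label) pairs; unlabeled: list of samples.
    """
    rng = np.random.default_rng(seed)
    labeled, unlabeled = list(labeled), list(unlabeled)
    while len(labeled) < max_labeled and unlabeled:
        # S120: the action vector would come from the policy network;
        # a random vector stands in for it here
        a = rng.random(len(unlabeled))
        picks = np.argsort(a)[::-1][:n_s]          # top-n_s selection
        for i in sorted(picks.tolist(), reverse=True):
            x = unlabeled.pop(i)
            labeled.append((x, oracle(x)))         # S130: annotate and absorb
        # S130/S140: retrain the classifier and update the
        # policy/evaluation networks here (omitted in this sketch)
    return labeled
```

Each pass through the loop corresponds to one selection/update cycle; the stopping test corresponds to the preset condition of S150.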
Example 2
Fig. 3 is a flowchart of a training method of an image classification network according to a second embodiment of the present invention. This embodiment is an optimization on the basis of the foregoing embodiment and specifically describes the process of selecting target data, updating the image classification network, and processing the labeling result. Technical details not described here can be found in any of the embodiments above.
In this embodiment, selecting target data from unlabeled data for labeling based on the policy network and the evaluation network specifically includes: sorting the unlabeled data by selection probability from largest to smallest according to the action vector a of the policy network, and selecting the set number of samples with the highest probability values as target data for labeling, where the action vector a is determined according to the evaluation result of the evaluation network.
In this embodiment, updating the policy network and the evaluation network according to the labeling result specifically includes: generating buffer data from the labeling result, and updating the policy network and the evaluation network according to the buffer data. The buffer data comprises: the first state quantity S, corresponding to the predictions of the pre-update image classification network on the test data; the second state quantity S', corresponding to the predictions of the updated image classification network on the test data; the action vector a with which the policy network selected target data from the unlabeled data; and the reward r of the policy network's action vector a in the first state quantity S.
Specifically, as shown in fig. 3, the method specifically includes the following steps:
s210, generating an image classification network based on the marked sample set.
In this embodiment, the image classification network is trained on an initial labeled sample set (denoted L) and then tested on a test data set to evaluate its current classification accuracy and to generate the corresponding first state quantity S.
S220, sorting the unlabeled data by selection probability from largest to smallest based on the action vector a of the policy network, and selecting the set number of samples with the highest probability values as target data for labeling; the action vector a is determined according to the evaluation result of the evaluation network.
In this embodiment, the selection action of the policy network is defined as the action vector a, with a ∈ (0, 1)^n, where n is the number of unlabeled samples; each component of a represents the probability that the corresponding unlabeled sample (the unlabeled data is denoted U) is selected as target data. The unlabeled data is sorted by these probability values using the action vector a, and the n_s samples with the highest probabilities are selected for labeling. During the cyclic process, the action vector a of the policy network is determined according to the evaluation result of the evaluation network; that is, a can be updated and optimized based on the evaluation network's assessment.
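Concretely, the sort-and-take-top-n_s step can be sketched as follows; the function name is an illustrative assumption.

```python
import numpy as np

def select_targets(action_vector, n_s):
    """Indices of the n_s unlabeled samples with the highest selection probability."""
    a = np.asarray(action_vector, dtype=float)
    # argsort is ascending, so reverse it and keep the first n_s entries
    return np.argsort(a)[::-1][:n_s]
```

The returned indices identify which samples of U are handed to the annotator in the current cycle.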
And S230, updating the marked sample set according to the marking result of the target data, and updating the image classification network according to the updated marked sample set.
In this embodiment, according to the first state quantity S corresponding to the current image classification network, the policy network selects the most representative target data from the unlabeled data U according to the action vector a, for labeling by the annotator; the labeled target data is then added to the labeled data set L.
Specifically, the image classification network is generated from the original labeled sample set L, and its network parameters are denoted θ_d. After the labeled data set L is updated with the labeling result of the target data, the image classification network adjusts θ_d through further training to improve classification performance. The samples in the labeled sample set L are denoted {x_i}, i = 1, …, n, with corresponding labels {y_i}, i = 1, …, n, where n is the number of labeled samples. The image classification network may be trained by minimizing the cross-entropy loss L_ce, defined as follows:

L_ce = −Σ_{i=1}^{n} Σ_{j=1}^{M} f(y_i = j) · log Pr(y_i = j | x_i; θ_d),

where M is the total number of classes, f(y_i = j) is the indicator that the label of the i-th sample belongs to class j, and Pr(y_i = j | x_i; θ_d) is the network's output for class j given x_i. Iteratively solving for the θ_d that minimizes L_ce over the data and labels of the labeled sample set implements the update of the image classification network.
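The cross-entropy loss above can be computed directly from the predicted class probabilities. A minimal NumPy sketch, assuming `probs[i, j]` holds Pr(y_i = j | x_i; θ_d) and `labels` holds integer class labels:

```python
import numpy as np

def cross_entropy_loss(probs, labels):
    """L_ce = -sum_i sum_j f(y_i = j) * log Pr(y_i = j | x_i; theta_d).

    probs: (n, M) array of predicted class probabilities.
    labels: (n,) array of integer class labels in [0, M).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n = labels.shape[0]
    # the indicator f(y_i = j) picks out the probability of the true class
    picked = probs[np.arange(n), labels]
    return -np.sum(np.log(picked))
```

In an actual training loop this quantity would be minimized over θ_d by gradient descent rather than evaluated once.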
And S240, generating buffer data according to the labeling result, and updating the strategy network and the evaluation network according to the buffer data.
In this embodiment, the buffer data comprises: the first state quantity S, corresponding to the predictions of the pre-update image classification network on the test data; the second state quantity S', corresponding to the predictions of the updated image classification network on the test data; the action vector a with which the policy network selected target data from the unlabeled data; and the reward r of the action vector a in the first state quantity S.
Specifically, according to the first state quantity S corresponding to the current image classification network, the policy network selects the most representative target data from the unlabeled data according to the action vector a, for labeling by the annotator; the labeled target data is added to the labeled data set L; the image classification network then updates itself based on the updated labeled data set L, re-evaluates its classification accuracy on the test data set, and generates the second state quantity S' and the reward value r.
This yields a group of buffer data ⟨S, a, r, S'⟩. The policy network selects the most informative samples from the unlabeled data according to the first state quantity S and the action vector a, and is trained to maximize the reward for selecting target data in future cycles, while the evaluation network assesses the current action vector a so as to push the policy network to optimize a and achieve better performance.
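The per-cycle tuples ⟨S, a, r, S'⟩ are naturally kept in a buffer pool from which the actor and critic can later be trained. A minimal sketch; the class name and default capacity are illustrative assumptions.

```python
import random
from collections import deque

class BufferPool:
    """Stores <S, a, r, S'> transitions for training the actor and critic."""

    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, state, action, reward, next_state):
        self.pool.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(list(self.pool), batch_size)

    def __len__(self):
        return len(self.pool)
```

A bounded deque keeps memory constant across many cycles while random sampling decorrelates the transitions used for an update.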
In this embodiment, the first state quantity S is determined from the predictions of the current image classification network on the test data, so as to improve the classification performance of the image classification network during computation. Specifically, S ∈ (0, 1]^{n×M} is designed as a matrix, where n is the number of unlabeled samples and M is the total number of classes, so the matrix contains all the predictions for classifying the unlabeled samples; its (i, j)-th element is the predicted probability that the i-th sample belongs to class j, i.e. S_{i,j} = Pr(y_i = j | x_i; θ_d). After the labeled sample set and the image classification network are updated with the labeling result of the target data, a new state matrix, namely the second state quantity S', is generated for the test data from the same definition applied to the updated network.
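Given the classifier's raw outputs, the state matrix can be built as the full matrix of class probabilities. This sketch assumes the classifier exposes real-valued logits and applies a numerically stable softmax; the function name is an assumption.

```python
import numpy as np

def state_matrix(logits):
    """Build S with S[i, j] = Pr(y_i = j | x_i; theta_d) from raw scores.

    logits: (n, M) array of classifier scores for n samples and M classes.
    """
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=1, keepdims=True)   # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```

Each row of the result sums to 1, matching the requirement that the matrix hold per-sample class probabilities.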
In an embodiment, generating the buffer data according to the labeling result includes: computing, based on the evaluation network, the error between the pre-update image classification network's predictions for the target data and the labeling result of the target data, to obtain the corresponding reward r.
In this embodiment, the reward r is constructed from the error between the image classification network's predictions for the target data and the true labels. Specifically, let k_i be the true label of the i-th target sample and k̂_i the image classification network's prediction for it, and let e_i = 1 if k̂_i ≠ k_i and e_i = 0 otherwise. The reward may then be defined as

r(S, a) = (1/n_s) Σ_{i=1}^{n_s} e_i,

which represents the reward corresponding to the action vector a in the first state quantity S.
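The reward described above compares the classifier's predictions on the selected target data with their true labels. A minimal sketch, assuming the misclassification-rate form and an illustrative function name:

```python
import numpy as np

def reward(true_labels, predicted_labels):
    """r(S, a): fraction of selected target samples the classifier got wrong."""
    k = np.asarray(true_labels)
    k_hat = np.asarray(predicted_labels)
    e = (k != k_hat).astype(float)  # e_i = 1 for a misprediction, 0 otherwise
    return float(e.mean())
```

A batch in which every selected sample was mispredicted yields r = 1, so the policy is rewarded for finding samples the classifier still gets wrong.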
In one embodiment, the value of the reward function r is inversely related to the classification accuracy of the target data; the value of the reward function r is positively correlated with the reliability of the action vector a of the policy network.
In this embodiment, the purpose of designing the reward function r is to make the selection action a of the policy network pay more attention to samples that may be misclassified by the image classification network; using these samples is more beneficial for training the image classification network and improving its classification performance. As shown in the reward function r(S, a) above, if the target data is correctly classified by the image classification network, then k̂_i = k_i and the corresponding term is zero; thus, the smaller the value of the reward function r, the higher the classification accuracy of the image classification network on the target data. Conversely, a higher value of the reward function r indicates that the currently selected target data is not correctly classified, so the image classification network should pay more attention to these mispredicted data, and the reward obtained by selecting such data will be higher. This design of the reward function r therefore encourages the policy network to select mispredicted data.
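The reward computation can be sketched as follows (a minimal illustration; the indicator-based error form and the function name `reward` are one plausible instantiation of the "error between predicted value and labeling result" described above, not a verbatim reproduction of the patent's formula):

```python
import numpy as np

def reward(pred_labels, true_labels):
    """Reward r(S, a): mean error between the pre-update classifier's
    predicted labels k_hat_i and the annotator's labels k_i for the
    selected target data. r = 0 when every selected sample is classified
    correctly, and grows toward 1 as more are misclassified, which pushes
    the policy toward informative (hard) samples."""
    pred = np.asarray(pred_labels)
    true = np.asarray(true_labels)
    return float(np.mean(pred != true))
```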
S250, judging whether a preset condition is met; if yes, executing S260; if not, returning to S220, and circularly executing the above selection operation and update operation until the preset condition is satisfied.
S260, obtaining the trained image classification network.
Fig. 4 is a schematic diagram illustrating an implementation of a training method of an image classification network according to the second embodiment of the present invention. As shown in fig. 4, the training process for the image classification network is mainly implemented by three kinds of neural networks, namely an image classification network (Classifier), a policy network (an Actor network is taken as an example), and an evaluation network (a Critic network is taken as an example). First, an initial Classifier can be obtained by training on the labeled data set L; the Classifier is tested on the test data set to evaluate its current classification accuracy, and the corresponding first state quantity S is generated. The Actor network selects representative target data from the unlabeled data U according to the first state quantity S for labeling by an annotator, and the labeled target data is added to the labeled data set L. The Classifier updates itself using the updated labeled data set L, re-evaluates the classification accuracy on the test data set, and generates the second state quantity S' and the reward function r. A group of buffer data <S, a, r, S'> is obtained in each cycle and stored in a buffer pool.
The Actor network and the Critic network extract buffer data from the buffer pool and are trained alternately; by evaluating and optimizing the action vector a of the Actor, the reliability of the next selection of target data is improved. The three networks (Classifier, Actor, Critic) cycle continuously according to the process shown in fig. 4, so that the sample data in the labeled sample set gradually increases and the Classifier is gradually updated according to the labeling results of different target data, until the preset condition is met and the final Classifier, whose classification accuracy is sufficiently high, is obtained. In this process, training of the Classifier can be achieved using only part of the target data rather than all of the unlabeled data, which reduces the data cost; and the reliability of the selection action gives the trained Classifier high accuracy.
According to the training method of the image classification network provided in the second embodiment, optimization is performed on the basis of the above embodiment: through the reinforcement learning mechanism of the policy network and the evaluation network, target data of higher quality is continuously selected for training and updating the image classification network, which greatly reduces the amount of sample data and the labeling workload required for training. Moreover, because the policy network and the evaluation network are continuously optimized according to the labeling results and the buffer data, the selected target data is the most representative and has the richest features, and the image classification network can quickly learn the most representative features, thereby ensuring its classification accuracy.
Example III
Fig. 5 is a flowchart of a training method of an image classification network according to a third embodiment of the present invention, in which optimization is performed on the basis of the above embodiment, and update processes of a policy network and an evaluation network are specifically described. It should be noted that technical details not described in detail in this embodiment may be found in any of the above embodiments.
In this embodiment, updating the policy network and the evaluation network according to the buffered data includes: calculating a return function corresponding to the policy network and the evaluation network according to the buffer data, wherein the return function is a Q-value function; calculating a loss function corresponding to the policy network and the evaluation network according to the buffer data; and respectively adjusting the network parameters θ_a of the policy network and the network parameters θ_c of the evaluation network, taking maximization of the return function and minimization of the loss function as optimization targets.
In this embodiment, respectively adjusting the network parameters θ_a of the policy network and the network parameters θ_c of the evaluation network with maximization of the return function and minimization of the loss function as optimization targets includes: updating the network parameters θ_a' of the target policy network according to the network parameters θ_a of the policy network and a first update factor; updating the network parameters θ_c' of the target evaluation network according to the network parameters θ_c of the evaluation network and a second update factor; recalculating the return function and the loss function corresponding to the target policy network and the target evaluation network; and repeatedly performing the update operation and the calculation operation of the network parameters until the return function is maximized and the loss function is minimized, at which point the network parameters θ_a' of the target policy network and θ_c' of the target evaluation network are taken as the adjusted network parameters of the policy network and of the evaluation network, respectively.
As shown in fig. 5, the method specifically includes:
s310, generating an image classification network based on the marked sample set.
And S320, selecting target data from unlabeled data based on the strategy network and the evaluation network for labeling.
S330, updating the marked sample set according to the marking result of the target data, and updating the image classification network according to the updated marked sample set.
And S340, generating buffer data according to the labeling result.
And S350, evaluating the action vector a of the strategy network based on the evaluation network, and updating the action vector a according to the evaluation result.
In this embodiment, the updating of the policy network and the evaluation network includes two aspects: optimization of the selection action of the policy network, and updating of the network parameters of the policy network and the evaluation network. For optimization of the selection action, in each cycle buffer data is extracted from the data buffer pool, the policy network outputs an action vector a according to the first state quantity S, and the evaluation network evaluates the action vector a; the two networks are trained alternately to obtain a reliable action vector a, so that the policy network selects the data with the largest amount of information from the unlabeled data as target data, maximizing the reward obtained by the action vector a.
In this embodiment, the updating of the action vector a in S350 is performed in synchronization with the adjustment of the network parameters in S360 and S370.
S360, calculating a return function corresponding to the strategy network and the evaluation network according to the buffer data, wherein the return function is a Q value function; and calculating a loss function corresponding to the strategy network and the evaluation network according to the buffer data.
For updating the network parameters of the policy network and the evaluation network, the objective of reinforcement learning is to maximize the future expected return given the first state quantity S; this return is defined as a Q-value function for evaluating the return obtained by taking the action vector a in the first state quantity S.
In this embodiment, the Q-value function can be expressed according to the Bellman equation as: Q(S, a; θ_c) = E[r(S, a) + γQ(S', π(S'); θ_c)], where θ_c are the network parameters of the evaluation network, Q(S', π(S'); θ_c) represents the return that can be obtained by the action vector π(S') at the second state quantity S', and γ is a decay factor of the return. By maximizing the mean of the sum of the immediate reward function r(S, a) and the future return, i.e., solving max E[r(S, a) + γQ(S', π(S'); θ_c)], a greedy policy of the policy network can be learned to update the network parameters θ_a of the policy network.
In this embodiment, for the optimization of the evaluation network, the loss function is calculated as L(θ_c) = E[(y − Q(S, a; θ_c))²], where y = r(S, a) + γQ(S', π(S'); θ_c). By minimizing this loss function, i.e., solving min over θ_c of L(θ_c), the evaluation network can be updated.
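The critic objective can be sketched numerically as follows (a minimal illustration of the Bellman target and squared-error loss above; the helper names `td_target` and `critic_loss` are illustrative, not from the original disclosure):

```python
import numpy as np

def td_target(r, q_next, gamma=0.99):
    """Bellman target y = r + gamma * Q(S', pi(S')) for one transition."""
    return r + gamma * q_next

def critic_loss(q_pred, y):
    """L(theta_c) = E[(y - Q(S, a; theta_c))^2], averaged over a minibatch
    of transitions drawn from the buffer pool."""
    q_pred = np.asarray(q_pred, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((y - q_pred) ** 2))
```

In practice `q_pred` would come from the evaluation network and `q_next` from the (target) network evaluated at the next state; the loss is then minimized by gradient descent on θ_c.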
And S370, respectively adjusting the network parameters of the strategy network and the network parameters of the evaluation network by taking the maximized return function and the minimized loss function as optimization targets.
Taking maximization of the return function as the optimization target, i.e., solving max over θ_a of E[Q(S, π(S; θ_a); θ_c)], and taking minimization of the loss function as the optimization target, i.e., solving min over θ_c of L(θ_c), the network parameters θ_a of the policy network and the network parameters θ_c of the evaluation network can be solved separately.
In an embodiment, adjusting the network parameters θ_a of the policy network and the network parameters θ_c of the evaluation network with the objective of maximizing the return function and minimizing the loss function includes: updating the network parameters θ_a' of the target policy network according to the network parameters θ_a of the policy network and the first update factor; updating the network parameters θ_c' of the target evaluation network according to the network parameters θ_c of the evaluation network and the second update factor; recalculating the return function and the loss function corresponding to the target policy network and the target evaluation network; and repeatedly executing the update operation and the calculation operation of the network parameters until the return function is maximized and the loss function is minimized, at which point the network parameters θ_a' of the target policy network and θ_c' of the target evaluation network are taken as the adjusted network parameters of the policy network and of the evaluation network, respectively.
In this embodiment, the policy network is an Actor network and the evaluation network is a Critic network. To make the training of the Actor and Critic networks more stable, a target Actor network and a target Critic network are introduced. Using the target Actor network with parameters θ_a' and the target Critic network with parameters θ_c', the target value y = r(S, a) + γQ'(S', π'(S'; θ_a'); θ_c') is calculated, and the minimization of the loss function can be expressed as: min over θ_c of E[(y − Q(S, a; θ_c))²], where π'(·; θ_a') is the selection action estimated by the target Actor network and Q'(·; θ_c') is the Q-value function corresponding to the target Critic network. This optimization problem can be solved by a policy gradient algorithm.
It should be noted that, in the process of iteratively solving the maximization of the return function and the minimization of the loss function, the network parameters θ_a' of the target Actor network can be determined according to the first update factor λ_1, i.e., θ_a' := λ_1·θ_a + (1 − λ_1)·θ_a'; similarly, the network parameters θ_c' of the target Critic network can be determined according to the second update factor λ_2, i.e., θ_c' := λ_2·θ_c + (1 − λ_2)·θ_c', where λ_1 and λ_2 may be equal or different.
The network parameters of the target Actor network and the target Critic network are continuously updated according to the first update factor and the second update factor, respectively, until they satisfy the maximized return function and the minimized loss function, at which point updating stops; the network parameters at this time are the finally determined network parameters of the Actor network and of the Critic network.
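The soft target-network update above can be sketched in a few lines (a minimal NumPy illustration treating a parameter vector as a flat array; the function name `soft_update` is an assumption):

```python
import numpy as np

def soft_update(theta, theta_target, lam):
    """Soft update of target-network parameters:
    theta_target := lam * theta + (1 - lam) * theta_target.
    A small lam (e.g. 0.01) makes the target network track the online
    network slowly, stabilizing training."""
    theta = np.asarray(theta, dtype=float)
    theta_target = np.asarray(theta_target, dtype=float)
    return lam * theta + (1.0 - lam) * theta_target
```

With lam = 1 the target network is overwritten by the online network; with lam close to 0 it barely moves per step.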
S380, judging whether a preset condition is met; if yes, executing S390; if not, returning to S320, and circularly executing the above selection operation and update operation until the preset condition is satisfied.
S390, obtaining the trained image classification network.
The following is the implementation flow of the training method of the image classification network in this embodiment:
Input: labeled sample set (X_l, Y_l) and unlabeled data X_u.
Initialization: based on the labeled sample set (X_l, Y_l), train the image classification network (Classifier) to obtain f(·; θ_d).
For each epoch do:
1) Calculate the first state quantity S, whose (i, j)-th element is S_ij = Pr(y_i = j | x_i; θ_d);
2) The policy network determines an action vector a = π(S; θ_a); the n_s samples {x_1, ..., x_{n_s}} with the largest selection probability are chosen from the unlabeled data and labeled by an annotator to obtain the corresponding labels {k_1, ..., k_{n_s}}, namely the labeling result;
3) Add the labeling result to the labeled sample set, i.e. (X_l, Y_l) := (X_l, Y_l) ∪ {(x_i, k_i)}, and update the parameters θ_d of the Classifier based on the updated (X_l, Y_l);
4) Calculate the second state quantity S' by the same definition as S, using the updated Classifier;
5) Calculate the reward function: r(S, a) = (1/n_s) Σ_{i=1}^{n_s} 1(k̂_i ≠ k_i);
6) Store the data <S, a, r, S'> in the buffer pool.
Using the data in the buffer pool, for each epoch do:
7) Update the evaluation network: solve min over θ_c of E[(y − Q(S, a; θ_c))²], with y = r + γQ'(S', π'(S'; θ_a'); θ_c');
8) Update the policy network: solve max over θ_a of E[Q(S, π(S; θ_a); θ_c)];
9) Update the target Actor network: θ_a' := λ_1·θ_a + (1 − λ_1)·θ_a';
10) Update the target Critic network: θ_c' := λ_2·θ_c + (1 − λ_2)·θ_c'.
Loop through 7)-10) above until the maximized return function and the minimized loss function are satisfied, then return to 1).
Execute steps 1)-10) in a loop until the preset condition is met.
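Steps 2) and 6) of the flow above can be sketched as follows (a minimal illustration; the names `select_targets`, `store_transition`, and the module-level `buffer_pool` list are assumptions for the sketch, not from the original disclosure):

```python
import numpy as np

def select_targets(action_vector, n_s):
    """Step 2): given the policy output a = pi(S; theta_a), return the
    indices of the n_s unlabeled samples with the largest selection
    probability, highest first."""
    a = np.asarray(action_vector, dtype=float)
    return np.argsort(-a)[:n_s]  # descending sort, keep the top n_s

buffer_pool = []

def store_transition(S, a, r, S_next):
    """Step 6): store one tuple <S, a, r, S'> in the buffer pool, from
    which the Actor and Critic later sample for alternate training."""
    buffer_pool.append((S, a, r, S_next))
```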
The effectiveness of the training method of the image classification network in this embodiment is demonstrated by the experiments below: for different types of medical image data, including 3D CT data and 2D fundus image data, various indexes are used to evaluate the performance of the image classification network.
In this experiment, the datasets employed were ChestCT and Kaggle. ChestCT is a 3D CT dataset for the detection of four lung diseases: lung nodules, pulmonary fibrous cords, arteriosclerosis, and lymph node calcification. 3500 samples were randomly extracted from the dataset as the training set, and another 3500 samples were used as the test set. Kaggle is a 2D fundus image dataset containing 35126 fundus images acquired from different devices, divided into 5 classes according to the degree of diabetic retinopathy (no retinopathy, mild, moderate, severe, and proliferative); 2230 images were randomly extracted as the training set and 2230 as the test set.
In this experiment, the policy network was an Actor network, and the evaluation network was a Critic network. The training method of the image classification network in this embodiment is compared with the following methods:
(1) Random: during training, samples to be labeled by the annotator are randomly selected from the candidate samples each time.
(2) LC (Least Confidence): assuming the classifier parameters are θ, for a candidate sample x_i, its corresponding lc_i is defined as the highest predicted probability among all possible classes: lc_i = max_j Pr(y_i = j | x_i; θ). If the lc_i corresponding to a sample is lower, the confidence of the classifier in its prediction for that sample is lower. The LC method sorts all candidate samples by lc_i in ascending order and labels the sample with the lowest lc_i value.
(3) MS (Margin Sampling): assuming the classifier parameters are θ, for a candidate sample x_i, its corresponding ms_i is defined as: ms_i = Pr(y_i = j_1 | x_i; θ) − Pr(y_i = j_2 | x_i; θ), where j_1 and j_2 represent the classes with the highest and second-highest probability, respectively. The lower ms_i is, the more uncertain the classifier's prediction for that sample. The MS method sorts all candidate samples by ms_i in ascending order and labels the sample with the smallest ms_i.
(4) EN (Entropy): assuming the classifier parameters are θ, for a candidate sample x_i, its corresponding en_i is defined as en_i = −Σ_j Pr(y_i = j | x_i; θ) log Pr(y_i = j | x_i; θ). en_i measures the degree of uncertainty of the prediction result: the higher en_i is, the more uncertain the classifier's prediction for that sample. The EN method sorts all candidate samples by en_i in descending order and labels the sample with the largest en_i.
(5) Fusion: this method fuses the above three methods (LC, MS and EN): first, K/2 samples are selected according to each of the three methods; then duplicate samples are removed from the resulting 3K/2 samples; finally, K samples are randomly selected from the remainder for labeling.
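The three uncertainty scores used by these baselines can be sketched as follows (a minimal NumPy illustration over a class-probability matrix; the function names and the `pick` helper are assumptions for the sketch):

```python
import numpy as np

def lc(probs):
    """Least Confidence: lc_i = max_j Pr(y_i = j | x_i); lowest lc_i is queried."""
    return probs.max(axis=1)

def ms(probs):
    """Margin Sampling: ms_i = highest minus second-highest class probability;
    smallest ms_i is queried."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def en(probs):
    """Entropy: en_i = -sum_j p_ij log p_ij; largest en_i is queried."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def pick(probs, strategy):
    """Index of the single most uncertain sample under the given strategy."""
    scores = strategy(probs)
    return int(np.argmax(scores)) if strategy is en else int(np.argmin(scores))
```

For a confident row like [0.9, 0.1] and an uncertain row like [0.5, 0.5], all three strategies select the uncertain one.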
Fig. 6 is a schematic diagram of the experimental results of the training method of the image classification network in the third embodiment of the present invention. Taking the results on the ChestCT dataset shown in fig. 6 as an example, "Ours" denotes the results obtained by the image classification network in this embodiment (the darkest and thickest curve in fig. 6, which keeps the highest ordinate from around abscissa 675 onward), and "All" denotes the results obtained after training with all the data. As can be seen from fig. 6, when the amount of data is small, the classification accuracy of every method is low; as the amount of data increases, the classification accuracy of the image classification network in this embodiment becomes higher, even exceeding the effect of training with all the data. The results on the Kaggle dataset likewise show that the image classification network in this embodiment achieves higher classification accuracy and better generalization.
According to the training method of the image classification network provided in the third embodiment of the present invention, optimization is performed on the basis of the above embodiments: a reinforcement learning mechanism is used to solve the maximization of the return function and the minimization of the loss function, thereby updating the network parameters of the policy network and the evaluation network as well as the selection action. This improves the reliability of selecting target data and provides reliable data and labels for training the image classification network, ensuring its classification accuracy; moreover, training of the image classification network can be achieved with a small amount of data, improving training efficiency.
Example IV
Fig. 7 is a schematic structural diagram of a training device for an image classification network according to a fourth embodiment of the present invention. As shown in fig. 7, the training device for an image classification network provided in this embodiment includes:
a network generation module 410 for generating an image classification network based on the labeled sample set;
the data selecting module 420 is configured to select target data from unlabeled data for labeling based on the policy network and the evaluation network;
a first updating module 430, configured to update the labeled sample set according to a labeling result of the target data, and update the image classification network according to the labeled sample set after updating;
A second updating module 440, configured to update the policy network and the evaluation network according to the labeling result;
and the loop execution module 450 is configured to loop the selecting operation and the updating operation until a preset condition is met, thereby obtaining the trained image classification network.
According to the training device for the image classification network, provided by the embodiment of the invention, the target data is selected through the reinforcement learning mechanism based on the strategy network and the evaluation network, the updating of the classifier is realized by utilizing the target data, and the strategy network and the evaluation network are updated according to the buffer data so as to ensure that the target data favorable for network training is selected, thereby realizing the training of the image classification network according to a small number of samples, improving the training efficiency and ensuring the classification precision.
Based on the above embodiment, the data selection module 420 is specifically configured to:
sorting the unlabeled data in descending order of selection probability values based on the action vector a of the policy network, and selecting a set number of data with the highest probability values as the target data for labeling; wherein the action vector a is determined according to the evaluation result of the evaluation network.
Based on the above embodiment, the second updating module 440 includes:
a buffer data generation unit configured to generate buffer data;
an updating unit for updating the policy network and the evaluation network according to the buffered data;
wherein the buffered data comprises:
the image classification network before updating is corresponding to a first state quantity S of the predicted value of the test data;
the updated image classification network corresponds to a second state quantity S' corresponding to the predicted value of the test data;
the strategy network selects an action vector a of target data from unlabeled data;
the reward function r of the action vector a of the policy network in the first state quantity S.
On the basis of the above embodiment, the buffer data generating unit is specifically configured to:
and calculating an error between a predicted value of the image classification network before updating for the target data and a labeling result of the target data based on the evaluation network to obtain a corresponding reward function r.
On the basis of the embodiment, the value of the reward function r is inversely related to the classification accuracy of the target data;
the value of the reward function r is positively correlated with the reliability of the action vector a of the policy network.
On the basis of the above embodiment, the updating unit is specifically configured to:
and evaluating the action vector a of the strategy network based on the evaluation network, and updating the action vector a according to an evaluation result.
On the basis of the above embodiment, the updating unit is specifically configured to:
calculating a return function corresponding to the strategy network and the evaluation network according to the buffer data, wherein the return function is a Q value function;
calculating a loss function corresponding to the strategy network and the evaluation network according to the buffer data;
respectively adjusting the network parameters θ_a of the policy network and the network parameters θ_c of the evaluation network, with maximization of the return function and minimization of the loss function as optimization targets.
Based on the above embodiments, adjusting the network parameters θ_a of the policy network and the network parameters θ_c of the evaluation network with maximization of the return function and minimization of the loss function as optimization targets specifically includes:
updating the network parameters θ_a' of the target policy network according to the network parameters θ_a of the policy network and the first update factor;
updating the network parameters θ_c' of the target evaluation network according to the network parameters θ_c of the evaluation network and the second update factor;
recalculating the return function and the loss function corresponding to the target policy network and the target evaluation network;
repeatedly executing the update operation and the calculation operation of the network parameters until the return function is maximized and the loss function is minimized, and taking the network parameters θ_a' of the target policy network and the network parameters θ_c' of the target evaluation network as the adjusted network parameters of the policy network and of the evaluation network, respectively.
On the basis of the above embodiment, the preset conditions include:
and the number of the samples in the marked sample set reaches a set threshold.
The training device of the image classification network provided by the fourth embodiment of the invention can be used for executing the training method of the image classification network provided by any embodiment, and has corresponding functions and beneficial effects.
Example five
Fig. 8 is a schematic diagram of the hardware structure of a device according to the fifth embodiment of the present invention. Devices include, but are not limited to, intelligent terminals such as desktop computers, notebook computers, smartphones and tablet computers. As shown in fig. 8, the device provided in this embodiment includes a processor 510 and a storage device 520. There may be one or more processors in the device; one processor 510 is taken as an example in fig. 8. The processor 510 and the storage device 520 may be connected by a bus or in another manner; connection by a bus is taken as an example in fig. 8.
The one or more programs are executed by the one or more processors 510 to cause the one or more processors to implement the training method of the image classification network as described in any of the above embodiments.
The storage device 520 in the apparatus, as a computer-readable storage medium, may be used to store one or more programs, such as software programs, computer-executable programs, and modules, e.g., the program instructions/modules corresponding to the training method of an image classification network in an embodiment of the present invention (for example, the network generation module 410, the data selection module 420, the first updating module 430, the second updating module 440 and the loop execution module 450 in the training apparatus of an image classification network shown in fig. 7). The processor 510 executes the various functional applications and data processing of the apparatus by running the software programs, instructions, and modules stored in the storage device 520, i.e., implements the training method of the image classification network in the above method embodiments.
The storage device 520 mainly includes a storage program area and a storage data area, wherein the storage program area can store an operating system, at least one application program required by functions; the storage data area may store data created from the use of the device, etc. (e.g., annotated sample sets, buffered data, etc. in the above embodiments). In addition, storage 520 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, storage 520 may further include memory located remotely from processor 510, which may be connected to the device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
And, when one or more programs included in the above-described apparatus are executed by the one or more processors 510, the following operations are performed: generating an image classification network based on the labeled sample set; selecting target data from unlabeled data based on a strategy network and an evaluation network for labeling; updating the marked sample set according to the marking result of the target data, and updating the image classification network according to the updated marked sample set; updating the strategy network and the evaluation network according to the labeling result; and circularly executing the selecting operation and the updating operation until the preset condition is met, so as to obtain the trained image classification network.
The apparatus according to the present embodiment belongs to the same inventive concept as the training method of the image classification network according to the above embodiment, and technical details not described in detail in the present embodiment can be seen in any of the above embodiments, and the present embodiment has the same advantages as those of executing the training method of the image classification network.
On the basis of the above-described embodiments, this embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a training apparatus of an image classification network, implements the training method of the image classification network in any of the above-described embodiments of the present invention, the method comprising: generating an image classification network based on the labeled sample set; selecting target data from unlabeled data based on a strategy network and an evaluation network for labeling; updating the marked sample set according to the marking result of the target data, and updating the image classification network according to the updated marked sample set; updating the strategy network and the evaluation network according to the labeling result; and circularly executing the selecting operation and the updating operation until the preset condition is met, so as to obtain the trained image classification network.
Of course, the storage medium containing the computer executable instructions provided by the embodiment of the invention is not limited to the operation of the training method of the image classification network, but can also execute the related operation in the training method of the image classification network provided by any embodiment of the invention, and has corresponding functions and beneficial effects.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, etc., and include several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the training method of the image classification network according to the embodiments of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (10)

1. A method of training an image classification network, comprising:
generating an image classification network based on the labeled sample set;
selecting target data from unlabeled data for labeling based on a policy network and an evaluation network;
updating the labeled sample set according to the labeling result of the target data, and updating the image classification network according to the updated labeled sample set;
updating the policy network and the evaluation network according to the labeling result;
cyclically performing the selecting operation and the updating operations until a preset condition is met, to obtain a trained image classification network;
the selecting target data from unlabeled data for labeling based on the policy network and the evaluation network comprises:
sorting the unlabeled data in descending order of selection probability value based on an action vector a of the policy network, and selecting a set number of data with the highest probability values as the target data for labeling; wherein the action vector a is determined according to an evaluation result of the evaluation network;
the updating the policy network and the evaluation network according to the labeling result comprises the following steps:
generating buffer data according to the labeling result, and updating the policy network and the evaluation network according to the buffer data;
wherein the buffered data comprises:
a first state quantity S corresponding to predicted values of the image classification network before updating on test data;
a second state quantity S' corresponding to predicted values of the updated image classification network on the test data;
an action vector a with which the policy network selects the target data from the unlabeled data;
a reward function r of the action vector a of the policy network in the first state quantity S.
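As an illustration of the selection step in claim 1, treating the action vector a as per-sample selection probabilities, the "set quantity" (top-k) pick can be sketched as follows. The transition tuple mirrors the buffer data (S, a, r, S'); names and the NumPy-based implementation are assumptions, not from the patent.

```python
import numpy as np
from collections import namedtuple

# Hypothetical buffer record for the (S, a, r, S') data listed in claim 1.
Transition = namedtuple("Transition", ["S", "a", "r", "S_next"])

def select_targets(action_vector, unlabeled_ids, k):
    """Sort unlabeled data in descending order of selection probability
    (given by the policy network's action vector) and return the k
    highest-probability items as labeling targets."""
    order = np.argsort(action_vector)[::-1]      # indices, descending by probability
    return [unlabeled_ids[i] for i in order[:k]]

probs = np.array([0.1, 0.7, 0.3, 0.9])           # action vector a (assumed form)
ids = ["img0", "img1", "img2", "img3"]
print(select_targets(probs, ids, 2))             # highest-probability items first
```

Here `k` plays the role of the claimed "set quantity" of target data.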
2. The method of claim 1, wherein generating the buffered data from the labeling results comprises:
calculating, based on the evaluation network, an error between the predicted value of the image classification network before updating for the target data and the labeling result of the target data, to obtain the corresponding reward function r.
3. The method of claim 1, wherein the value of the reward function r is inversely related to the classification accuracy of the target data;
the value of the reward function r is positively correlated with the reliability of the action vector a of the policy network.
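A minimal sketch of a reward with the properties of claims 2 and 3, using cross-entropy as an assumed error measure (the patent does not fix a particular one): the reward is large when the pre-update classifier was confidently wrong about the target data, i.e. inversely related to its classification accuracy on that data.

```python
import numpy as np

def reward(pred_probs, true_label):
    """Hypothetical reward r: error between the pre-update classifier's
    predicted distribution and the labeling result (cross-entropy here)."""
    eps = 1e-12                                   # numerical floor for log
    return -np.log(pred_probs[true_label] + eps)  # larger when the prediction was wrong

confident_right = np.array([0.9, 0.05, 0.05])     # classifier already handles this sample
confident_wrong = np.array([0.05, 0.9, 0.05])     # classifier misclassifies this sample
assert reward(confident_wrong, 0) > reward(confident_right, 0)
```

Under this reading, samples the current classifier gets wrong yield high reward, steering the policy network toward the most informative data to label.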
4. The method of claim 1, wherein the updating the policy network and the evaluation network based on the buffered data comprises:
evaluating the action vector a of the policy network based on the evaluation network, and updating the action vector a according to an evaluation result.
5. The method of claim 1, wherein the updating the policy network and the evaluation network based on the buffered data comprises:
calculating a return function corresponding to the policy network and the evaluation network according to the buffer data, wherein the return function is a Q-value function;
calculating a loss function corresponding to the policy network and the evaluation network according to the buffer data;
adjusting, with maximization of the return function and minimization of the loss function as optimization targets, the network parameters θ_a of the policy network and the network parameters θ_c of the evaluation network, respectively.
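Claim 5's objectives resemble a standard actor-critic setup; under that assumed (DDPG-style) reading, the Q-value return and the critic's loss could be sketched as below. The bootstrapped Bellman target and squared TD error are conventional choices, not quoted from the patent.

```python
def q_target(r, q_next, gamma=0.99):
    """Bellman target for Q(S, a): reward plus discounted next-state value."""
    return r + gamma * q_next

def critic_loss(q_pred, r, q_next, gamma=0.99):
    """Squared TD error, minimized with respect to the evaluation
    (critic) network parameters θ_c."""
    return (q_pred - q_target(r, q_next, gamma)) ** 2

# The policy (actor) parameters θ_a are in turn adjusted to maximize Q(S, a).
loss = critic_loss(q_pred=1.0, r=0.5, q_next=1.0, gamma=0.99)
print(round(loss, 4))  # (1.0 - 1.49)^2 = 0.2401
```

Maximizing the return while minimizing this loss corresponds to the two optimization targets named in the claim.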
6. The method according to claim 5, wherein adjusting the network parameters θ_a of the policy network and the network parameters θ_c of the evaluation network with maximization of the return function and minimization of the loss function as optimization targets comprises:
updating the network parameters θ_a′ of a target policy network according to the network parameters θ_a of the policy network and a first update factor;
updating the network parameters θ_c′ of a target evaluation network according to the network parameters θ_c of the evaluation network and a second update factor;
recalculating the return function and the loss function corresponding to the target policy network and the target evaluation network;
repeatedly performing the updating operation and the calculating operation on the network parameters until the return function is maximized and the loss function is minimized, and taking the network parameters θ_a′ of the target policy network and the network parameters θ_c′ of the target evaluation network at that point as the adjusted network parameters of the policy network and of the evaluation network, respectively.
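The target-network update of claim 6 reads like a soft (Polyak) update, with the first and second update factors as interpolation coefficients. A sketch under that assumption, where `tau` stands in for the claimed update factor:

```python
import numpy as np

def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau toward its online
    counterpart: θ' ← τ·θ + (1 − τ)·θ' (assumed DDPG-style form)."""
    return {k: tau * online_params[k] + (1.0 - tau) * target_params[k]
            for k in target_params}

theta_a = {"w": np.array([1.0, 2.0])}     # policy network parameters θ_a
theta_a_t = {"w": np.array([0.0, 0.0])}   # target policy network parameters θ_a′
theta_a_t = soft_update(theta_a_t, theta_a, tau=0.1)   # first update factor
print(theta_a_t["w"])  # moved a fraction tau toward the online parameters
```

The evaluation network's θ_c and target θ_c′ would be updated the same way with the second update factor.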
7. The method according to any one of claims 1-6, wherein the preset conditions comprise:
and the number of the samples in the marked sample set reaches a set threshold.
8. A training device for an image classification network, comprising:
the network generation module is used for generating an image classification network based on the labeled sample set;
the data selection module is used for selecting target data from unlabeled data for labeling based on the policy network and the evaluation network;
the first updating module is used for updating the labeled sample set according to the labeling result of the target data, and updating the image classification network according to the updated labeled sample set;
the second updating module is used for updating the policy network and the evaluation network according to the labeling result;
the loop execution module is used for cyclically performing the selecting operation and the updating operations until a preset condition is met, to obtain a trained image classification network;
the data selection module is specifically configured to:
sort the unlabeled data in descending order of selection probability value based on an action vector a of the policy network, and select a set number of data with the highest probability values as the target data for labeling; wherein the action vector a is determined according to an evaluation result of the evaluation network;
The second updating module includes:
a buffer data generation unit configured to generate buffer data;
an updating unit for updating the policy network and the evaluation network according to the buffered data;
wherein the buffered data comprises:
a first state quantity S corresponding to predicted values of the image classification network before updating on test data;
a second state quantity S' corresponding to predicted values of the updated image classification network on the test data;
an action vector a with which the policy network selects the target data from the unlabeled data;
a reward function r of the action vector a of the policy network in the first state quantity S.
9. An apparatus, comprising:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the training method of the image classification network of any of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements a training method of an image classification network according to any of claims 1-7.
CN202010384467.6A 2020-05-07 2020-05-07 Training method, device, equipment and storage medium of image classification network Active CN111582371B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010384467.6A CN111582371B (en) 2020-05-07 2020-05-07 Training method, device, equipment and storage medium of image classification network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010384467.6A CN111582371B (en) 2020-05-07 2020-05-07 Training method, device, equipment and storage medium of image classification network

Publications (2)

Publication Number Publication Date
CN111582371A CN111582371A (en) 2020-08-25
CN111582371B true CN111582371B (en) 2024-02-02

Family

ID=72125738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010384467.6A Active CN111582371B (en) 2020-05-07 2020-05-07 Training method, device, equipment and storage medium of image classification network

Country Status (1)

Country Link
CN (1) CN111582371B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132815B (en) * 2020-09-25 2023-11-17 广州视源电子科技股份有限公司 Pulmonary nodule detection model training method, detection method and device
CN112529857B (en) * 2020-12-03 2022-08-23 重庆邮电大学 Ultrasonic image diagnosis report generation method based on target detection and strategy gradient
CN112396605B (en) * 2021-01-21 2021-04-23 北京安德医智科技有限公司 Network training method and device, image recognition method and electronic equipment
CN112950580A (en) * 2021-02-25 2021-06-11 北京金山云网络技术有限公司 Quality evaluation method, and quality evaluation model training method and device
CN113947022B (en) * 2021-10-20 2022-07-12 哈尔滨工业大学(深圳) Near-end strategy optimization method based on model

Citations (4)

Publication number Priority date Publication date Assignee Title
CN104992184A (en) * 2015-07-02 2015-10-21 东南大学 Multiclass image classification method based on semi-supervised extreme learning machine
CN110008332A (en) * 2019-02-13 2019-07-12 阿里巴巴集团控股有限公司 The method and device of trunk word is extracted by intensified learning
CN110245721A (en) * 2019-06-25 2019-09-17 深圳市腾讯计算机系统有限公司 Training method, device and the electronic equipment of neural network model
CN110532377A (en) * 2019-05-13 2019-12-03 南京大学 A kind of semi-supervised file classification method based on dual training and confrontation learning network

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
EP3732628A1 (en) * 2018-05-18 2020-11-04 Google LLC Learning data augmentation policies

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN104992184A (en) * 2015-07-02 2015-10-21 东南大学 Multiclass image classification method based on semi-supervised extreme learning machine
CN110008332A (en) * 2019-02-13 2019-07-12 阿里巴巴集团控股有限公司 The method and device of trunk word is extracted by intensified learning
CN110532377A (en) * 2019-05-13 2019-12-03 南京大学 A kind of semi-supervised file classification method based on dual training and confrontation learning network
CN110245721A (en) * 2019-06-25 2019-09-17 深圳市腾讯计算机系统有限公司 Training method, device and the electronic equipment of neural network model

Also Published As

Publication number Publication date
CN111582371A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
CN111582371B (en) Training method, device, equipment and storage medium of image classification network
Hang et al. Local and global structure-aware entropy regularized mean teacher model for 3d left atrium segmentation
CN110852447B (en) Meta learning method and apparatus, initializing method, computing device, and storage medium
Chen et al. Source-free domain adaptive fundus image segmentation with denoised pseudo-labeling
US11200483B2 (en) Machine learning method and apparatus based on weakly supervised learning
CN110245721A (en) Training method, device and the electronic equipment of neural network model
EP3767536A1 (en) Latent code for unsupervised domain adaptation
US11568201B2 (en) Predicting neuron types based on synaptic connectivity graphs
Liu et al. Self-supervised mean teacher for semi-supervised chest x-ray classification
WO2021138083A1 (en) Neural architecture search based on synaptic connectivity graphs
CN112116090B (en) Neural network structure searching method and device, computer equipment and storage medium
WO2021138092A1 (en) Artificial neural network architectures based on synaptic connectivity graphs
WO2021138082A1 (en) Training artificial neural networks based on synaptic connectivity graphs
WO2021138091A1 (en) Reservoir computing neural networks based on synaptic connectivity graphs
Yang et al. Active matting
EP4287144A1 (en) Video behavior recognition method and apparatus, and computer device and storage medium
CN109447096B (en) Glance path prediction method and device based on machine learning
CN116309571B (en) Three-dimensional cerebrovascular segmentation method and device based on semi-supervised learning
Wang et al. An uncertainty-aware transformer for MRI cardiac semantic segmentation via mean teachers
CN114144770A (en) System and method for generating data sets for model retraining
Wu et al. Uncertainty-aware label rectification for domain adaptive mitochondria segmentation
CN110826581A (en) Animal number identification method, device, medium and electronic equipment
US20230394320A1 (en) Federated learning
Judge et al. Crisp-reliable uncertainty estimation for medical image segmentation
CN109558898B (en) Multi-choice learning method with high confidence based on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant