CN116051935A - Image detection method, training method and device of deep learning model

Image detection method, training method and device of deep learning model

Info

Publication number
CN116051935A
Authority
CN
China
Prior art keywords
network layer
feedback value
model
target
network
Prior art date
Legal status
Granted
Application number
CN202310200213.8A
Other languages
Chinese (zh)
Other versions
CN116051935B (en)
Inventor
周文硕
杨大陆
杨叶辉
代小亚
王磊
黄海峰
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310200213.8A
Publication of CN116051935A
Application granted
Publication of CN116051935B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The present disclosure provides an image detection method, a training method and apparatus for a deep learning model, a device, a storage medium, and a program product, relating to the technical field of data processing, and in particular to the technical fields of artificial intelligence, deep learning, image processing, and AI medical treatment. The specific implementation scheme is as follows: an image to be processed is acquired; and the image to be processed is input into a target deep learning model to obtain a detection result of the image to be processed.

Description

Image detection method, training method and device of deep learning model
Technical Field
The present disclosure relates to the field of data processing technologies, in particular to the technical fields of artificial intelligence, deep learning, image processing, and AI medical treatment, and more particularly to an image detection method, a training method of a deep learning model, an apparatus, a device, a storage medium, and a program product.
Background
Deep learning is a machine learning method in the technical field of artificial intelligence. A deep learning model obtained through deep learning can be applied to various scenarios such as image processing, and how to improve the performance of such a model has become a technical problem to be solved.
Disclosure of Invention
The present disclosure provides an image detection method, a training method of a deep learning model, an apparatus, a device, a storage medium, and a program product.
According to an aspect of the present disclosure, there is provided an image detection method including: acquiring an image to be processed; inputting the image to be processed into a target deep learning model to obtain a detection result of the image to be processed, wherein the target deep learning model is obtained by training through the following operations: processing the target sample according to each target model branch of the initial deep learning model to obtain an output result; determining a first feedback value for each first network layer according to the output result and the target loss function; determining a second feedback value of a second network layer corresponding to the first network layer position according to the network parameter of each first network layer, wherein the target model branch comprises at least one first network layer and at least one second network layer; and respectively updating the network parameters of the first network layer and the second network layer according to the first feedback value and the second feedback value to obtain a target deep learning model.
According to another aspect of the present disclosure, there is provided a training method of a deep learning model, including: processing the target sample according to each target model branch of the initial deep learning model to obtain an output result; determining a first feedback value for each first network layer according to the output result and the target loss function; determining a second feedback value of a second network layer corresponding to the first network layer position according to the network parameter of each first network layer, wherein the target model branch comprises at least one first network layer and at least one second network layer; and respectively updating the network parameters of the first network layer and the second network layer according to the first feedback value and the second feedback value to obtain a target deep learning model.
According to another aspect of the present disclosure, there is provided an image detection apparatus including: the image acquisition module to be processed is used for acquiring the image to be processed; the detection result determining module is used for inputting the image to be processed into the target deep learning model to obtain the detection result of the image to be processed, wherein the target deep learning model is trained by the following modules: the output result determining module is used for respectively processing the target samples according to each target model branch of the initial deep learning model to obtain an output result; the first feedback value determining module is used for determining a first feedback value for each first network layer according to the output result and the target loss function; a second feedback value determining module, configured to determine a second feedback value of a second network layer corresponding to the first network layer location according to the network parameter of each first network layer, where the target model branch includes at least one first network layer and at least one second network layer; and the target deep learning model determining module is used for respectively updating the network parameters of the first network layer and the second network layer according to the first feedback value and the second feedback value to obtain a target deep learning model.
According to another aspect of the present disclosure, there is provided a training apparatus of a deep learning model, including: the output result determining module is used for respectively processing the target samples according to each target model branch in a plurality of model branches of the initial deep learning model to obtain an output result; the first feedback value determining module is used for determining a first feedback value for each first network layer according to the output result and the target loss function; a second feedback value determining module, configured to determine a second feedback value of a second network layer corresponding to the first network layer location according to the network parameter of each first network layer, where the target model branch includes at least one first network layer and at least one second network layer; and the target deep learning model determining module is used for respectively updating the network parameters of the first network layer and the second network layer according to the first feedback value and the second feedback value to obtain a target deep learning model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, the computer program when executed by a processor implementing a method of an embodiment of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates a system architecture diagram of an image detection method, a training method of a deep learning model, and an apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a training method of a deep learning model according to an embodiment of the present disclosure;
FIG. 3A schematically illustrates a model structure schematic of an initial deep learning model according to another embodiment of the present disclosure;
FIG. 3B schematically illustrates a schematic diagram of a backbone network of the initial deep learning model shown in FIG. 3A;
fig. 4 schematically illustrates a schematic diagram of an image processing method according to an embodiment of the present disclosure;
fig. 5 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure; and
fig. 7 schematically illustrates a block diagram of an electronic device in which the image detection method, the training method of the deep learning model of the embodiments of the present disclosure may be implemented.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression like "at least one of A, B, and C" is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B, and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
Deep learning is a machine learning method in the technical field of artificial intelligence. A deep learning model obtained through deep learning can be applied to various scenarios such as image processing, and how to improve the performance of such a model has become a technical problem to be solved.
For deep learning, a large number of high-quality labeled samples are usually required to train a reasonably robust deep learning model. For natural data sets, data acquisition is easy and labeling costs are relatively low. For data such as medical images, however, the labeling process is complex and often requires several domain experts to label jointly. In fields such as medical imaging, high-quality labeled samples are therefore expensive, and existing data sets often contain too few of them, so medical image detection models based on deep learning generalize poorly; the generalization of such models across different devices is not guaranteed, and there is a risk that performance indicators drop suddenly. At the same time, a large number of unlabeled samples exist in fields such as medical imaging, and semi-supervised learning techniques that exploit unlabeled samples to improve model performance have become a hot topic in complex data processing scenarios such as medical image detection and analysis.
Deep learning techniques can be divided, according to the availability of labeled samples, into supervised learning, in which all samples are labeled; unsupervised learning, in which no samples are labeled; and semi-supervised learning, in which only part of the data is labeled. Here, a labeled sample may be understood as sample data with a label, and an unlabeled sample as sample data without a label.
The pi model is a deep learning model for semi-supervised learning. It constructs two noisy data augmentations of the same sample, feeds each into the same backbone network, computes a cross-entropy loss and a consistency loss over the output distributions of that sample, and learns from a weighted sum of the two losses. The cross-entropy loss provides the classification capability by measuring the difference between the output distribution of a labeled sample and its label. The consistency loss generally measures, through the backbone network, the difference between the output distributions of different augmentations of an unlabeled sample, and may also measure the difference between the output distributions of labeled data under different augmentation strategies; learning on unlabeled samples is achieved through this auxiliary task of reducing output differences.
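The following is a minimal sketch of the pi-model objective described above, assuming a PyTorch-style API; the backbone, the augmentation callable, and the consistency weight are illustrative placeholders rather than the implementation of this patent.

```python
import torch
import torch.nn.functional as F

def pi_model_loss(backbone, x_labeled, y_labeled, x_unlabeled, augment, w_consistency=1.0):
    # Two noisy augmentations of the same batch pass through the same backbone.
    logits_l1 = backbone(augment(x_labeled))
    logits_l2 = backbone(augment(x_labeled))
    logits_u1 = backbone(augment(x_unlabeled))
    logits_u2 = backbone(augment(x_unlabeled))

    # Cross-entropy on labeled samples provides the classification capability.
    ce = F.cross_entropy(logits_l1, y_labeled)

    # Consistency loss: output distributions of different augmentations should agree,
    # for unlabeled samples and optionally for labeled samples as well.
    cons = F.mse_loss(F.softmax(logits_u1, dim=1), F.softmax(logits_u2, dim=1)) \
         + F.mse_loss(F.softmax(logits_l1, dim=1), F.softmax(logits_l2, dim=1))

    return ce + w_consistency * cons
```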
Temporal Ensembling as another deep learning model of semi-supervised learning, a time sequence integration strategy is adopted, marked samples are still learned by calculating cross entropy loss, consistency loss between historical output distribution and output distribution of the current training steps is calculated by non-marked samples, and the consistency loss is weighted with the cross entropy loss as final loss. Compared with pi model, temporal Ensembling is a strategy of space time exchange, and additional space is required to be opened up in the training process to store the output distribution of the history samples, so that the training time is shortened. Features of unlabeled samples are learned by consistent comparison as with the pi model.
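As a rough illustration of this space-for-time idea, the sketch below keeps an exponentially accumulated buffer of historical predictions per sample and penalizes deviation from it; the class name, momentum value, and loss weighting are assumptions for illustration, not details taken from the patent.

```python
import torch
import torch.nn.functional as F

class TemporalEnsembleTargets:
    """Exponentially weighted average of historical model outputs per training sample."""
    def __init__(self, num_samples, num_classes, momentum=0.6):
        self.ensemble = torch.zeros(num_samples, num_classes)  # accumulated historical outputs
        self.momentum = momentum
        self.epoch = 0

    def targets(self, indices):
        # Bias-corrected historical prediction for the given sample indices.
        correction = 1.0 - self.momentum ** max(self.epoch, 1)
        return self.ensemble[indices] / correction

    def update(self, indices, probs):
        self.ensemble[indices] = self.momentum * self.ensemble[indices] \
                                 + (1 - self.momentum) * probs.detach().cpu()

def temporal_ensembling_loss(logits, y_labeled, labeled_mask, ensemble_targets, w_consistency):
    probs = F.softmax(logits, dim=1)
    ce = F.cross_entropy(logits[labeled_mask], y_labeled)               # labeled samples only
    cons = F.mse_loss(probs, ensemble_targets.to(probs.device))          # all samples vs. history
    return ce + w_consistency * cons
```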
Fig. 1 schematically illustrates a system architecture of an image processing method, a training method of a deep learning model, and an apparatus according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 in an embodiment of the present disclosure may include: a terminal 101 for acquiring a target sample, a terminal 102 for training a deep learning model, and a terminal 103 for image processing.
In embodiments of the present disclosure, the terminal 101 may be configured to obtain target samples for training an initial deep learning model. The terminal 102 may execute a training method of a corresponding deep learning model according to the target sample obtained by the terminal 101 to implement model training of the initial deep learning model, so as to obtain a target deep learning model. The terminal 103 may process the image to be processed based on the target deep learning model obtained by the terminal 102, to obtain a detection result of the image to be processed.
It should be noted that, the training of the image processing and the deep learning model may be implemented on the same terminal, or may be implemented on different terminals.
Terminals 101, 102 and 103 may be servers or a server cluster.
It should be understood that the number of terminals 101, 102, and 103 in fig. 1 is merely illustrative. There may be any number of terminals 101, 102, and 103, as desired for implementation.
It should be noted that, in the technical solution of the present disclosure, the processes of collecting, storing, using, processing, transmitting, providing, disclosing, etc. related personal information of the user all conform to the rules of the related laws and regulations, and do not violate the public welfare.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
The embodiment of the present disclosure provides a training method of a deep learning model, and the training method of the deep learning model according to an exemplary embodiment of the present disclosure is described below with reference to fig. 2 to 3B in conjunction with the system architecture of fig. 1. The training method of the deep learning model of the embodiment of the present disclosure may be performed by the terminal 102 shown in fig. 1, for example.
Fig. 2 schematically illustrates a flowchart of a training method of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 2, the training method 200 of the deep learning model of the embodiment of the present disclosure may include, for example, operations S210 to S240.
In operation S210, the target samples are respectively processed according to each of a plurality of model branches of the initial deep learning model, so as to obtain an output result.
A model branch may be understood as one branch in the branching structure of the initial deep learning model; this structure means the initial deep learning model has at least two parallel branches.
Illustratively, the target sample may be an image sample, a text sample, or the like.
In operation S220, a first feedback value for each first network layer is determined according to the output result and the objective loss function.
The target model branch includes at least one first network layer. The target model branch is any one of the plurality of model branches of the initial deep learning model, so each model branch includes at least one first network layer. A network layer may be understood as a neural network layer, and each network layer may include a plurality of neurons. A neural network layer may include, for example, a convolution layer.
In operation S230, a second feedback value of the second network layer corresponding to the first network layer location is determined according to the network parameter of each first network layer.
The target model branch includes at least one second network layer; that is, any one of the model branches also includes at least one second network layer.
The structure of each model branch may be the same. For example, the initial deep learning model includes two model branches, branch1 and branch2, each of which includes X network layers. With branch1 as the target model branch, its X network layers include X1 first network layers and X2 second network layers. For any first network layer ranked R1 among the X network layers of the target model branch, "the second network layer corresponding to the first network layer position" may be understood as the second network layer ranked R1 among the X network layers of branch2.
In operation S240, network parameters of the first network layer and the second network layer are updated according to the first feedback value and the second feedback value, respectively, to obtain a target deep learning model.
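A condensed sketch of one training step covering operations S210 to S240 is given below, under the assumption of a PyTorch-style training loop; the branch objects, loss callable, and update helper are placeholders that are elaborated later in this disclosure (see the EMA sketch further below).

```python
def training_step(teacher, student, batch, target_loss_fn, optimizer, update_second_layers):
    # S210: process the target sample with each target model branch.
    out_teacher = teacher(batch["sample"])
    out_student = student(batch["sample"])

    # S220: first feedback value for the first ("learnable") network layers,
    # obtained from the output results and the target loss function.
    loss = target_loss_fn(out_teacher, out_student, batch.get("label"))
    optimizer.zero_grad()
    loss.backward()          # back-propagation updates only the learnable first network layers
    optimizer.step()

    # S230/S240: the second feedback value for each second network layer is derived
    # from the network parameters of the first network layer at the same position,
    # e.g. via an exponential moving average.
    update_second_layers(teacher, student)
```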
According to the training method of the deep learning model of the embodiment of the present disclosure, the network parameters of the first network layer are updated according to the first feedback value, and the first feedback value is determined from the target loss function and the output result obtained by inputting the target sample into each target model branch. Processing the target sample with each target model branch can be understood as forward propagation, and updating the network parameters of the first network layer with the first feedback value determined from the forward propagation result (output result) and the target loss function can be understood as backward propagation. The first network layer is therefore a "learnable" network layer and is suited to training the model with labeled samples. The network parameters of the second network layer corresponding to the first network layer position are updated according to the second feedback value, which is determined from the network parameters of the corresponding first network layer; this is suited to training the model with unlabeled samples. The training method of the deep learning model of the embodiment of the present disclosure can therefore train a model in a semi-supervised manner, and is particularly suitable for scenarios in which sample labeling is complex and expensive, such as medical image processing.
Furthermore, since each target model branch of the initial deep learning model includes at least one first network layer with "learnable" network parameters and at least one second network layer that depends on the first network layer at the corresponding position, part of the network parameters of each target model branch are obtained through back-propagation learning, while the other part is derived from the model parameters of the network layer at the corresponding position. The network parameters of a second network layer determined in this way from a first network layer are more robust, and "cross-mixing" these more robust second network layers with the back-propagation-learnable first network layers further benefits the training of the first network layers as the "learning layers". The target deep learning model obtained by training therefore generalizes better and performs better, which makes it particularly suitable for scenarios such as medical image processing.
Fig. 3A schematically illustrates a model structure diagram of an initial deep learning model according to another embodiment of the present disclosure. Fig. 3B schematically shows a schematic diagram of a backbone network of the initial deep learning model as shown in fig. 3A.
As shown in fig. 3A, according to a training method of a deep learning model according to another embodiment of the present disclosure, a target model branch may include a teacher model branch T and a student model branch S. In the example of fig. 3A, a teacher model branch T is characterized by a backbone network backbone T of a teacher model branch, and a student model branch S is characterized by a backbone network backbone S of a student model branch.
In the example of fig. 3B, any two adjacent network layers of the teacher model branch backbone T are a first network layer and a second network layer, so the first and second network layers of the teacher model branch are arranged in a crossed (alternating) manner. The first and second network layers of the student model branch backbone S are crossed in a manner that matches the teacher model branch.
Illustratively, any two adjacent network layers of the student model branch backbone S may instead be a first network layer and a second network layer, so that the first and second network layers of the student model branch are arranged in a crossed manner, with the first and second network layers of the teacher model branch backbone T crossed to match the student model branch.
For example, as shown in fig. 3B, the teacher model branch backbone T includes a plurality of network layers connected in series, such as network layer L1T and network layer L2T; network layer L1T is ranked 1 and network layer L2T is ranked 2 among the network layers of the teacher model branch. Network layer L1T and network layer L3T are both first network layers, and network layer L2T and network layer FC T are both second network layers. Network layer FC T may be a fully connected layer. Similarly, network layer L2S and network layer FC S of the student model branch backbone S are both first network layers, and network layer L1S and network layer L3S are both second network layers.
According to the training method of the deep learning model of the embodiment of the present disclosure, any two adjacent network layers of the teacher model branch are a first network layer and a second network layer, so the first and second network layers of the teacher model branch are arranged in a crossed manner, and correspondingly the first and second network layers of the student model branch are crossed to match the teacher model branch. For either the teacher model branch or the student model branch, this crossed arrangement allows the first network layers, acting as "learnable" network layers, to be more thoroughly "cross-fused" with the second network layers within the branch, which facilitates training the first network layers, so that the target deep learning model obtained by training generalizes better and performs better.
Illustratively, the structure of the teacher model branch may also be configured such that v second network layers are set every u first network layers, u and v being integers of 1 or more. Accordingly, the structure of the student model branch may be set such that v first network layers are set every u second network layers. Each target model branch thereby includes at least one first network layer and at least one second network layer.
Illustratively, the structure of the teacher model branch may also be configured such that u first network layers are set at arbitrary positions and v second network layers are set at the remaining positions. Each target model branch thereby also includes at least one first network layer and at least one second network layer.
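One way to realize the crossed arrangement is sketched below: layers at even positions in the teacher branch are learnable while the student layer at the same position is frozen, and the roles alternate at odd positions. This is an illustrative construction consistent with fig. 3B, not the only possible one; the factory callable and function name are assumptions.

```python
import copy
import torch.nn as nn

def build_crossed_branches(layer_factory, num_layers):
    """layer_factory(i) returns the i-th network layer (e.g. a conv + BN + activation unit)."""
    teacher, student = nn.ModuleList(), nn.ModuleList()
    for i in range(num_layers):
        layer = layer_factory(i)
        mirror = copy.deepcopy(layer)
        teacher.append(layer)
        student.append(mirror)
        teacher_layer_is_first = (i % 2 == 0)   # alternate first ("learnable") and second layers
        # The second network layer at each position receives no gradients; its parameters are
        # later derived from the first network layer at the same position (e.g. via an EMA).
        frozen = mirror if teacher_layer_is_first else layer
        for p in frozen.parameters():
            p.requires_grad = False
    return teacher, student
```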
Illustratively, according to the training method of the deep learning model of a further embodiment of the present disclosure, the target sample includes a labeled sample. The output results include a labeled-sample first output result from the teacher model branch and a labeled-sample second output result from the student model branch, and the target loss function includes a first loss function and a second loss function. Determining the first feedback value for each first network layer from the output result and the target loss function may be implemented, for example, as follows: the teacher model branch feedback value is determined from the labeled-sample first output result and the first loss function; the student model branch feedback value is determined from the labeled-sample second output result and the second loss function; and the first feedback value is determined from the teacher model branch feedback value and the student model branch feedback value.
According to the training method of the deep learning model of the embodiment of the present disclosure, because a labeled sample carries a label, the labeled-sample first output result represents the current teacher model branch's prediction for the labeled sample, and the first loss function can estimate the degree of difference between that prediction and the label (the ground truth), thereby evaluating how well the current teacher model branch processes labeled samples. Similarly, the labeled-sample second output result represents the current student model branch's prediction for the labeled sample, and the second loss function can estimate the degree of difference between that prediction and the label, thereby evaluating how well the current student model branch processes labeled samples. The first feedback value determined from the teacher model branch feedback value and the student model branch feedback value can therefore integrate the teacher and student model branches, and is determined from the initial deep learning model's overall prediction on labeled samples as a basis for feeding back and adjusting the network parameters of the first network layer.
For example, the first loss function and the second loss function may be the same loss function.
Illustratively, as shown in fig. 3A, the first and second loss functions may each be, for example, a cross-entropy loss function.
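A minimal sketch of the two supervised feedback values, assuming both the first and second loss functions are cross-entropy as in fig. 3A; the function and variable names are illustrative.

```python
import torch.nn.functional as F

def supervised_feedback(teacher_logits_labeled, student_logits_labeled, labels):
    teacher_branch_loss = F.cross_entropy(teacher_logits_labeled, labels)  # first loss function
    student_branch_loss = F.cross_entropy(student_logits_labeled, labels)  # second loss function
    return teacher_branch_loss, student_branch_loss
```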
Illustratively, according to the training method of the deep learning model of a further embodiment of the present disclosure, the target sample further includes an unlabeled sample, and the output results further include an unlabeled-sample first output result from the teacher model branch and an unlabeled-sample second output result from the student model branch; the target loss function further includes a third loss function. Determining the first feedback value from the teacher model branch feedback value and the student model branch feedback value may be implemented, for example, as follows: an unlabeled-sample feedback value is determined from the unlabeled-sample first output result, the unlabeled-sample second output result, and the third loss function; and the first feedback value is determined from the teacher model branch feedback value, the student model branch feedback value, and the unlabeled-sample feedback value.
According to the training method of the deep learning model of the embodiment of the present disclosure, because an unlabeled sample has no label, the unlabeled-sample first output result represents the current teacher model branch's prediction for the unlabeled sample, and likewise the unlabeled-sample second output result represents the current student model branch's prediction for the same sample. The third loss function can estimate the degree of difference between these two predictions and thereby evaluate how the initial deep learning model processes unlabeled samples, so the unlabeled-sample feedback value determined from the unlabeled-sample first output result, the unlabeled-sample second output result, and the third loss function serves as a basis for adjusting the network parameters of the second network layer. The first feedback value determined from the teacher model branch feedback value, the student model branch feedback value, and the unlabeled-sample feedback value can integrate the teacher and student model branches, and is determined from the initial deep learning model's overall prediction on all types of samples as a basis for feeding back and adjusting the network parameters of the first network layer.
The third loss function may, for example, comprise a consistency loss function.
Illustratively, as shown in fig. 3A, the third loss function may be, for example, the mean square error (MSE) loss.
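A sketch of the unlabeled-sample feedback value, under the assumption of an MSE consistency loss between the two branches' output distributions as in fig. 3A; names are illustrative.

```python
import torch.nn.functional as F

def unlabeled_feedback(teacher_logits_unlabeled, student_logits_unlabeled):
    # Consistency between the teacher and student output distributions for the
    # same unlabeled sample (third loss function, e.g. mean square error).
    p_teacher = F.softmax(teacher_logits_unlabeled, dim=1)
    p_student = F.softmax(student_logits_unlabeled, dim=1)
    return F.mse_loss(p_student, p_teacher)
```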
In summary, the first loss function may be represented by the following formula (1), the second loss function by formula (2), and the third loss function by formula (3). The target loss function of the training method of the deep learning model of the embodiment of the present disclosure may be represented, for example, by formula (4); it is determined from the first, second, and third loss functions, for example as a weighted sum of them.
[Formula (1): the supervised loss of the teacher model branch over the labeled data, given as an image in the original document.]
[Formula (2): the supervised loss of the student model branch over the labeled data, given as an image in the original document.]
[Formula (3): the consistency loss between the two branches over the unlabeled data, given as an image in the original document.]
L(D_L, D_U, H_s, H_t) = α L_sup(D_L, H_s) + β L_sup(D_L, H_t) + (1 - α - β) L_con(D_U, H_s, H_t)   (4)
Ω denotes the set of image grid positions (the set of image pixel positions), l_ce(·) denotes the cross-entropy loss, L_con(·) denotes the consistency loss, and l_mse(·) denotes the mean square error loss.
Illustratively, l_ce(y(ω), p_{H_t}(x)(ω)) = -Σ_{i∈C} y(ω)_i log(p_{H_t}(x)(ω)_i).
[Two further illustrative formulas are given as images in the original document.]
c characterizes the number of label categories for the sample. H_t represents the task header of the teacher model branch for a particular model task, and h_s represents the task header of the student model branch for a particular model task. For example, taking the initial deep learning model for a bi-classifier as an example, H_t may be, for example, a classifier, and H_s may be, for example, a classifier.
Illustratively, according to the training method of the deep learning model of a further embodiment of the present disclosure, determining the second feedback value of the second network layer corresponding to the first network layer position from the network parameters of each first network layer may be implemented, for example, as follows: the current number of model parameter updates corresponding to the target sample is determined; according to this current number of updates, the historical network parameters of the first network layer before the current update and the current network parameters of the first network layer at the current update are determined, and an exponential moving average for the first network layer is determined; and the second feedback value of the second network layer corresponding to the first network layer position is determined from the exponential moving average.
The current number of updates may be understood as the current training step number, and may for example be the current epoch or the current batch (i.e., training batch) corresponding to the target sample, where an epoch may be understood as one complete pass of the deep learning network over the entire data set.
Exponential Moving Average is abbreviated EMA. Fig. 3B schematically illustrates an example of determining the second feedback value of the second network layer corresponding to the first network layer position through an exponential moving average of the first network layer.
According to the training method of the deep learning model of the embodiment of the present disclosure, the second feedback value of the second network layer corresponding to the first network layer position is determined through the exponential moving average of the first network layer. The EMA weight parameters are equivalent to a weighted average of the gradients over the course of training, in which the gradients of early training carry very small weight; the exponential moving average thus gives relatively little weight to gradients from the early, unstable stage of training, which matches the stability characteristics of deep learning model training and makes the obtained second feedback value more accurate.
Fig. 3A and 3B schematically illustrate specific examples of determining a second feedback value of a second network layer corresponding to a first network layer position by an exponential moving average EMA of the first network layer.
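A sketch of the EMA-based second feedback value is given below, assuming a standard exponential moving average over the position-matched first-layer parameters; the decay value and the ramp-up rule are illustrative assumptions.

```python
import torch

@torch.no_grad()
def ema_update(second_layer, first_layer, step, decay=0.999):
    """Update the second network layer's parameters from the exponential moving
    average of the position-matched first network layer's parameters."""
    # The effective decay grows with the number of parameter updates, so early
    # (unstable) training steps contribute relatively small weight.
    d = min(decay, (step + 1) / (step + 10))
    for p_second, p_first in zip(second_layer.parameters(), first_layer.parameters()):
        p_second.mul_(d).add_(p_first, alpha=1.0 - d)
    # Buffers such as BN running mean/variance can simply be copied (or averaged similarly).
    for b_second, b_first in zip(second_layer.buffers(), first_layer.buffers()):
        b_second.copy_(b_first)
```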
Illustratively, according to a training method of a deep learning model of a further embodiment of the present disclosure, at least one of the first network layer and the second network layer includes a convolution layer and a batch normalization layer, and network parameters of the batch normalization layer include weights, offsets, means, and variances.
For example, the first network layer and the second network layer may each include a convolution layer and a batch normalization layer connected in sequence.
For example, the first network layer and the second network layer may each include a convolution layer, a batch normalization layer, and an activation layer connected in sequence.
Illustratively, the network parameters of the convolutional layer may include weights and offsets.
Batch Normalization is abbreviated BN.
According to the training method of the deep learning model of the embodiment of the present disclosure, for each first or second network layer, the batch normalization layer can perform mean, variance, and standardization operations on the training samples of the current batch. The batch normalization layer keeps the input data distribution of each first and second network layer relatively stable and accelerates the learning of the deep learning model; it also simplifies hyperparameter tuning, making the learning of the deep learning model more stable; together with the activation layer, it can alleviate the vanishing gradient problem; and it has a certain regularization effect, so that the deep model generalizes better.
It should be noted that the convolution layer and batch normalization layer (or the convolution layer, batch normalization layer, and activation layer) included in each first or second network layer may be understood as one program module. For each first or second network layer, the network parameters of its convolution layer and batch normalization layer (or convolution layer, batch normalization layer, and activation layer) may be updated synchronously.
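A minimal sketch of one such network-layer unit (convolution, batch normalization, activation); the kernel size and the choice of ReLU are assumptions for illustration.

```python
import torch.nn as nn

class ConvBNLayer(nn.Module):
    """One network layer unit: convolution + batch normalization (+ activation).
    The BN layer carries weight, bias, running mean, and running variance."""
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```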
Illustratively, the training method of the deep learning model according to the further embodiment of the present disclosure further includes: and carrying out data enhancement on the initial sample to obtain a target sample.
For example, as shown in fig. 3A, two different data augmentations may be applied to the initial sample Sample, obtaining two different augmented samples Aug1 and Aug2; for each model branch, the augmented sample input into that branch is the target sample. The two augmented samples are input into the teacher model branch and the student model branch, respectively.
Illustratively, the initial sample may be subjected to a data augmentation strategy of random horizontal flipping, random vertical flipping, and random rotation within 30 degrees, for example, to obtain an augmented sample.
According to the training method of the deep learning model of the embodiment of the present disclosure, data augmentation of the initial samples increases the number of samples and reduces the model's dependence on the training samples, making it easier to train a deep learning model with better performance. Applying different augmentations to the initial sample yields the target sample for each target model branch, which also improves the generalization of the target deep learning model.
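Assuming torchvision-style transforms, the augmentation strategy described above might look like the sketch below; two independent draws from the same pipeline yield the two augmented samples Aug1 and Aug2.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=30),   # random rotation within 30 degrees
    transforms.ToTensor(),
])

# aug1 = augment(initial_sample)   # target sample for the teacher model branch
# aug2 = augment(initial_sample)   # target sample for the student model branch
```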
Illustratively, according to a training method of the deep learning model of a further embodiment of the present disclosure, the backbone network of the initial deep learning model comprises a residual network.
A Residual Network is abbreviated ResNet. Its very deep network structure enables the extraction of deep features, while its residual structure alleviates the degradation problem and improves model performance, making it suitable for complex image processing scenarios such as medical imaging.
Illustratively, the backbone network of the initial deep learning model may also be EfficientNet. Compared with an ordinary convolutional neural network, EfficientNet can obtain richer and more complex features by scaling up the number of convolution layers, improving the representation capability of the model; it can obtain finer-grained features by scaling up the number of convolution channels, reducing the training difficulty of the deep learning model; and it can obtain finer-grained feature maps by scaling up the resolution of the input data, making it suitable for complex image processing scenarios such as medical imaging.
Consider, as an example, a scenario in which the target deep learning model is used for medical image processing, with the medical image being a knee joint X-ray image. A knee joint X-ray image may fall into 5 label categories: category 0 (healthy); category 1 (suspected arthritis); category 2 (mild arthritis); category 3 (moderate arthritis); and category 4 (severe arthritis).
Tables 1 to 3 schematically show the performance of the target deep learning model on medical images.
Table 1: sample data distribution for knee joint X-ray images of 5 categories
Data Train Val Test
Num 5778 826 1656
Table 2: resNet 34-based algorithm index comparison
Figure BDA0004110961370000141
Table 3: algorithm index contrast based on EfficientNetB0
Figure BDA0004110961370000142
A total of 200 epochs were trained with an iterative training strategy: the initial learning rate was set to 0.01, with cosine learning-rate decay and warmup for the first 300 steps. An ImageNet pre-trained model was used, the image size was set to 224 x 224, and Kappa was used as the model performance evaluation metric.
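A hedged sketch of this training configuration, assuming PyTorch's SGD, cosine annealing, and a simple linear warmup for the first 300 steps; the model placeholder, momentum, steps-per-epoch, and scheduler details are assumptions, since the patent does not fix an exact implementation.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 5)                 # placeholder for the actual backbone
num_epochs, steps_per_epoch = 200, 100   # steps_per_epoch depends on the dataset

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
warmup_steps = 300
total_steps = num_epochs * steps_per_epoch
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=warmup_steps)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps - warmup_steps)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_steps])

# 224 x 224 inputs, ImageNet pre-trained backbone; Cohen's kappa
# (e.g. sklearn.metrics.cohen_kappa_score) as the evaluation metric.
```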
The target deep learning model trained by the deep learning model training method of the embodiment of the present disclosure is denoted Cross EMA Mean Teacher in Tables 2 and 3. The comparison shows that the performance of the target deep learning model Cross EMA Mean Teacher determined by the embodiments of the present disclosure improves by about 2.22% over ResNet34 and by 0.25% over Mean Teacher. Table 3 gives the experimental results with EfficientNetB0 as the backbone: the method of the present application improves significantly, by 1.82% compared with EfficientNetB0 and by 0.76% compared with Mean Teacher.
In summary, the target deep learning model obtained by the training method of the deep learning model of the embodiment of the present disclosure arranges the first and second network layers of the teacher model branch and the student model branch in a crossed manner (reflected in the fact that each of the teacher and student model branches includes at least one first network layer and at least one second network layer). The network parameters of the second network layers determined from the first network layers are therefore more robust, and "cross-mixing" these more robust second network layers with the back-propagation-learnable first network layers further benefits the training of the first network layers as "learning layers". These advantages are verified, for example, on the multi-class classification task for knee joint X-ray images.
It should be further noted that, the target deep learning model obtained by the training method of the deep learning model according to the embodiment of the present disclosure may be used not only for image detection, but also for various scenes such as image classification, image segmentation, and the like.
The embodiments of the present disclosure further provide an image processing method; an image processing method according to an exemplary embodiment of the present disclosure is described below with reference to fig. 4 in conjunction with the system architecture of fig. 1. The image processing method of the embodiment of the present disclosure may be performed, for example, by the terminal 103 shown in fig. 1.
Fig. 4 schematically shows a flowchart of an image processing method according to an embodiment of the present disclosure.
As shown in fig. 4, the image processing method 400 of the embodiment of the present disclosure may include, for example, operations S410 to S420.
In operation S410, a to-be-processed image is acquired.
In operation S420, the image to be processed is input into the target deep learning model to obtain a detection result of the image to be processed.
The deep learning model is trained using the following operations: and respectively processing the target samples according to each target model branch in the plurality of model branches of the initial deep learning model to obtain an output result. A first feedback value for each first network layer is determined based on the output result and the target loss function. And determining a second feedback value of a second network layer corresponding to the first network layer position according to the network parameter of each first network layer, wherein the target model branch comprises at least one first network layer and at least one second network layer. And respectively updating network parameters of the first network layer and the second network layer according to the first feedback value and the second feedback value to obtain a target deep learning model.
For example, the image to be processed may comprise a medical image.
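A minimal inference sketch for operations S410 to S420, assuming the trained target deep learning model has been saved as a PyTorch module; the file names, preprocessing, and argmax readout are hypothetical illustrations rather than details from the patent.

```python
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

model = torch.load("target_deep_learning_model.pt", map_location="cpu")  # hypothetical path
model.eval()

image = Image.open("knee_xray.png").convert("RGB")         # S410: acquire the image to be processed
with torch.no_grad():
    logits = model(preprocess(image).unsqueeze(0))          # S420: obtain the detection result
    detection_result = logits.argmax(dim=1).item()
```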
According to the image processing method of the embodiment of the present disclosure, the target deep learning model obtained with the training method of the above embodiment performs better and generalizes better; the specific technical solutions and technical effects involved in the training process of the target deep learning model are described in detail in the above embodiment and are not repeated here.
According to the image processing method of the embodiment of the present disclosure, the detection result obtained by processing the image to be processed with the target deep learning model is more accurate, and performing image processing on different devices with the target deep learning model still guarantees the detection efficiency, so the method is applicable to scenarios such as complex image processing.
Fig. 5 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the image processing apparatus 500 of the embodiment of the present disclosure includes, for example, a to-be-processed image acquisition module 510, and a detection result determination module 520.
The image to be processed acquisition module 510 is configured to acquire an image to be processed.
The detection result determining module 520 is configured to input the image to be processed into the target deep learning model, and obtain a detection result of the image to be processed.
The target deep learning model is trained by the following modules: the output result determining module is used for respectively processing the target samples according to each target model branch of the initial deep learning model to obtain an output result; the first feedback value determining module is used for determining a first feedback value for each first network layer according to the output result and the target loss function; a second feedback value determining module, configured to determine a second feedback value of a second network layer corresponding to the first network layer location according to the network parameter of each first network layer, where the target model branch includes at least one first network layer and at least one second network layer; and the target deep learning model determining module is used for respectively updating the network parameters of the first network layer and the second network layer according to the first feedback value and the second feedback value to obtain a target deep learning model.
Illustratively, the image to be processed comprises a medical imaging image.
Fig. 6 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 6, the training apparatus 600 for a deep learning model according to an embodiment of the present disclosure includes, for example, an output result determining module 610, a first feedback value determining module 620, a second feedback value determining module 630, and a target deep learning model determining module 640.
The output result determining module 610 is configured to process the target samples according to each of a plurality of model branches of the initial deep learning model, to obtain an output result.
The first feedback value determining module 620 is configured to determine a first feedback value for each first network layer according to the output result and the target loss function.
A second feedback value determining module 630, configured to determine a second feedback value of a second network layer corresponding to the first network layer location according to the network parameter of each first network layer, where the target model branch includes at least one first network layer and at least one second network layer.
The target deep learning model determining module 640 is configured to update network parameters of the first network layer and the second network layer according to the first feedback value and the second feedback value, respectively, so as to obtain a target deep learning model.
Illustratively, the target model branches include a teacher model branch and a student model branch; any two network layers of the teacher model branch are adjacent first network layers and second network layers, so that the first network layers and the second network layers of the teacher model branch are arranged in a crossing manner; the first network layer and the second network layer of the student model branches are matched with the teacher model branches in a crossing mode.
Illustratively, the target sample comprises a labeling sample; the output results comprise a first output result of a labeling sample of the teacher model branch and a second output result of a labeling sample of the student model branch, and the target loss function comprises a first loss function and a second loss function; the first feedback value determining module includes: the teacher model branch feedback value determining submodule is used for determining a teacher model branch feedback value according to a first output result of the labeling sample and a first loss function; the student model branch feedback value determining submodule is used for determining a student model branch feedback value according to the second output result of the labeling sample and the second loss function; and the first feedback value determining submodule is used for determining the first feedback value according to the teacher model branch feedback value and the student model branch feedback value.
Illustratively, the target sample further comprises an unlabeled sample, and the output result further comprises an unlabeled sample first output result of the teacher model branch and an unlabeled sample second output result of the student model branch; the target loss function further includes a third loss function; the first feedback value determination submodule includes: the non-labeling sample feedback value determining unit is used for determining a non-labeling sample feedback value according to the first output result of the non-labeling sample, the second output result of the non-labeling sample and the third loss function; and the first feedback value determining unit is used for determining the first feedback value according to the teacher model branch feedback value, the student model branch feedback value and the unlabeled sample feedback value.
Illustratively, the second feedback value determination module includes: the current update times determining submodule is used for determining the current update times of the model parameters corresponding to the target sample; the index moving average value determining submodule is used for determining historical network parameters before the current updating times of the model parameters corresponding to the first network layer and current network parameters of the current updating times of the model parameters corresponding to the first network layer according to the current updating times of the model parameters and determining an index moving average value of the first network layer; and a second feedback value determining sub-module for determining a second feedback value of the second network layer corresponding to the first network layer location according to the exponential moving average.
Illustratively, at least one of the first network layer and the second network layer includes a convolution layer and a batch normalization layer, the network parameters of the batch normalization layer including weights, offsets, means, and variances.
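When a layer contains batch normalization, the smoothing step should cover all four parameters named above, including the running statistics, which are buffers rather than learnable parameters in PyTorch. A sketch under that assumption:

import torch
import torch.nn as nn

@torch.no_grad()
def ema_update_batchnorm(first_bn: nn.BatchNorm2d, second_bn: nn.BatchNorm2d, decay=0.999):
    pairs = [
        (first_bn.weight, second_bn.weight),              # scale (weight)
        (first_bn.bias, second_bn.bias),                  # offset (bias)
        (first_bn.running_mean, second_bn.running_mean),  # mean (buffer)
        (first_bn.running_var, second_bn.running_var),    # variance (buffer)
    ]
    for src, dst in pairs:
        dst.mul_(decay).add_(src, alpha=1.0 - decay)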
Illustratively, the training apparatus of the deep learning model according to the embodiment of the present disclosure further includes: the data augmentation module is used for performing data augmentation on the initial sample to obtain the target sample.
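As an illustration of such a module, a torchvision transform pipeline can turn an initial sample into an augmented target sample; the particular transforms and their parameters below are placeholders, not requirements of the disclosure, and initial_sample is a hypothetical PIL image.

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random crop and resize
    transforms.RandomHorizontalFlip(),                      # random left-right flip
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # mild photometric jitter
    transforms.ToTensor(),
])
# target_sample = augment(initial_sample)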
Illustratively, the backbone network of the initial deep learning model includes a residual network.
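For example, an off-the-shelf residual network such as ResNet-50 can serve as the backbone of the initial deep learning model; the choice of ResNet-50 and the two-class head below are assumptions for illustration (torchvision 0.13+ API).

import torch.nn as nn
from torchvision.models import resnet50

def build_initial_model(num_classes: int = 2) -> nn.Module:
    model = resnet50(weights=None)  # residual backbone, randomly initialized
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # replace the classification head
    return model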
It should be understood that the embodiments of the apparatus portion of the present disclosure correspond to the same or similar embodiments of the method portion of the present disclosure; the technical problems to be solved and the technical effects achieved likewise correspond, and details are not repeated here.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 7, the device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 701 performs the respective methods and processes described above, such as the image detection method and the training method of the deep learning model. For example, in some embodiments, the image detection method and the training method of the deep learning model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the image detection method and the training method of the deep learning model described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the image detection method and the training method of the deep learning model by any other suitable means (for example, by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (23)

1. An image detection method, comprising:
acquiring an image to be processed;
inputting the image to be processed into a target deep learning model to obtain a detection result of the image to be processed,
the target deep learning model is trained by the following operations:
processing the target sample according to each target model branch of the initial deep learning model to obtain an output result;
determining a first feedback value for each first network layer according to the output result and the target loss function;
determining a second feedback value of a second network layer corresponding to the first network layer position according to the network parameter of each first network layer, wherein the target model branch comprises at least one first network layer and at least one second network layer; and
updating network parameters of the first network layer and the second network layer according to the first feedback value and the second feedback value, respectively, to obtain a target deep learning model.
2. The method of claim 1, wherein the image to be processed comprises a medical imaging image.
3. A training method of a deep learning model, comprising:
processing the target sample according to each target model branch in a plurality of model branches of the initial deep learning model to obtain an output result;
determining a first feedback value for each first network layer according to the output result and the target loss function;
determining a second feedback value of a second network layer corresponding to the first network layer position according to the network parameter of each first network layer, wherein the target model branch comprises at least one first network layer and at least one second network layer; and
updating network parameters of the first network layer and the second network layer according to the first feedback value and the second feedback value, respectively, to obtain a target deep learning model.
4. The method of claim 3, wherein the target model branches include a teacher model branch and a student model branch; any two adjacent network layers of the teacher model branch are a first network layer and a second network layer, so that the first network layers and the second network layers of the teacher model branch are arranged alternately; and the first network layers and the second network layers of the student model branch are cross-matched with those of the teacher model branch.
5. The method of claim 4, wherein the target sample comprises a labeled sample; the output results comprise a first output result of the labeled sample from the teacher model branch and a second output result of the labeled sample from the student model branch, and the target loss function comprises a first loss function and a second loss function; determining a first feedback value for each first network layer according to the output result and the target loss function comprises:
determining a teacher model branch feedback value according to the first output result of the labeled sample and the first loss function;
determining a student model branch feedback value according to the second output result of the labeled sample and the second loss function; and
determining the first feedback value according to the teacher model branch feedback value and the student model branch feedback value.
6. The method of claim 5, wherein the target sample further comprises an unlabeled sample, and the output results further comprise a first output result of the unlabeled sample from the teacher model branch and a second output result of the unlabeled sample from the student model branch; the target loss function further includes a third loss function; the determining the first feedback value according to the teacher model branch feedback value and the student model branch feedback value includes:
determining an unlabeled sample feedback value according to the first output result of the unlabeled sample, the second output result of the unlabeled sample and the third loss function; and
determining the first feedback value according to the teacher model branch feedback value, the student model branch feedback value and the unlabeled sample feedback value.
7. The method of any of claims 3-6, wherein the determining, from the network parameters of each first network layer, a second feedback value for a second network layer corresponding to the first network layer location comprises:
determining a current update count of the model parameters corresponding to the target sample;
determining, according to the current update count of the model parameters, the historical network parameters of the first network layer before the current update and the current network parameters of the first network layer at the current update, and determining an exponential moving average of the first network layer; and
determining the second feedback value of the second network layer corresponding to the first network layer position according to the exponential moving average.
8. The method of any of claims 3-6, wherein at least one of the first network layer, the second network layer comprises a convolution layer and a batch normalization layer, network parameters of the batch normalization layer comprising weights, offsets, means, and variances.
9. The method of any of claims 3-6, further comprising:
performing data augmentation on an initial sample to obtain the target sample.
10. The method of any of claims 3-6, wherein the backbone network of the initial deep learning model comprises a residual network.
11. An image detection apparatus comprising:
the image acquisition module to be processed is used for acquiring the image to be processed;
a detection result determining module for inputting the image to be processed into a target deep learning model to obtain a detection result of the image to be processed,
the target deep learning model is trained by the following modules:
the output result determining module is used for respectively processing the target samples according to each target model branch of the initial deep learning model to obtain an output result;
the first feedback value determining module is used for determining a first feedback value for each first network layer according to the output result and the target loss function;
a second feedback value determining module, configured to determine a second feedback value of a second network layer corresponding to the first network layer location according to a network parameter of each first network layer, where the target model branch includes at least one first network layer and at least one second network layer; and
the target deep learning model determining module is used for respectively updating the network parameters of the first network layer and the second network layer according to the first feedback value and the second feedback value to obtain a target deep learning model.
12. The apparatus of claim 11, wherein the image to be processed comprises a medical imaging image.
13. A training device for a deep learning model, comprising:
the output result determining module is used for respectively processing the target samples according to each target model branch in a plurality of model branches of the initial deep learning model to obtain an output result;
the first feedback value determining module is used for determining a first feedback value for each first network layer according to the output result and the target loss function;
a second feedback value determining module, configured to determine a second feedback value of a second network layer corresponding to the first network layer location according to a network parameter of each first network layer, where the target model branch includes at least one first network layer and at least one second network layer; and
the target deep learning model determining module is used for respectively updating the network parameters of the first network layer and the second network layer according to the first feedback value and the second feedback value to obtain a target deep learning model.
14. The apparatus of claim 13, wherein the target model branches comprise a teacher model branch and a student model branch; any two adjacent network layers of the teacher model branch are a first network layer and a second network layer, so that the first network layers and the second network layers of the teacher model branch are arranged alternately; and the first network layers and the second network layers of the student model branch are cross-matched with those of the teacher model branch.
15. The apparatus of claim 14, wherein the target sample comprises a labeled sample; the output results comprise a first output result of the labeled sample from the teacher model branch and a second output result of the labeled sample from the student model branch, and the target loss function comprises a first loss function and a second loss function; the first feedback value determining module includes:
the teacher model branch feedback value determining submodule is used for determining a teacher model branch feedback value according to the first output result of the labeled sample and the first loss function;
the student model branch feedback value determining submodule is used for determining a student model branch feedback value according to the second output result of the labeled sample and the second loss function; and
the first feedback value determining submodule is used for determining the first feedback value according to the teacher model branch feedback value and the student model branch feedback value.
16. The apparatus of claim 15, wherein the target sample further comprises an unlabeled sample, and the output results further comprise a first output result of the unlabeled sample from the teacher model branch and a second output result of the unlabeled sample from the student model branch; the target loss function further includes a third loss function; the first feedback value determination submodule includes:
the unlabeled sample feedback value determining unit is used for determining an unlabeled sample feedback value according to the first output result of the unlabeled sample, the second output result of the unlabeled sample and the third loss function; and
the first feedback value determining unit is used for determining the first feedback value according to the teacher model branch feedback value, the student model branch feedback value and the unlabeled sample feedback value.
17. The apparatus of any of claims 13-16, wherein the second feedback value determination module comprises:
the current update count determining submodule is used for determining a current update count of the model parameters corresponding to the target sample;
the exponential moving average determining submodule is used for determining, according to the current update count of the model parameters, the historical network parameters of the first network layer before the current update and the current network parameters of the first network layer at the current update, and for determining an exponential moving average of the first network layer; and
the second feedback value determining submodule is used for determining the second feedback value of the second network layer corresponding to the first network layer position according to the exponential moving average.
18. The apparatus of any of claims 13-16, wherein at least one of the first network layer, the second network layer comprises a convolutional layer and a batch normalization layer, network parameters of the batch normalization layer comprising weights, offsets, means, and variances.
19. The apparatus of any of claims 13-16, further comprising:
the data augmentation module is used for performing data augmentation on an initial sample to obtain the target sample.
20. The apparatus of any of claims 13-16, wherein the backbone network of the initial deep learning model comprises a residual network.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-2 or to perform the method of any one of claims 3-10.
22. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-2 or perform the method of any one of claims 3-10.
23. A computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-2 or the method of any one of claims 3-10.
CN202310200213.8A 2023-03-03 2023-03-03 Image detection method, training method and device of deep learning model Active CN116051935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310200213.8A CN116051935B (en) 2023-03-03 2023-03-03 Image detection method, training method and device of deep learning model


Publications (2)

Publication Number Publication Date
CN116051935A true CN116051935A (en) 2023-05-02
CN116051935B CN116051935B (en) 2024-03-22

Family

ID=86116615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310200213.8A Active CN116051935B (en) 2023-03-03 2023-03-03 Image detection method, training method and device of deep learning model

Country Status (1)

Country Link
CN (1) CN116051935B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609965A (en) * 2021-08-03 2021-11-05 同盾科技有限公司 Training method and device of character recognition model, storage medium and electronic equipment
CN113792871A (en) * 2021-08-04 2021-12-14 北京旷视科技有限公司 Neural network training method, target identification method, device and electronic equipment
CN115018852A (en) * 2022-08-10 2022-09-06 四川大学 Abdominal lymph node detection method and device based on semi-supervised learning
CN115082920A (en) * 2022-08-16 2022-09-20 北京百度网讯科技有限公司 Deep learning model training method, image processing method and device
JP2022173453A (en) * 2021-12-10 2022-11-18 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Deep learning model training method, natural language processing method and apparatus, electronic device, storage medium, and computer program
WO2022257578A1 (en) * 2021-06-07 2022-12-15 京东科技信息技术有限公司 Method for recognizing text, and apparatus
CN115577793A (en) * 2022-05-10 2023-01-06 安徽大学 Network structure-oriented mapping type distillation method and training method thereof
CN115578261A (en) * 2022-10-14 2023-01-06 北京百度网讯科技有限公司 Image processing method, deep learning model training method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李维嘉; 陈爽; 张雷; 吴正灏: "Research on pulmonary radiography detection based on deep learning image processing" (基于深度学习图像处理的肺部造影检测研究), Automation & Instrumentation (自动化与仪器仪表), No. 12
潘宗序; 安全智; 张冰尘: "Research progress on radar image target recognition based on deep learning" (基于深度学习的雷达图像目标识别研究进展), Scientia Sinica Informationis (中国科学:信息科学), No. 12

Also Published As

Publication number Publication date
CN116051935B (en) 2024-03-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant