CN111275104A - Model training method and device, server and storage medium

Info

Publication number: CN111275104A
Application number: CN202010061758.1A
Authority: CN
Inventors: 范伟亚; 黄访
Original assignee: Chongqing Jinshan Medical Technology Research Institute Co Ltd
Current assignee: Chongqing Jinshan Science and Technology Group Co Ltd
Priority date: 2020-01-16
Filing date: 2020-01-16
Publication date: 2020-06-12

Abstract

An embodiment of the invention discloses a model training method, a model training device, a server and a storage medium. The method comprises: obtaining an initial detection model, wherein the initial detection model is trained on a first sample set comprising sample detection images of a first detection part, and is used for identifying image features of a target detection image acquired by a detection device when detecting the first detection part; obtaining a second sample set, wherein the number of samples in the second sample set is smaller than that in the first sample set, and the second sample set comprises sample detection images of a second detection part; and adjusting model parameters of the initial detection model according to the second sample set to obtain a target detection model, wherein the target detection model is used for identifying image features of a target detection image acquired by the detection device when detecting the second detection part. In this way, model training can be carried out quickly and a model with better performance can be obtained.

Description

Model training method and device, server and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a model training method and apparatus for image detection, a server, and a storage medium.

Background

With the continuous development of computer technology, the conventional process of image recognition based on machine learning includes obtaining sample images, extracting features, constructing a classifier, and inputting the extracted features into the classifier to recognize the class. However, conventional machine learning relies on manual feature extraction, and a very large number of sample images is often required to train a good model. Consequently, with current model training methods, obtaining the sample images consumes a great deal of time and cost; and if few features are extracted during feature extraction, the model may be insufficiently trained, leading to poor performance and weak generalization of the trained model. Therefore, how to train a model quickly while obtaining a model with better performance is a research hotspot in model training.

Disclosure of Invention

The embodiment of the invention provides a model training method, a model training device, a server and a storage medium for image detection, which can be used for quickly training a model and obtaining the model with better performance.

In one aspect, an embodiment of the present invention provides a model training method for image detection, where the method includes:

obtaining an initial detection model, wherein the initial detection model is obtained by training according to a first sample set, the first sample set comprises sample detection images of a first detection part, and the initial detection model is used for identifying the image features of a target detection image acquired by a detection device when detecting the first detection part;

obtaining a second set of samples, the second set of samples having a smaller number of samples than the first set of samples, the second set of samples comprising: a sample detection image for the second detection site;

and adjusting the model parameters of the initial detection model according to the second sample set to obtain a target detection model, wherein the target detection model is used for identifying the image characteristics of a target detection image acquired by the detection equipment for detecting the second detection part.

In another aspect, an embodiment of the present invention provides a model training apparatus for image detection, where the apparatus includes:

an obtaining unit, configured to obtain an initial detection model, where the initial detection model is obtained by training according to a first sample set, the first sample set includes sample detection images of a first detection part, and the initial detection model is used for identifying the image features of a target detection image acquired by a detection device when detecting the first detection part;

the obtaining unit is further configured to obtain a second sample set, where a number of samples of the second sample set is smaller than a number of samples of the first sample set, and the second sample set includes: a sample detection image for the second detection site;

and the adjusting unit is used for adjusting the model parameters of the initial detection model according to the second sample set to obtain a target detection model, and the target detection model is used for identifying the image characteristics of a target detection image acquired by the detection equipment for detecting the second detection part.

In another aspect, an embodiment of the present invention provides a server, including a processor, a memory, and a communication interface, where the processor, the memory, and the communication interface are connected to each other, where the memory is used to store computer program instructions, and the processor is configured to execute the program instructions, and perform the following steps:

In yet another aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, where the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the method of the first aspect.

In the embodiment of the present invention, when determining the target detection model for performing feature recognition on detection images of the second detection part, the server may first acquire an initial detection model trained on the first sample set, where the initial detection model is used to recognize image features of the target detection image of the first detection part. The server may then acquire a small number of sample detection images collected when detecting the second detection part, and adjust the model parameters of the initial detection model based on the image features of those images, to obtain the target detection model for recognizing image features of detection images of the second detection part. The target detection model is thus trained on a small number of samples of the second detection part, which effectively reduces the burden of acquiring detection image samples of the second detection part, improves the accuracy with which the target detection model identifies image features of the second detection part, and effectively improves the generalization of the target detection model.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a schematic diagram of a model training method for image detection according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of a model training method for image detection according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of model training provided by an embodiment of the present invention;

FIG. 4 is a schematic flow chart diagram of a model training method for image detection according to another embodiment of the present invention;

FIG. 5 is a schematic diagram of output features of a trained model according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of output features of a trained model according to another embodiment of the present invention;

FIG. 7 is a schematic block diagram of a model training apparatus for image detection according to an embodiment of the present invention;

FIG. 8 is a schematic block diagram of a server according to an embodiment of the present invention.

Detailed Description

In one embodiment, in a model training method based on deep learning, enough usable sample detection images are required for model training in order to ensure the accuracy and reliability of the trained model (such as a target model), so that a target model with higher accuracy can be obtained based on deep learning. When few image samples of the detection part are available for training the target detection model, a model training method combining transfer learning and a joint loss may be adopted to train the target model. Transfer learning reduces the dependence on annotation information in the sample detection images, so that knowledge transferred from an existing model allows the target model to be trained with a smaller amount of sample data.

In an embodiment, the target model may be a model for determining image features of a target detection image of a second detection part. If the number of sample detection images of the second detection part is small, model training may first be performed using the more plentiful sample detection images of a first detection part to obtain an initial detection model for detecting the image features of detection images of the first detection part. A model training method combining transfer learning and a joint loss may then be used to determine the target detection model from the initial detection model and the sample detection images of the second detection part. In this way, the target detection model for the second detection part can be trained even when few sample detection images of the second detection part are available, and the training efficiency of the target detection model is improved.

In one embodiment, when the image features of the sample detection images are identified, the sample detection images of different detection parts share the same low-level image features, such as edge features, visual shape features, geometric change features, illumination change features and the like, and these low-level image features can be used for classification, target recognition and the like. Therefore, based on the idea of transfer learning, an initial detection model trained on sample detection images of the first detection part can be used as a general image feature extractor, and the extracted general image features can be transferred to the recognition domain of the second detection part. This reduces both the time needed to train the target model and the number of sample detection images of the second detection part that are required, so that the server can train a target detection model with higher accuracy based on a small number of sample detection images of the second detection part.

In an embodiment, referring to the schematic diagram of a model training method for image detection shown in fig. 1, as shown in fig. 1, a server may first pre-train a deep network model using other sample data sets in the same domain (e.g., the other sample data sets may be input into a learning system shown in fig. 1 for pre-training), and construct a deep learning model (i.e., an initial detection model) with initialization parameters, for example, if a target detection model expected by the server is a model (i.e., model feature knowledge) for determining image features corresponding to a target detection image of a second detection region, the server may construct an initial detection model with initialization parameters for performing feature recognition on the target detection image of a first detection region from the sample detection image of the first detection region. Furthermore, after the server determines the initial detection model, the server can perform fine tuning training on the initial parameters of the initial detection model by using the sample detection image of the second detection part, so that the transfer of image feature knowledge is realized, the full utilization of the image features of the sample detection image of the first detection part is realized, and meanwhile, the problem of too low detection accuracy of the trained target model caused by too small data quantity and data distribution difference of the sample detection image corresponding to the second detection part can be solved through transfer learning.

In one embodiment, after the server fine-tunes the initialization parameters of the initial detection model using the sample detection images of the second detection part, an intermediate detection model is obtained. The server may then further optimize the intermediate detection model using a preset target loss function, and the optimized intermediate detection model is the target detection model. This gives the image features learned by the initial detection model, which was trained on sample detection images of the first detection part, higher discriminative capability, improves the generalization of the learned target detection model, and improves the accuracy with which the target detection model identifies image features of the target detection image of the second detection part. The preset target loss function may be a joint loss function obtained by combining a cross-entropy loss function and a center loss function.
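One common way to write such a joint loss is a weighted sum. It is given here as an illustrative assumption rather than as the formula of this embodiment. The symbols L, L_S, L_C and the class centers c are introduced only for this sketch: x_i is the feature of the i-th sample, y_i is its class, c_{y_i} is the center of that class, and λ is the weight of the center loss.

```latex
L = L_S + \lambda L_C, \qquad
L_C = \frac{1}{2} \sum_{i=1}^{m} \left\lVert x_i - c_{y_i} \right\rVert_2^2
```

Here L_S is the softmax cross-entropy loss and L_C is the center loss. Minimizing L_C pulls the features of each class toward that class's center, while L_S keeps different classes separable.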

Referring to fig. 2, a schematic flowchart of a model training method for image detection according to an embodiment of the present invention is shown in fig. 2, where the method may include:

S201, obtaining an initial detection model.

In one embodiment, the initial detection model is trained on a first sample set comprising sample detection images of the first detection part; correspondingly, the initial detection model is used for identifying image features of a target detection image acquired by the detection device when detecting the first detection part. The detection device differs according to the first detection part: when the first detection part is the larynx or the stomach, the corresponding detection device may be an endoscope or a capsule endoscope. Acquiring sample detection images of the first detection part is simpler than acquiring those of the second detection part, so the sample detection images in the first sample set, which has a larger number of samples, can be used to pre-train the deep learning network and obtain the initial detection model for the first detection part.

In one embodiment, after acquiring a certain number of sample detection images collected when detecting the first detection part, the server may pre-train a deep network model (i.e., a basic detection network) on the sample detection images in the first sample set to construct a deep network model with initialization parameters. Specifically, the basic detection network may be pre-trained using the large volume of sample detection images in the first sample set, and training is stopped when it is determined to have reached the optimum, yielding an initial detection model with initialization parameters that is used to detect image features of the target detection image of the first detection part. The server may determine that training of the basic detection network has reached the optimum when the detection accuracy of the pre-trained model on the first detection part meets a preset accuracy threshold. Further, in order to determine, based on the initial detection model, a target detection model for identifying image features of a target detection image of the second detection part, the initialization parameters of the initial detection model may be adjusted using the sample detection images in the second sample set by means of a model-based knowledge transfer learning algorithm, so as to obtain a detection model for detecting the second detection part.

S202, a second sample set is obtained, and the number of samples of the second sample set is smaller than that of the first sample set.

In one embodiment, the second sample set comprises sample detection images of the second detection part. After determining, from the sample detection images in the first sample set, the initial detection model for identifying image features of the target detection image of the first detection part, the server may further obtain sample detection images collected when detecting the second detection part, so as to adjust the initialization parameters of the initial detection model based on the image features of those images. The number of samples in the second sample set is smaller than the number of samples in the first sample set, where the number of samples of a sample set is the number of images it contains, and the two differ by a clear order of magnitude. The first sample set consists of sample detection images used for deep learning model training, and its size is on the order of ten million or several million detection samples. The second sample set is only used to perform parameter transfer on the initialization parameters of the initial detection model trained on the first sample set, based on the image features of its sample detection images, so its size is on the order of ten thousand or several tens of thousands of detection image samples.

In one embodiment, the second detection part corresponding to the sample detection images in the second sample set is different from the first detection part corresponding to the sample detection images in the first sample set. The first detection part may be the stomach or the throat, and the second detection part may be a part such as the small intestine, for which detection image samples are not easily obtained or are difficult to label. When adjusting the model parameters of the initial detection model based on the image features of the sample detection images in the second sample set, the server may perform knowledge transfer toward recognition of detection images of the second detection part. After this knowledge transfer, an intermediate detection model is obtained, which may be further trained on the image features of the sample detection images in the second sample set to obtain a target detection model for detecting the second detection part; that is, step S203 may be performed.

S203, according to the second sample set, adjusting model parameters of the initial detection model to obtain a target detection model, wherein the target detection model is used for identifying image characteristics of a target detection image acquired by the detection equipment for detecting the second detection part.

In an embodiment, the server adjusts the model parameters of the initial detection model based on the image features of the sample detection images in the second sample set; that is, it retrains the model framework of the initial detection model on the second sample set. As shown in FIG. 3, if the initial detection model includes three fully connected layers, then when the model is adjusted based on the detection images in the second sample set, the base network layers may be kept unchanged, and the three fully connected layers may be removed and replaced with one global average pooling layer and two fully connected layers. The number of neurons in the first fully connected layer may be set to 1024 with a Rectified Linear Unit (ReLU) activation function, and the number of neurons in the second fully connected layer may be set to the number of abnormality categories corresponding to the second detection part, with softmax as the activation function, used together with a cross-entropy loss. Further, a freezing method may be adopted: all parameters of the model except the top three layers are kept unchanged, and only the adjusted top three layers are trained. Specifically, stochastic gradient descent may be used with an initial learning rate of 0.1; whenever the change in error rate between iterations falls below a threshold, the learning rate is divided by 10. The momentum is 0.9, the weight decay is 0.0001, and the maximum number of training iterations is 1000, so that training can be stopped when it reaches the optimum and the intermediate detection model is obtained.
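The learning-rate rule described above can be sketched as follows. This is a minimal illustration only: the function name `step_down_on_plateau`, the threshold value `1e-3`, and the sample error rates are hypothetical, since the description does not specify the threshold.

```python
def step_down_on_plateau(error_rates, initial_lr=0.1, threshold=1e-3,
                         factor=10.0, max_iterations=1000):
    """Return the learning rate used at each iteration.

    The rate starts at `initial_lr` and is divided by `factor` whenever the
    error rate changes by less than `threshold` (a hypothetical value)
    between two consecutive iterations.
    """
    lr = initial_lr
    schedule = []
    previous = None
    for error in error_rates[:max_iterations]:
        # A small change in error rate is treated as a plateau.
        if previous is not None and abs(previous - error) < threshold:
            lr /= factor
        schedule.append(lr)
        previous = error
    return schedule


# Hypothetical error rates: plateaus occur at the 4th and 6th iterations.
errors = [0.40, 0.30, 0.25, 0.2495, 0.20, 0.1999]
print(step_down_on_plateau(errors))
```

In a real training loop the error rate would be measured after each iteration and the returned rate passed to the stochastic gradient descent optimizer.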

After obtaining the intermediate detection model, the server may further add a joint loss function to the intermediate detection model for training. The center loss function may be used as the optimization loss at the first fully connected layer to constrain the output features so as to increase the inter-class difference, and the softmax cross-entropy may be used as the optimization loss at the second fully connected layer, which outputs the recognition result for the sample detection images of the second detection part. When the intermediate detection model is trained on the second sample set, the second sample set may be divided into three parts, namely training set data, validation set data and test set data, in a ratio of 8:1:1. The training set data is used to train the intermediate detection model; the validation set data is used to measure the current performance of the network model (or detection model) during training, so as to supervise the training process and decide whether to terminate it; and the test set data is used for a visual test evaluation of the final training result of the network model. In addition, a series of data augmentation operations (rotation, translation, scaling, flipping and the like) is applied to the training data to improve the generalization performance of the target network model trained on the training set data.
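The 8:1:1 division can be sketched as below. The function name `split_8_1_1`, the shuffling step, and the fixed seed are assumptions made for illustration; the description only states the ratio.

```python
import random


def split_8_1_1(samples, seed=0):
    """Shuffle the samples and split them into training, validation and
    test subsets in an 8:1:1 ratio. Any remainder goes to the test subset."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    n = len(items)
    n_train = n * 8 // 10
    n_val = n // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical usage with 100 image identifiers.
train_set, val_set, test_set = split_8_1_1(range(100))
print(len(train_set), len(val_set), len(test_set))
```

Data augmentation would then be applied to the training subset only, so that the validation and test subsets continue to reflect unmodified detection images.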

In one embodiment, the values of the two loss functions, the softmax cross-entropy loss and the center loss, can be balanced through the weight λ of the center loss, which strengthens the network's ability to recognize image features; a suitable value of λ helps improve the accuracy of the network model. When the center-loss weight λ of the network model is 0.005, the intermediate detection model trained on the training set data performs best. During training of the network model, the Adaptive Moment Estimation (Adam) algorithm may be used as the loss optimization algorithm; specifically, the initial learning rate may be set to 0.0001. To prevent overfitting, an early termination mechanism (Early Stopping) may be introduced: when the loss on the validation set data has not decreased for 10 consecutive iteration rounds, training of the intermediate detection model may be stopped. Meanwhile, in order to save the optimal model parameters, a checkpoint mechanism (Model Checkpoint) of the network model is introduced: after each iteration round, whether the current intermediate detection model is saved as the target network model is determined by checking whether the accuracy on the validation set data has improved. Finally, the saved target network model structure and parameters are used to identify the target detection image of the second detection part; for example, image recognition may be performed on target detection images of intestinal lesions captured by a capsule endoscope.
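The early termination and checkpoint logic can be sketched without any training framework, as below. The function name `run_early_stopping` and the per-round validation metrics are hypothetical; recording the best round index stands in for actually saving model parameters.

```python
def run_early_stopping(val_losses, val_accuracies, patience=10):
    """Simulate Early Stopping and Model Checkpoint over per-round metrics.

    Returns (stop_round, checkpoint_round): the round at which training
    stops, and the round whose model would have been saved last.
    """
    best_loss = float("inf")
    best_accuracy = float("-inf")
    checkpoint_round = None
    rounds_without_improvement = 0
    stop_round = -1
    for round_index, (loss, accuracy) in enumerate(zip(val_losses, val_accuracies)):
        stop_round = round_index
        # Model Checkpoint: keep the model only if validation accuracy improved.
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            checkpoint_round = round_index
        # Early Stopping: count consecutive rounds without a lower validation loss.
        if loss < best_loss:
            best_loss = loss
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                break
    return stop_round, checkpoint_round


# Hypothetical metrics: the loss stops decreasing after round 2.
losses = [1.0, 0.8, 0.7] + [0.7] * 12
accuracies = [0.5, 0.6, 0.7] + [0.65] * 12
print(run_early_stopping(losses, accuracies))
```

Keeping the two criteria separate follows the description: the validation loss decides when to stop, while the validation accuracy decides which model is kept.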

In the embodiment of the present invention, when determining the target detection model for performing feature recognition on the target detection image of the second detection part, the server may first acquire an initial detection model trained on the first sample set, where the initial detection model is used to recognize features of the target detection image of the first detection part. The server may then acquire a small number of sample detection images of the second detection part and adjust the model parameters of the initial detection model based on the features of those images, to obtain the target detection model for recognizing image features of the target detection image of the second detection part. The target detection model is thus trained on samples of the second detection part, which effectively reduces the burden of acquiring detection image samples of the second detection part, improves the accuracy with which the target detection model identifies image features of the second detection part, and effectively improves the generalization of the target detection model.

Referring to fig. 4, a schematic flow chart of a model training method for image detection according to another embodiment of the present invention is shown in fig. 4, where the method may include:

S401, obtaining an initial detection model, wherein the initial detection model is obtained by training according to a first sample set, the first sample set comprises sample detection images of a first detection part, and the initial detection model is used for identifying the image features of a target detection image acquired by a detection device when detecting the first detection part.

S402, obtaining a second sample set, where the number of samples in the second sample set is smaller than the number of samples in the first sample set, and the second sample set includes sample detection images of the second detection part.

In an embodiment, for specific implementation of step S401 and step S402, refer to the specific implementation of step S201 and step S202 in the foregoing embodiment, and details are not described herein again.

S403, inputting the sample detection images in the second sample set into the initial detection model, and training the initial detection model.

In an embodiment, when obtaining the second sample set and training the initial detection model based on each sample detection image in the second sample set, the server may first input each sample detection image in the second sample set into the initial detection model to train the initial detection model, and specifically, the server may first obtain an initial model structure of the initial detection model, where the initial model structure includes one or more of the following: the number of network layers of the initial detection model, the corresponding activation function of each layer of the network, and the number of neurons of each layer of the network; further, the server can adjust the number of network layers of the initial detection model, the activation function corresponding to each layer of the network, or the number of neurons corresponding to each layer of the network to obtain an intermediate model structure corresponding to the intermediate detection model; each sample detection image in the second sample set may thus be input into the intermediate model structure to determine model parameters of the intermediate model structure. The model structure of the initial detection model may be, for example, the above-mentioned model structure including three fully-connected layers, and the intermediate model structure may be, for example, a structure in which the three fully-connected layers are changed into one global average pooling layer and two fully-connected layers.
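The structural adjustment can be sketched as a plain description of layers, independent of any deep learning framework. The `Layer` record, the function name `build_intermediate_structure`, and the example base layers are hypothetical; the sketch also assumes that fully connected layers occur only in the head of the initial model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Layer:
    kind: str               # e.g. "conv", "fc", "global_avg_pool"
    units: int = 0          # number of neurons or filters
    activation: str = ""
    trainable: bool = True


def build_intermediate_structure(initial_layers, num_classes):
    """Drop the fully connected head, freeze the base layers, and attach
    a global average pooling layer followed by two fully connected layers."""
    base = [replace(layer, trainable=False)
            for layer in initial_layers if layer.kind != "fc"]
    head = [
        Layer("global_avg_pool"),
        Layer("fc", 1024, "relu"),
        Layer("fc", num_classes, "softmax"),
    ]
    return base + head


# Hypothetical initial model: two base layers and three fully connected layers.
initial = [
    Layer("conv", 64, "relu"),
    Layer("conv", 128, "relu"),
    Layer("fc", 4096, "relu"),
    Layer("fc", 4096, "relu"),
    Layer("fc", 1000, "softmax"),
]
for layer in build_intermediate_structure(initial, num_classes=4):
    print(layer)
```

Only the three new head layers remain trainable, which corresponds to training the adjusted top three layers while the remaining parameters stay fixed.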

In one embodiment, after the server determines the intermediate model structure and the model parameters corresponding to it, an optimization algorithm may further be used to optimize the model parameters of the intermediate model structure; when the training of the model parameters of the intermediate model structure reaches the optimum, the server determines that the training of the initial detection model has reached the optimum. The optimization algorithm includes a stochastic gradient descent algorithm, an Adaptive Moment Estimation (Adam) optimization algorithm, a Response Surface Methodology (RSM) optimization algorithm, a Return-Oriented (ROP) optimization algorithm, and the like. Further, after the server determines that the training of the initial detection model has reached the optimum, the server may proceed to step S404 and step S405 to obtain the target detection model.

S404, when the training of the initial detection model reaches the optimum, taking the optimum initial detection model after training as an intermediate detection model.

S405, training the intermediate detection model, and when the training of the intermediate detection model reaches the optimal state, taking the trained optimal intermediate detection model as a target detection model.

In step S404 and step S405, the intermediate model structure corresponding to the intermediate detection model may include one or more fully connected layers, and the second sample set is divided into a training sample set, a validation sample set, and a test sample set. After determining the intermediate detection model, the server may train it. Specifically, the server may determine a target loss function for constraining the fully connected layers, and each sample detection image in the training sample set of the second sample set may then be input into the intermediate detection model under the constraint of the target loss function corresponding to each fully connected layer, so as to train the intermediate detection model.
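A numerical sketch of such a constraint is given below, assuming the joint loss takes the common form of softmax cross-entropy plus λ times the center loss. The function names, the batch layout, and the class centers are hypothetical; the features stand for the output of the first fully connected layer and the logits for the input to softmax at the second.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample: -log(softmax(logits)[label])."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(batch, centers, lam=0.005):
    """Sum over the batch of cross-entropy plus lam times the center loss.

    Each batch item is (feature, logits, label); centers maps a label to
    its class center vector.
    """
    total = 0.0
    for feature, logits, label in batch:
        total += softmax_cross_entropy(logits, label)
        total += lam * center_loss(feature, centers[label])
    return total


# Hypothetical two-class example with two-dimensional features.
centers = {0: [0.0, 0.0], 1: [1.0, 1.0]}
batch = [([1.0, 2.0], [0.0, 0.0], 0), ([1.0, 1.0], [0.0, 3.0], 1)]
print(joint_loss(batch, centers))
```

In practice the class centers are themselves updated during training; that update is omitted here for brevity.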

In one embodiment, the target loss function determined by the server to constrain the fully connected layer is a joint loss function determined based on the center loss function and the softmax loss function. In the forward propagation of deep model training, a loss function value can be obtained by comparing the predicted result of the last layer of the network model with the real result; the error between the predicted result and the real result is then back-propagated to update all weight values of the network model. In one embodiment, a softmax cross-entropy loss function may be used as the optimization loss function for the network model training process, and the softmax cross-entropy loss function is defined as follows:
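A standard form of the softmax cross-entropy loss, written with the symbols m, n, x_i, W_j and b used in this description, can be sketched as below. The label index y_i and the per-class offsets b_j are notational assumptions, and the expression is offered as a commonly used form rather than as the exact expression of the original filing.

```latex
L_S = -\sum_{i=1}^{m} \log
      \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}
           {\sum_{j=1}^{n} e^{W_j^{T} x_i + b_j}}
```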

wherein m and n respectively denote the number of samples and the number of classes input into the network model for training, x_i is the i-th feature, W_j is the j-th column of the weights in the fully connected layer, and b is the offset. A network model trained with softmax as the loss function has good separability, but it cannot effectively learn features that are compact within a class and dispersed between classes. As can be seen from the gradient update formula of the softmax cross-entropy loss function in formula 1.2, when the parameters are updated the network model considers only the current class j and penalizes the other classes. The features are therefore separable, but the intra-class distance is not constrained; the final result is that, although the learned features are separable, the intra-class distances are too large and the discriminative capability of the model features is poor.
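A commonly used form of the gradient of the softmax cross-entropy loss with respect to the parameters θ_j of class j, written with the symbols m, x⁽ⁱ⁾, y⁽ⁱ⁾ and θ used in this description, can be sketched as below. The cost symbol J(θ) and the indicator notation 1{·} are assumptions introduced for illustration, and the expression is not asserted to be identical to formula 1.2 of the original filing.

```latex
\nabla_{\theta_j} J(\theta) =
  -\frac{1}{m} \sum_{i=1}^{m}
  x^{(i)} \left( 1\{ y^{(i)} = j \} - p\left( y^{(i)} = j \mid x^{(i)}; \theta \right) \right)
```

Each term depends only on whether a sample belongs to class j and on its predicted probability for class j, so nothing in the update pulls a feature toward the other features of its own class.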

where m is the number of samples, x⁽ⁱ⁾ is the input feature of the i-th sample, y⁽ⁱ⁾ is the class of the i-th sample, p(y⁽ⁱ⁾ = j | x⁽ⁱ⁾; θ) is the probability that x⁽ⁱ⁾ is classified as class j, and θ denotes the parameters to be trained, i.e., the weights and biases.

In order to solve the problem that the features learned by the network model have poor discriminative capability for image recognition, a center loss function may be introduced, wherein the center loss function is defined as:
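A standard form of the center loss, written with the symbols x_i and c_{y_i} used in this description, can be sketched as below. The factor 1/2 and the summation over the m samples of a batch follow the usual definition of the center loss and are assumptions with respect to the original filing.

```latex
L_C = \frac{1}{2} \sum_{i=1}^{m} \left\| x_i - c_{y_i} \right\|_2^2
```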

wherein c_{y_i} denotes the center of the class to which the i-th sample belongs, and x_i denotes the feature value of the i-th sample (i.e., the detection image corresponding to the detection part).

In one embodiment, the basic idea of the center loss function is similar to clustering: a class center is selected for each class of features in a batch, and the sum of squares of the distances between the features of each sample and the corresponding class center is calculated. The smaller this distance the better, i.e., the smaller the intra-class distance the better. Therefore, the invention provides a joint loss function method that combines the softmax cross-entropy loss function and the center loss function as the supervision signal for training the deep model. Features far away from their class center in the feature layer are penalized, so that the inter-class distance is increased and the intra-class distance is reduced; the learned features are more compact, which makes up for the shortcoming of using only the softmax cross-entropy loss function and reduces the misrecognition rate. The joint loss based on the softmax cross-entropy loss function and the center loss function is as follows:
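With L_S denoting the softmax cross-entropy loss and L_C denoting the center loss (both symbols are assumptions introduced for illustration), the joint loss described here can be sketched as a weighted sum:

```latex
L = L_S + \lambda L_C
```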

where λ is used to balance the weights of the two losses, and λ ∈ (0, 0.1). To display the effect of the joint loss visually, detection images of the second detection part with common image features can be used; for example, 13 common small-intestine lesion images can be used for image feature extraction. As shown in fig. 5, when only the softmax cross-entropy loss function is used, the features output by the model are compact between classes, that is, features of different classes lie close together. As shown in fig. 6, when the joint loss is used, the features output by the model are compact within each class and dispersed between classes.
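The joint loss can be illustrated numerically with a minimal NumPy sketch. The function names, the use of sums rather than batch means, the fixed class centers, and the toy inputs are assumptions made for illustration; in practice the class centers would be updated during training, which is not shown.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Summed softmax cross-entropy over a batch of logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].sum()

def center_loss(features, labels, centers):
    """Half the summed squared distance of each feature to its class center."""
    diff = features - centers[labels]
    return 0.5 * np.sum(diff ** 2)

def joint_loss(features, logits, labels, centers, lam=0.01):
    """Softmax loss plus lam times center loss, with lam in (0, 0.1)."""
    return softmax_cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)

# Toy batch: two samples, two classes, two-dimensional features (hypothetical).
features = np.array([[1.0, 0.0], [0.0, 1.0]])
logits = np.zeros((2, 2))
labels = np.array([0, 1])
centers = np.array([[1.0, 0.0], [0.0, 0.0]])
total = joint_loss(features, logits, labels, centers, lam=0.01)
```

Here the first sample sits exactly on its class center and contributes nothing to the center loss, while the second sample is one unit away from its center and is penalized.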

In one embodiment, when the intermediate detection model includes a first fully-connected layer and a second fully-connected layer, the server may determine a first loss function for constraining the first fully-connected layer and a second loss function for constraining the second fully-connected layer when determining a target loss function for constraining the fully-connected layer; so that a joint loss function resulting from the combination of the first and second loss functions can be used as the target loss function.

In the embodiment of the invention, after acquiring the second sample set and the initial detection model trained based on the first sample set, the server can input the sample detection images in the second sample set into the initial detection model to train the initial detection model. When the training of the initial detection model reaches the optimum, the trained optimal initial detection model can be used as the intermediate detection model, and the intermediate detection model is then trained; when the training of the intermediate detection model reaches the optimum, the trained optimal intermediate detection model is used as the target detection model. The target detection model is thereby obtained from the initial detection model and a small second sample set. In addition, optimizing the training process of the target detection model with the joint loss function constructed from the center loss function improves the accuracy with which the target detection model discriminates features of the detection part, and therefore improves the accuracy of the target detection model in image feature recognition.
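The idea of reusing an initial model and adjusting it on a small second sample set can be illustrated with a deliberately simplified sketch. It assumes that the layers inherited from the initial detection model are kept frozen and that only a new fully connected layer is trained with a softmax cross-entropy loss; the weights, data, layer sizes, and learning rate are synthetic and hypothetical, and the two-stage procedure of steps S404 and S405 and the joint loss are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_cross_entropy(probs, y):
    return -np.log(probs[np.arange(len(y)), y]).mean()

# Stand-in for layers inherited from the initial detection model (hypothetical).
W_backbone = rng.normal(size=(8, 4))

def extract_features(x):
    return np.tanh(x @ W_backbone)              # frozen: never updated below

# Small synthetic second sample set: 30 samples, 3 classes.
x_small = rng.normal(size=(30, 8))
y_small = rng.integers(0, 3, size=30)
feats = extract_features(x_small)
onehot = np.eye(3)[y_small]

# New fully connected layer trained on the second sample set only.
W_head = np.zeros((4, 3))
b_head = np.zeros(3)
initial_loss = mean_cross_entropy(softmax(feats @ W_head + b_head), y_small)
for _ in range(200):
    probs = softmax(feats @ W_head + b_head)
    grad_logits = (probs - onehot) / len(y_small)   # gradient of the mean loss
    W_head -= 0.5 * feats.T @ grad_logits
    b_head -= 0.5 * grad_logits.sum(axis=0)
final_loss = mean_cross_entropy(softmax(feats @ W_head + b_head), y_small)
```

Because only the small head is trained, far fewer labelled images of the second detection part are needed than would be required to train the whole model from scratch.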

Based on the description of the above embodiment of the model training method for image detection, an embodiment of the present invention further provides a model training apparatus for image detection, which may be a computer program (including program code) running in the server. The model training apparatus for image detection may be used to perform the model training method for image detection shown in fig. 2 and fig. 4. Referring to fig. 7, the model training apparatus for image detection may include: an obtaining unit 701 and an adjusting unit 702.

An obtaining unit 701, configured to obtain an initial detection model, where the initial detection model is obtained by training according to a first sample set, the first sample set includes a sample detection image for a first detection part, and the initial detection model is used for identifying the image features of a target detection image acquired by a detection device detecting the first detection part;

the obtaining unit 701 is further configured to obtain a second sample set, where the number of samples of the second sample set is smaller than the number of samples of the first sample set, and the second sample set includes a sample detection image for a second detection part;

an adjusting unit 702, configured to adjust the model parameters of the initial detection model according to the second sample set to obtain a target detection model, where the target detection model is used for identifying the image features of a target detection image acquired by the detection device detecting the second detection part.

In an embodiment, the obtaining unit 701 is specifically configured to:

acquiring a first sample set, and inputting each sample detection image in the first sample set into a basic detection network so as to train the basic detection network;

and when the training of the basic detection network reaches the optimum, taking the trained optimum basic detection network as an initial detection model.

In an embodiment, the adjusting unit 702 is specifically configured to:

inputting each sample detection image in the second sample set into the initial detection model, and training the initial detection model;

when the training of the initial detection model reaches the optimum, taking the trained optimum initial detection model as an intermediate detection model;

and training the intermediate detection model, and taking the trained optimal intermediate detection model as a target detection model when the training of the intermediate detection model is optimal.

In an embodiment, the adjusting unit 702 is specifically configured to:

obtaining an initial model structure of the initial detection model, the initial model structure including one or more of: the number of network layers of the initial detection model, the corresponding activation function of each layer of the network, and the number of neurons of each layer of the network;

adjusting the number of network layers of the initial detection model, the corresponding activation function of each layer of the network, or the number of neurons corresponding to each layer of the network to obtain an intermediate model structure corresponding to an intermediate detection model;

and inputting each sample detection image in the second sample set into the intermediate model structure to determine the model parameters of the intermediate model structure.
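Adjusting the model structure can be pictured as editing a small structure description before retraining. The following sketch uses a hypothetical `ModelStructure` record; the layer sizes, activation names, and the choice of 13 output classes are illustrative assumptions and are not taken from the embodiment.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class ModelStructure:
    layer_sizes: Tuple[int, ...]    # number of neurons in each layer
    activations: Tuple[str, ...]    # activation function of each layer

    @property
    def num_layers(self) -> int:
        return len(self.layer_sizes)

# Hypothetical structure of the initial detection model.
initial = ModelStructure(
    layer_sizes=(512, 256, 10),
    activations=("relu", "relu", "softmax"),
)

# Intermediate structure: one extra fully connected layer and a resized
# output layer; its parameters would then be determined on the second sample set.
intermediate = replace(
    initial,
    layer_sizes=(512, 256, 128, 13),
    activations=("relu", "relu", "relu", "softmax"),
)
```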

In one embodiment, the apparatus further comprises: an optimization unit 703 and a determination unit 704.

An optimizing unit 703 configured to optimize the model parameters of the intermediate model structure by using an optimization algorithm;

a determining unit 704, configured to determine that training of the initial detection model is optimal when the model parameters of the intermediate model structure are optimal.

In one embodiment, the intermediate model structure corresponding to the intermediate detection model includes: one or more fully connected layers; the second sample set includes: a training sample set;

the adjusting unit 702 is specifically configured to:

determining a target loss function for constraining the fully connected layer;

and inputting each sample detection image included in the training sample set of the second sample set into the intermediate detection model under the constraint of the target loss function corresponding to each fully connected layer, so as to train the intermediate detection model.

In one embodiment, if the intermediate detection model includes a first fully connected layer and a second fully connected layer, the target loss function is obtained by combining a first loss function and a second loss function, wherein the first loss function is used for constraining the first fully connected layer, and the second loss function is used for constraining the second fully connected layer.

In the embodiment of the present invention, when determining a target detection model for recognizing the image features of a target detection image of a second detection part, the obtaining unit 701 may first obtain an initial detection model trained based on a first sample set, where the initial detection model is used for recognizing the features of a target detection image of a first detection part. Further, the obtaining unit 701 may obtain a small number of sample detection images of the second detection part, so that the adjusting unit 702 may adjust the model parameters of the initial detection model based on the features of these sample detection images to obtain the target detection model for recognizing the image features of the target detection image of the second detection part. The target detection model is thus trained from a small number of samples of the second detection part, which effectively reduces the burden of acquiring detection image samples of the second detection part. Therefore, the training efficiency of the target detection model for recognizing the image features of the second detection part is improved, and the generalization of the target detection model can be effectively improved.

Fig. 8 is a schematic block diagram of a server according to an embodiment of the present invention. The server in the present embodiment as shown in fig. 8 may include: one or more processors 801; one or more input devices 802, one or more output devices 803, and memory 804. The processor 801, the input device 802, the output device 803, and the memory 804 described above are connected by a bus 805. The memory 804 is used for storing a computer program comprising program instructions, and the processor 801 is used for executing the program instructions stored by the memory 804.

The memory 804 may include a volatile memory (volatile memory), such as a random-access memory (RAM); the memory 804 may also include a non-volatile memory (non-volatile memory), such as a flash memory (flash memory), a solid-state drive (SSD), etc.; the memory 804 may also comprise a combination of the above-described types of memory.

The processor 801 may be a central processing unit (CPU). The processor 801 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like. The PLD may be a field-programmable gate array (FPGA), a generic array logic (GAL), or the like. The processor 801 may also be a combination of the above structures.

In the embodiment of the present invention, the memory 804 is used for storing a computer program, the computer program includes program instructions, and the processor 801 is used for executing the program instructions stored in the memory 804, so as to implement the steps of the corresponding methods as described above in fig. 2 and fig. 4.

In one embodiment, the processor 801 is configured to call the program instructions for performing:

optimizing the model parameters of the intermediate model structure by adopting an optimization algorithm;

and when the model parameters of the intermediate model structure reach the optimal values, determining that the training of the initial detection model reaches the optimal values.

In one embodiment, the processor 801 is configured to call the program instructions for performing:

determining a target loss function for constraining the fully connected layer;

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

While the invention has been described with reference to a particular embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A model training method for image detection, comprising:

2. The method of claim 1, wherein the obtaining an initial detection model comprises:

3. The method of claim 1, wherein adjusting model parameters of the initial detection model according to the second sample set to obtain a target detection model comprises:

4. The method of claim 3, wherein inputting each sample detection image in the second sample set into the initial detection model and training the initial detection model comprises:

5. The method of claim 4, further comprising:

6. The method of claim 3, wherein the intermediate model structure corresponding to the intermediate detection model comprises: one or more fully connected layers; the second sample set comprises: a training sample set;

the training the intermediate detection model comprises:

determining a target loss function for constraining the fully connected layer;

7. The method of claim 6, wherein, if the intermediate detection model comprises a first fully connected layer and a second fully connected layer, the target loss function is obtained by combining a first loss function and a second loss function, wherein the first loss function is used for constraining the first fully connected layer, and the second loss function is used for constraining the second fully connected layer.

8. A model training apparatus for image detection, comprising:

9. A server, comprising a processor, a memory, and a communication interface, the processor, the memory, and the communication interface being interconnected, wherein the memory is configured to store computer program instructions and the processor is configured to execute the program instructions to implement the method of any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-7.