WO2023185516A1 - Training method for image recognition model, recognition method, apparatus, medium and device - Google Patents

Training method for image recognition model, recognition method, apparatus, medium and device

Info

Publication number
WO2023185516A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
image
statistic
training sample
sample set
Prior art date
Application number
PCT/CN2023/082355
Other languages
English (en)
French (fr)
Inventor
边成
李永会
杨延展
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司 filed Critical 北京字节跳动网络技术有限公司
Publication of WO2023185516A1 publication Critical patent/WO2023185516A1/zh

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present disclosure relates to the field of image processing technology, and specifically to a training method and apparatus for an image recognition model, a recognition method and apparatus, a medium, a device, a computer program product, and a computer program.
  • Colorectal cancer is one of the malignant tumors with the highest incidence in China, but early diagnosis and appropriate treatment can bring a cure rate of about 90%.
  • Regular colonoscopy screening can identify adenomatous polyps and prevent cancer. During endoscopy, it is crucial to identify the ileocecal region in endoscopic images.
  • endoscopic image recognition is mainly based on deep neural networks (for example, convolutional neural networks).
  • the training data may come from the same medical center or from different medical centers.
  • methods in the related art ignore the problem of model generalization on new centers, and do not pay attention to the additional knowledge in the training data of multiple centers. This will result in the need to collect data from the new center each time the model is deployed to a new center to fine-tune the trained model to ensure the generalization performance of the model. Otherwise, the accuracy of the model's recognition of endoscopic images will be affected.
  • the process of fine-tuning the trained model every time the model is deployed is complicated and may cause overfitting and other problems, affecting the recognition accuracy of the model.
  • the present disclosure provides a method for training an image recognition model, the method including:
  • the training sample set includes training images and training recognition results corresponding to the training images, and the data distribution of each training sample set is not completely consistent;
  • For each training image determine the gradient of the training image according to the training image and the training recognition result corresponding to the training image;
  • according to the gradient of each training image, a first statistic of each training sample set and a second statistic of each training sample set are determined; the first statistic is used to characterize the mean vector corresponding to the training sample set, and the second statistic is used to characterize the covariance matrix corresponding to the training sample set;
  • a statistic loss function is determined according to the first statistic and the second statistic, and the preset model is updated according to the statistic loss function to obtain an image recognition model.
  • an image recognition method which method includes:
  • the image to be recognized is input into a pre-trained image recognition model to obtain the recognition result of the image to be recognized; wherein the image recognition model is trained by the image recognition model training method described in the first aspect.
  • the present disclosure provides a training device for an image recognition model.
  • the training device for an image recognition model includes:
  • the first acquisition module is used to acquire multiple training sample sets;
  • the training sample set includes training images and training recognition results corresponding to the training images, and the data distribution of each training sample set is not completely consistent;
  • a determination module configured to, for each training image, determine the gradient of the training image according to the training image and the training recognition result corresponding to the training image;
  • the determination module is further configured to determine the first statistic of each training sample set and the second statistic of each training sample set according to the gradient of each training image; the first statistic is used to characterize the mean vector corresponding to the training sample set, and the second statistic is used to characterize the covariance matrix corresponding to the training sample set;
  • the determination module is also configured to determine a statistic loss function based on the first statistic and the second statistic;
  • An update module is used to update the preset model according to the statistical loss function to obtain an image recognition model.
  • an image recognition device which includes:
  • the second acquisition module is used to acquire the image to be recognized
  • a processing module configured to input the image to be recognized into a pre-trained image recognition model to obtain the recognition result of the image to be recognized; wherein the image recognition model is obtained by training with the image recognition model training device described in the third aspect.
  • the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the steps of the method described in the first or second aspect of the present disclosure are implemented.
  • an electronic device, including:
  • a storage device on which a computer program is stored;
  • a processing device configured to execute the computer program in the storage device to implement the steps of the method described in the first or second aspect of the present disclosure.
  • the present disclosure provides a computer program product, including a computer program that implements the steps of the method described in the first or second aspect of the present disclosure when executed by a processing device.
  • the present disclosure provides a computer program that, when executed by a processing device, implements the steps of the method described in the first or second aspect of the disclosure.
  • Figure 1 is a flow chart of a training method for an image recognition model according to an exemplary embodiment
  • Figure 2 is a flow chart of step 102 according to the embodiment shown in Figure 1;
  • Figure 3 is a flow chart of step 103 according to the embodiment shown in Figure 1;
  • Figure 4 is a flow chart of step 104 according to the embodiment shown in Figure 1;
  • Figure 5 is a flow chart of an image recognition method according to an exemplary embodiment
  • Figure 6 is a block diagram of a training device for an image recognition model according to an exemplary embodiment
  • Figure 7 is a block diagram of a determination module according to the embodiment shown in Figure 6;
  • Figure 8 is a block diagram of an image recognition device according to an exemplary embodiment
  • FIG. 9 is a block diagram of an electronic device according to an exemplary embodiment.
  • the term “include” and its variations are open-ended, i.e., “including but not limited to.”
  • the term “based on” means “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; and the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • Figure 1 is a flow chart of a training method for an image recognition model according to an exemplary embodiment. As shown in Figure 1, the method may include the following steps:
  • Step 101 Obtain multiple training sample sets.
  • the training sample set includes training images and training recognition results corresponding to the training images, and the data distribution of each training sample set is not completely consistent.
  • neural network model learning looks for shortcuts in the optimization process, tending to rely on simple features in the training data.
  • the neural network model will prioritize simple bias information in the training data during training. For example, in the scenario of identifying the ileocecal part in endoscopic images, the neural network model will give priority to remembering information with simple features such as the machine model of the image acquisition device or the body position at the time of shooting during training.
  • an image recognition model can be trained using training data from multiple data centers, so that the image recognition model can learn image features with center invariance, capture discriminative information related to image recognition, and reduce sensitivity to the data distribution of any specific data center, thereby ensuring the generalization performance of the image recognition model on new data centers.
  • the training sample set may include training images and training recognition results corresponding to the training images.
  • the data center can be a medical center
  • the training images can be endoscopic images collected during endoscopy by the image acquisition equipment of the medical center during a historical period
  • the training recognition result may be a classification result manually annotated on the endoscopic image (for example, the classification result may include two types: the endoscopic image is an ileocecal image, and the endoscopic image is not an ileocecal image).
  • the data distribution of the training sample sets collected by different medical centers during the same period is not completely consistent. That is, the expression "the data distribution of each training sample set is not completely consistent" in this disclosure refers to incompletely consistent data distribution between different training sample sets; in other words, for any two training sample sets, the data distribution of one training sample set is not completely consistent with the data distribution of the other.
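  • As an illustration of the arrangement described above, the multiple training sample sets can be organized per data center. The following minimal Python sketch (the center names, the nested-list "images", and the tuple layout are illustrative assumptions, not taken from the disclosure) pairs each training image with its annotated recognition result:

```python
# One training sample set per data center; data distributions differ across centers.
# Each sample pairs a (dummy) image with its annotated result:
# 1 = ileocecal image, 0 = not an ileocecal image.
training_sample_sets = {
    "center_e": [([[0.1, 0.2], [0.3, 0.4]], 1),
                 ([[0.5, 0.1], [0.2, 0.9]], 0)],
    "center_f": [([[0.9, 0.8], [0.7, 0.6]], 0),
                 ([[0.4, 0.4], [0.4, 0.4]], 1)],
}

def set_sizes(sample_sets):
    """Number of (image, label) pairs contributed by each center."""
    return {center: len(samples) for center, samples in sample_sets.items()}
```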
  • Step 102 For each training image, determine the gradient of the training image based on the training image and the training recognition result corresponding to the training image.
  • Step 103 Determine the first statistic of each training sample set and the second statistic of each training sample set based on the gradient of each training image. Among them, the first statistic is used to characterize the mean vector corresponding to the training sample set, and the second statistic is used to characterize the covariance matrix corresponding to the training sample set.
  • a preset model for image recognition can be built in advance, and after obtaining multiple training sample sets, each training image in all training sample sets can be input into the preset model to obtain the predicted recognition result of each training image. The gradient of each training image can then be calculated based on the predicted recognition result of the training image and its training recognition result. The gradient of each training image can be understood as a deep feature of the training image that takes both the image and the training recognition result into account.
  • for each training sample set, according to the gradients of the training images in the set, the first statistic used to characterize the mean vector corresponding to the training sample set and the second statistic used to characterize the covariance matrix corresponding to the training sample set can be calculated.
  • distribution shift usually includes diversity shift and correlation shift.
  • Diversity shift refers to the fact that the data during model training and testing come from different data centers and therefore have different characteristics (for example, the image acquisition equipment used by two medical centers is different, resulting in differences in the resolution and color appearance of endoscopic imaging).
  • Correlation shift means that the correlation information between the data on the test set is different from the correlation information between the data on the training sample set.
  • the first statistic is actually used to measure the diversity shift corresponding to the training sample set, while the second statistic is used to measure the correlation shift corresponding to the training sample set.
  • Step 104 Determine a statistic loss function based on the first statistic and the second statistic.
  • Step 105 Update the preset model according to the statistical loss function to obtain the image recognition model.
  • the statistic loss function corresponding to each two training sample sets can be determined based on the first statistics of the two training sample sets and the second statistics of the two training sample sets.
  • the statistic loss function may include a first statistic loss function and a second statistic loss function.
  • the first statistic loss function is used to characterize the difference between the first statistics of the two training sample sets, and the second statistic loss function is used to characterize the difference between the second statistics of the two training sample sets. Then, the first statistic loss function and the second statistic loss function corresponding to each two training sample sets, together with the initial loss function of the preset model, can be minimized simultaneously to update the model parameters of the preset model and obtain the image recognition model.
  • the first statistic can be a first-order statistic
  • the second statistic can be a second-order statistic.
  • the first-order statistic and the second-order statistic can summarize most characteristics of a data distribution. Therefore, the present disclosure uses first-order and second-order statistics on the gradient space, which can explicitly measure the gradient distribution distance between two data centers and minimize the gradient distribution difference between data from different data centers, making the gradient distributions of different data centers as close as possible. This eliminates the dependence on the data distribution of any particular data center, thereby forcing the model to learn from the data of multiple data centers during training and to capture cross-center invariant discriminative information (i.e., image features with center invariance), improving the model's generalization ability on new data centers.
  • the present disclosure first obtains multiple training sample sets including training images and training recognition results; then, for each training image, determines the gradient of the training image based on the training image and its corresponding training recognition result; determines the first statistic and the second statistic of each training sample set based on the gradients of the training images; and finally determines the statistic loss function based on the first statistic and the second statistic and updates the preset model according to the statistic loss function to obtain the image recognition model.
  • the present disclosure can determine the statistic loss function based on the first statistic and the second statistic and use it to update the preset model, so that the preset model can learn image features with center invariance from the training images of multiple training sample sets and capture discriminative information related to image recognition while ignoring the noise of any specific training sample set. This yields an image recognition model with high generalization performance, which ensures the accuracy of recognizing the image to be recognized without requiring additional fine-tuning of the image recognition model, thereby avoiding over-fitting problems and improving the recognition accuracy of the image recognition model.
  • FIG. 2 is a flow chart of step 102 according to the embodiment shown in FIG. 1 .
  • the preset model may include a feature extraction network and a classifier, and step 102 may include the following steps:
  • Step 1021 Preprocess the training image to obtain a preprocessed training image.
  • each training image can also be preprocessed in advance.
  • random data enhancement can be performed on the training image to obtain a preprocessed training image.
  • the random data enhancement may include at least one of random scaling, random cropping, random flipping (including random horizontal/vertical flipping), and random color jittering (including brightness, contrast, etc.).
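  • A minimal sketch of such random preprocessing on a toy 2-D image, using nested lists to stand in for pixel arrays (a real pipeline would use an image library and would typically also apply scaling and color jitter; everything here is illustrative, not the disclosed implementation):

```python
import random

def random_flip(img, rng):
    """Randomly flip the image horizontally and/or vertically."""
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]  # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1]                   # vertical flip
    return img

def random_crop(img, size, rng):
    """Crop a random size x size window from the image."""
    h, w = len(img), len(img[0])
    top = rng.randrange(h - size + 1)
    left = rng.randrange(w - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]

def preprocess(img, crop_size, seed=None):
    """Apply the random augmentations to obtain a preprocessed training image."""
    rng = random.Random(seed)
    return random_crop(random_flip(img, rng), crop_size, rng)
```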
  • Step 1022 Input the preprocessed training image into the feature extraction network to obtain image features of the training image.
  • Step 1023 Input the image features of the training image into the classifier to obtain the predicted recognition result of the training image.
  • the preset model may include a feature extraction network f ⁇ and a classifier W.
  • the preprocessed training image corresponding to each training image can be input into the feature extraction network f θ , and the image features of each training image can be extracted by the feature extraction network f θ .
  • the image features of each training image can be input to the classifier W to obtain the predicted recognition results of each training image.
  • the fully connected layer with softmax activation can be used as the classifier W.
  • the predicted recognition result of the image, that is, the classification probability predicted by the classifier W, can be expressed as $\hat{y}^{(i)} = \sigma\big(W^{\top} z^{(i)}\big) \in \mathbb{R}^{C}$, where $z^{(i)} \in \mathbb{R}^{K}$ is the image feature and $W \in \mathbb{R}^{K \times C}$. Among them, C is the number of categories, K is the feature dimension, and $\sigma$ is the softmax operation.
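  • The classifier step can be sketched in pure Python, assuming an image feature z of dimension K and a weight matrix W of size K × C as in the notation above (an illustrative rendering, not the disclosed implementation):

```python
import math

def softmax(logits):
    """Softmax operation sigma over a list of logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify(z, W):
    """Predicted classification probability sigma(W^T z).

    z: image feature of length K; W: K rows, each of length C.
    Returns a probability vector of length C."""
    K, C = len(W), len(W[0])
    logits = [sum(z[k] * W[k][c] for k in range(K)) for c in range(C)]
    return softmax(logits)
```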
  • Step 1024 Determine the gradient of the training image based on the predicted recognition results, training recognition results and image features of the training image.
  • the gradient of each training image can be determined based on the predicted recognition results, training recognition results and image features of each training image.
  • the gradient of the i-th training image $x^{(i)}$ from the training sample set of data center e can be understood as the gradient, with respect to the parameter $w$ of the classifier $W$, of the classification loss obtained when $x^{(i)}$ and its corresponding training recognition result $y^{(i)}$ are used as input. This is the same gradient used when optimizing the network parameters by gradient descent.
  • the gradient for the i-th training image from data center e can be expressed as: $g_e^{(i)} = \nabla_{w}\,\ell\big(\sigma(W^{\top} z_e^{(i)}),\, y_e^{(i)}\big)$, where $z_e^{(i)} = f_{\theta}(x_e^{(i)})$ is the image feature and $\ell$ is the classification loss.
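  • Under the common assumption that the classification loss is cross-entropy with softmax activation (the excerpt does not fix the loss), this gradient has the closed form z(ŷ − y)ᵀ ∈ ℝ^{K×C}; the sketch below computes it and flattens it to a vector of length K·C:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def gradient(z, W, y_onehot):
    """Gradient of the cross-entropy classification loss with respect to the
    classifier parameters, flattened to a vector of length K*C.

    d loss / d W[k][c] = z[k] * (probs[c] - y[c])."""
    K, C = len(W), len(W[0])
    probs = softmax([sum(z[k] * W[k][c] for k in range(K)) for c in range(C)])
    return [z[k] * (probs[c] - y_onehot[c]) for k in range(K) for c in range(C)]
```

With a zero-initialized classifier the predicted probabilities are uniform, so the gradient is determined entirely by the feature and the label.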
  • FIG. 3 is a flow chart of step 103 according to the embodiment shown in FIG. 1 .
  • step 103 may include the following steps:
  • Step 1031 Determine the first statistic of each training sample set based on the gradients of all training images included in each training sample set.
  • Step 1032 Determine the second statistic of the training sample set based on the gradients of all training images included in each training sample set and the first statistic of the training sample set.
  • each gradient can be flattened into a vector $g \in \mathbb{R}^{KC}$; the gradients of the $N_e$ training images in the training sample set of data center e can then be stacked into a matrix $G_e \in \mathbb{R}^{N_e \times KC}$.
  • the first statistic of the training sample set can be expressed as: $\mu_e = \frac{1}{N_e}\sum_{i=1}^{N_e} g_e^{(i)}$. That is, the first statistic of the training sample set is a vector of length KC.
  • the second statistic of the training sample set can be expressed as: $\Sigma_e = \frac{1}{N_e}\sum_{i=1}^{N_e}\big(g_e^{(i)} - \mu_e\big)\big(g_e^{(i)} - \mu_e\big)^{\top}$. That is, the second statistic of the training sample set is a matrix of size KC × KC.
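  • The two statistics can be sketched directly from the flattened gradient vectors; the empirical 1/N normalization of the covariance is an assumption the excerpt does not spell out:

```python
def first_statistic(grads):
    """Mean vector (length K*C) of a sample set's gradient vectors."""
    n, d = len(grads), len(grads[0])
    return [sum(g[j] for g in grads) / n for j in range(d)]

def second_statistic(grads, mu=None):
    """Covariance matrix (K*C x K*C) of a sample set's gradient vectors."""
    n, d = len(grads), len(grads[0])
    if mu is None:
        mu = first_statistic(grads)
    return [[sum((g[i] - mu[i]) * (g[j] - mu[j]) for g in grads) / n
             for j in range(d)] for i in range(d)]
```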
  • FIG. 4 is a flow chart of step 104 according to the embodiment shown in FIG. 1 .
  • the statistical loss function includes a first statistical loss function and a second statistical loss function.
  • Step 104 may include the following steps:
  • Step 1041 Based on the first statistics of each two training sample sets, determine the first statistic loss function corresponding to the two training sample sets.
  • Step 1042 Based on the second statistics of each two training sample sets, determine the second statistical loss function corresponding to the two training sample sets.
  • the first statistic loss function corresponding to every two training sample sets and the second statistic loss function corresponding to every two training sample sets can be further determined.
  • For example, for training sample sets from two different data centers e and f, the corresponding first statistic loss function $L_{1st}$ can be expressed as: $L_{1st} = \lVert \mu_e - \mu_f \rVert_2^2$; the second statistic loss function $L_{2nd}$ can analogously be taken as a distance between the covariance matrices, e.g. $L_{2nd} = \lVert \Sigma_e - \Sigma_f \rVert_F^2$.
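  • A sketch of the two statistic loss terms, assuming the squared Euclidean distance between mean vectors and the squared Frobenius distance between covariance matrices (concrete distance choices the excerpt leaves open):

```python
def first_statistic_loss(mu_e, mu_f):
    """Squared L2 distance between the mean vectors of centers e and f."""
    return sum((a - b) ** 2 for a, b in zip(mu_e, mu_f))

def second_statistic_loss(cov_e, cov_f):
    """Squared Frobenius distance between the covariance matrices of e and f."""
    return sum((a - b) ** 2
               for row_e, row_f in zip(cov_e, cov_f)
               for a, b in zip(row_e, row_f))
```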
  • step 105 can be implemented in the following ways:
  • the first statistic loss function, the second statistic loss function corresponding to each two training sample sets, and the initial loss function of the preset model are minimized to obtain an image recognition model.
  • each training image in the two training sample sets can be used as the input of the preset model, and the training recognition result corresponding to each training image can be used as the output of the preset model, to train the preset model.
  • the first statistic loss function and the second statistic loss function corresponding to each two training sample sets, together with the initial loss function of the preset model, are minimized simultaneously to update the model parameters of the preset model and obtain the image recognition model.
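  • The simultaneous minimization can be sketched as a single weighted objective summed over every pair of training sample sets (the weighting coefficients lam1/lam2 are an assumption; the excerpt only states that the three losses are minimized together):

```python
from itertools import combinations

def center_pairs(centers):
    """All unordered pairs of data centers contributing a statistic loss term."""
    return list(combinations(centers, 2))

def total_loss(initial_loss, pair_losses, lam1=1.0, lam2=1.0):
    """Combine the preset model's initial (classification) loss with the first
    and second statistic losses accumulated over every pair of sample sets.

    pair_losses: list of dicts {"first": L_1st, "second": L_2nd}, one per pair."""
    l1 = sum(p["first"] for p in pair_losses)
    l2 = sum(p["second"] for p in pair_losses)
    return initial_loss + lam1 * l1 + lam2 * l2
```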
  • the initial loss function can be the classification loss function of the classifier.
  • the training of the image recognition model can be implemented in PyTorch, and the training parameters can be set as: learning rate: 5e-5; batch size: 256; optimizer: AdamW; epochs: 100 for the first training stage and 20 for the second training stage; input image size: 448×448.
  • the present disclosure first obtains multiple training sample sets including training images and training recognition results; then, for each training image, determines the gradient of the training image based on the training image and its corresponding training recognition result; determines the first statistic and the second statistic of each training sample set based on the gradients of the training images; and finally determines a statistic loss function according to the first statistic and the second statistic, and updates the preset model based on the statistic loss function to obtain an image recognition model.
  • the present disclosure can determine a statistic loss function based on the first statistic and the second statistic, and update the preset model using the statistic loss function, so that the preset model can learn image features with center invariance using the training images of multiple training sample sets.
  • Figure 5 is a flow chart of an image recognition method according to an exemplary embodiment. As shown in Figure 5, the method may include the following steps:
  • Step 201 Obtain the image to be recognized.
  • Step 202 Input the image to be recognized into a pre-trained image recognition model to obtain the recognition result of the image to be recognized.
  • the image recognition model is trained by the image recognition model training method shown in any of the above embodiments.
  • the trained image recognition model can be deployed to a designated data center for use. Then, the image to be recognized collected by the designated data center can be obtained, and the image to be recognized can be input into the trained image recognition model to obtain the recognition result of the image to be recognized output by the image recognition model.
  • taking the image recognition model used to identify the ileocecal part in endoscopic images as an example, when the image to be recognized is an endoscopic image, the endoscopic image can be input into the image recognition model to obtain a recognition result indicating whether the endoscopic image shows the ileocecal part.
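  • At inference time this reduces to taking the class with the highest predicted probability; a minimal sketch (the two-class labels follow the ileocecal example above and are illustrative):

```python
def recognize(probs, labels=("not_ileocecal", "ileocecal")):
    """Map the model's classification probabilities to a recognition result."""
    best = max(range(len(probs)), key=lambda c: probs[c])
    return labels[best]
```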
  • the image recognition model in this disclosure is not limited to identifying the ileocecal part in endoscopic images; it can also be applied to any image recognition scenario (such as identifying people, objects, etc. in an image), and this disclosure makes no specific restrictions in this regard.
  • the present disclosure first obtains the image to be recognized, inputs the image to be recognized into a pre-trained image recognition model, and obtains the recognition result of the image to be recognized.
  • the present disclosure can ensure the accuracy of identifying the image to be recognized by using a pre-trained image recognition model with high generalization performance and high recognition accuracy to perform image recognition.
  • FIG. 6 is a block diagram of a training device for an image recognition model according to an exemplary embodiment.
  • the image recognition model training device 300 includes:
  • the first acquisition module 301 is used to acquire multiple training sample sets.
  • the training sample set includes training images and training recognition results corresponding to the training images, and the data distribution of each training sample set is not completely consistent.
  • the determination module 302 is configured to determine, for each training image, the gradient of the training image according to the training image and the training recognition result corresponding to the training image.
  • the determination module 302 is also configured to determine the first statistic of each training sample set and the second statistic of each training sample set according to the gradient of each training image. Among them, the first statistic is used to characterize the mean vector corresponding to the training sample set, and the second statistic is used to characterize the covariance matrix corresponding to the training sample set.
  • the determination module 302 is also used to determine the statistic loss function based on the first statistic and the second statistic.
  • the update module 303 is used to update the preset model according to the statistical loss function to obtain an image recognition model.
  • FIG. 7 is a block diagram of a determination module according to the embodiment shown in FIG. 6 .
  • the preset model includes a feature extraction network and a classifier.
  • the determination module 302 includes:
  • the processing sub-module 3021 is used to preprocess the training image to obtain the preprocessed training image.
  • the extraction submodule 3022 is used to input the preprocessed training image into the feature extraction network to obtain the image features of the training image.
  • the classification submodule 3023 is used to input the image features of the training image into the classifier to obtain the predicted recognition result of the training image.
  • Gradient determination sub-module 3024 is used to determine the gradient of the training image according to the predicted recognition result, the training recognition result, and the image features of the training image.
  • processing sub-module 3021 is used for:
  • the random data enhancement includes at least one of random scaling, random cropping, random flipping, and random color jittering.
  • the determining module 302 is used for:
  • the first statistic of each training sample set is determined based on the gradients of all training images included in each training sample set.
  • the second statistic of the training sample set is determined based on the gradients of all training images included in each training sample set and the first statistic of the training sample set.
  • the statistic loss function includes a first statistic loss function and a second statistic loss function.
  • the determination module 302 is used for:
  • based on the first statistics of each two training sample sets, the first statistic loss function corresponding to the two training sample sets is determined.
  • based on the second statistics of each two training sample sets, the second statistic loss function corresponding to the two training sample sets is determined.
  • the update module 303 is configured to minimize the first statistic loss function and the second statistic loss function corresponding to each two training sample sets, together with the initial loss function of the preset model, to obtain an image recognition model.
  • FIG. 8 is a block diagram of an image recognition device according to an exemplary embodiment. As shown in Figure 8, the image recognition device 400 includes:
  • the second acquisition module 401 is used to acquire the image to be recognized.
  • the processing module 402 is used to input the image to be recognized into a pre-trained image recognition model to obtain the recognition result of the image to be recognized.
  • the image recognition model is trained by the above image recognition model training device 300 .
  • the processing module 402 is configured to input the endoscopic image into the image recognition model when the image to be recognized is an endoscopic image, and obtain a recognition result indicating whether the endoscopic image shows the ileocecal part.
  • the present disclosure first obtains the image to be recognized, inputs it into the image recognition model, and obtains the recognition result of the image to be recognized, wherein the image recognition model is trained in the following manner: multiple training sample sets including training images and training recognition results are acquired; then, for each training image, the gradient of the training image is determined based on the training image and its corresponding training recognition result; the first statistic and the second statistic of each training sample set are determined based on the gradients of the training images; finally, the statistic loss function is determined based on the first statistic and the second statistic, and the preset model is updated based on the statistic loss function to obtain the image recognition model.
  • the present disclosure can determine a statistic loss function based on the first statistic and the second statistic, and update the preset model using the statistic loss function, so that the preset model can learn image features with center invariance from the training images of multiple training sample sets. The resulting image recognition model does not require additional fine-tuning, which avoids over-fitting problems and improves the recognition accuracy of the image recognition model.
  • FIG. 9 shows a schematic structural diagram of an electronic device 600 suitable for implementing embodiments of the present disclosure (which may be, for example, the execution subject in the above embodiments, and may be a terminal device or a server).
  • Terminal devices in embodiments of the present disclosure may include, but are not limited to, mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (Tablets), PMPs (Portable Multimedia Players), vehicle-mounted terminals (such as Mobile terminals such as car navigation terminals) and fixed terminals such as digital TVs, desktop computers, etc.
  • the electronic device shown in FIG. 9 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
  • The electronic device 600 may include a processing device (e.g., a central processing unit or graphics processor) 601, which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • The RAM 603 also stores various programs and data required for the operation of the electronic device 600.
  • the processing device 601, ROM 602 and RAM 603 are connected to each other via a bus 604.
  • An input/output (I/O) interface 605 is also connected to bus 604.
  • Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 607 including, for example, a liquid crystal display (LCD), speakers, and vibrators; storage devices 608 including, for example, magnetic tape and hard disks; and a communication device 609.
  • Communication device 609 may allow electronic device 600 to communicate wirelessly or wiredly with other devices to exchange data.
  • Although FIG. 9 illustrates the electronic device 600 with various means, it should be understood that implementing or providing all of the illustrated means is not required; more or fewer means may alternatively be implemented or provided.
  • In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product that includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods illustrated in the flowcharts.
  • The computer program may be downloaded and installed from a network via the communication device 609, installed from the storage device 608, or installed from the ROM 602.
  • When the computer program is executed by the processing device 601, the above functions defined in the methods of the embodiments of the present disclosure are performed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
  • The client and server can communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communications in any form or medium (e.g., a communications network).
  • Examples of communications networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; it may also exist independently without being assembled into the electronic device.
  • The computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquire multiple training sample sets, each training sample set including training images and training recognition results corresponding to the training images, the data distributions of the training sample sets being not fully consistent; for each training image, determine the gradient of the training image according to the training image and the training recognition result corresponding to the training image; determine, from the gradient of each training image, a first statistic and a second statistic of each training sample set, where the first statistic characterizes the mean vector corresponding to the training sample set and the second statistic characterizes the covariance matrix corresponding to the training sample set; determine a statistic loss function from the first statistic and the second statistic; and update a preset model according to the statistic loss function to obtain an image recognition model.
  • Alternatively, the computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquire an image to be recognized; and input the image to be recognized into a pre-trained image recognition model to obtain a recognition result of the image to be recognized, where the image recognition model is trained by the above training method for an image recognition model.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • Each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks therein, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
  • the modules involved in the embodiments of the present disclosure can be implemented in software or hardware.
  • The name of a module does not, in some cases, limit the module itself; for example, the acquisition module may also be described as "a module for acquiring images to be recognized."
  • The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), and Complex Programmable Logic Devices (CPLDs).
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • According to one or more embodiments of the present disclosure, Example 1 provides a training method for an image recognition model, the method including: acquiring multiple training sample sets, each training sample set including training images and training recognition results corresponding to the training images, the data distributions of the training sample sets being not fully consistent; for each training image, determining the gradient of the training image according to the training image and the training recognition result corresponding to the training image; determining, from the gradient of each training image, a first statistic and a second statistic of each training sample set, where the first statistic characterizes the mean vector corresponding to the training sample set and the second statistic characterizes the covariance matrix corresponding to the training sample set; determining a statistic loss function from the first statistic and the second statistic; and updating a preset model according to the statistic loss function to obtain the image recognition model.
  • Example 2 provides the method of Example 1, where the preset model includes a feature extraction network and a classifier, and determining the gradient of the training image based on the training image and its corresponding training recognition result includes: preprocessing the training image to obtain a preprocessed training image; inputting the preprocessed training image into the feature extraction network to obtain image features of the training image; inputting the image features of the training image into the classifier to obtain a predicted recognition result of the training image; and determining the gradient of the training image based on the predicted recognition result, the training recognition result, and the image features.
  • Example 3 provides the method of Example 2, where preprocessing the training image to obtain a preprocessed training image includes: performing random data augmentation on the training image to obtain the preprocessed training image, the random data augmentation including at least one of random scaling, random cropping, random flipping, and random color jitter.
  • Example 4 provides the method of Example 1, where determining the first statistic and the second statistic of each training sample set from the gradient of each training image includes: determining the first statistic of each training sample set based on the gradients of all the training images included in that training sample set; and determining the second statistic of the training sample set based on the gradients of all its training images and its first statistic.
  • Example 5 provides the method of Example 1, where the statistic loss function includes a first statistic loss function and a second statistic loss function, and determining the statistic loss function from the first statistic and the second statistic includes: determining, from the first statistics of every two training sample sets, the first statistic loss function corresponding to those two training sample sets; and determining, from the second statistics of every two training sample sets, the second statistic loss function corresponding to those two training sample sets.
  • Example 6 provides the method of Example 5, where updating the preset model according to the statistic loss function to obtain the image recognition model includes: minimizing the first statistic loss function and the second statistic loss function corresponding to every two training sample sets, together with the initial loss function of the preset model, to obtain the image recognition model.
  • Example 7 provides an image recognition method, the method including: acquiring an image to be recognized; and inputting the image to be recognized into a pre-trained image recognition model to obtain a recognition result of the image to be recognized, where the image recognition model is trained by the training method for an image recognition model of any one of Examples 1 to 6.
  • Example 8 provides the method of Example 7, where inputting the image to be recognized into a pre-trained image recognition model to obtain the recognition result of the image to be recognized includes: when the image to be recognized is an endoscopic image, inputting the endoscopic image into the image recognition model to obtain a recognition result indicating whether the endoscopic image shows the ileocecal region.
  • Example 9 provides a training apparatus for an image recognition model, the apparatus including: a first acquisition module configured to acquire multiple training sample sets, each training sample set including training images and training recognition results corresponding to the training images, the data distributions of the training sample sets being not fully consistent; a determination module configured to determine, for each training image, the gradient of the training image according to the training image and its corresponding training recognition result; the determination module further configured to determine, from the gradient of each training image, a first statistic and a second statistic of each training sample set, where the first statistic characterizes the mean vector corresponding to the training sample set and the second statistic characterizes the covariance matrix corresponding to the training sample set, and further configured to determine a statistic loss function from the first statistic and the second statistic; and an update module configured to update a preset model according to the statistic loss function to obtain the image recognition model.
  • Example 10 provides an image recognition apparatus, including: a second acquisition module configured to acquire an image to be recognized; and a processing module configured to input the image to be recognized into a pre-trained image recognition model to obtain a recognition result of the image to be recognized, where the image recognition model is trained by the training apparatus of Example 9.
  • Example 11 provides a computer-readable medium having a computer program stored thereon, the program, when executed by a processing device, implementing the steps of the methods described in Examples 1 to 6 or Examples 7 to 8.
  • Example 12 provides an electronic device, including: a storage device having a computer program stored thereon; and a processing device configured to execute the computer program in the storage device to implement the steps of the methods described in Examples 1 to 6 or Examples 7 to 8.
  • Example 13 provides a computer program product, including a computer program that, when executed by a processing device, implements the steps of the methods described in Examples 1 to 6 or Examples 7 to 8.
  • Example 14 provides a computer program that, when executed by a processing device, implements the steps of the methods described in Examples 1 to 6 or Examples 7 to 8.
  • In summary, the present disclosure first acquires multiple training sample sets including training images and training recognition results; then, for each training image, determines the gradient of the training image based on the training image and its corresponding training recognition result; determines, from the gradient of each training image, a first statistic and a second statistic of each training sample set; and finally determines a statistic loss function based on the first statistic and the second statistic and updates a preset model according to it to obtain the image recognition model.
  • The present disclosure can determine a statistic loss function based on the first statistic and the second statistic and use it to update the preset model, enabling the preset model to use the training images of multiple training sample sets to learn center-invariant image features, capture discriminative information relevant to image recognition, and ignore the noise of particular training sample sets, thereby yielding an image recognition model with high generalization performance that ensures recognition accuracy without additional fine-tuning, avoiding overfitting.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a training method and apparatus for an image recognition model, a recognition method and apparatus, a medium, a device, a computer program product, and a computer program. The method includes: acquiring multiple training sample sets, the data distributions of which are not fully consistent with one another; for each training image, determining the gradient of the training image according to the training image and its corresponding training recognition result; determining, from the gradient of each training image, a first statistic and a second statistic of each training sample set; determining a statistic loss function from the first statistic and the second statistic; and updating a preset model according to the statistic loss function to obtain the image recognition model. By updating the preset model with a statistic loss function determined from the first and second statistics, the present disclosure obtains an image recognition model with high generalization performance without additional fine-tuning of the image recognition model, avoiding overfitting and improving the model's recognition accuracy.

Description

Training method for image recognition model, recognition method, apparatus, medium, and device
CROSS-REFERENCE TO RELATED APPLICATIONS
This disclosure claims priority to Chinese Patent Application No. 202210309902.8, filed on March 28, 2022 and entitled "Training method for image recognition model, recognition method, apparatus, medium, and device", the content of which is incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the field of image processing technology and, in particular, to a training method and apparatus for an image recognition model, a recognition method and apparatus, a medium, a device, a computer program product, and a computer program.
BACKGROUND
Colorectal cancer is one of the malignant tumors with the highest incidence in China, but early diagnosis and appropriate treatment of the cancer can bring a cure rate of about 90%. Regular colonoscopy screening can identify adenomatous polyps and prevent cancer. During endoscopy, identifying the ileocecal region in endoscopic images is crucial.
Currently, endoscopic image recognition is mainly based on deep neural networks (for example, convolutional neural networks), and achieving good generalization performance requires collecting a large amount of training data. The training data may come from a single medical center or from different medical centers. However, methods in the related art ignore how a model generalizes to a new center and do not exploit the extra knowledge contained in multi-center training data. As a result, every time the model is deployed to a new center, data from that center must be collected to fine-tune the trained model in order to preserve its generalization performance; otherwise the accuracy of endoscopic image recognition suffers. Moreover, fine-tuning the trained model at every deployment is a complex process and may cause problems such as overfitting, degrading the model's recognition accuracy.
SUMMARY
This Summary is provided to introduce concepts in a simplified form that are described in detail in the Detailed Description below. It is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
In a first aspect, the present disclosure provides a training method for an image recognition model, the method including:
acquiring multiple training sample sets, each training sample set including training images and training recognition results corresponding to the training images, the data distributions of the training sample sets being not fully consistent with one another;
for each training image, determining the gradient of the training image according to the training image and the training recognition result corresponding to the training image;
determining, from the gradient of each training image, a first statistic and a second statistic of each training sample set, where the first statistic characterizes the mean vector corresponding to the training sample set and the second statistic characterizes the covariance matrix corresponding to the training sample set;
determining a statistic loss function from the first statistic and the second statistic; and
updating a preset model according to the statistic loss function to obtain the image recognition model.
In a second aspect, the present disclosure provides an image recognition method, the method including:
acquiring an image to be recognized; and
inputting the image to be recognized into a pre-trained image recognition model to obtain a recognition result of the image to be recognized, where the image recognition model is trained by the training method for an image recognition model of the first aspect.
In a third aspect, the present disclosure provides a training apparatus for an image recognition model, the apparatus including:
a first acquisition module configured to acquire multiple training sample sets, each training sample set including training images and training recognition results corresponding to the training images, the data distributions of the training sample sets being not fully consistent;
a determination module configured to determine, for each training image, the gradient of the training image according to the training image and the training recognition result corresponding to the training image;
the determination module further configured to determine, from the gradient of each training image, a first statistic and a second statistic of each training sample set, where the first statistic characterizes the mean vector corresponding to the training sample set and the second statistic characterizes the covariance matrix corresponding to the training sample set;
the determination module further configured to determine a statistic loss function from the first statistic and the second statistic; and
an update module configured to update a preset model according to the statistic loss function to obtain the image recognition model.
In a fourth aspect, the present disclosure provides an image recognition apparatus, including:
a second acquisition module configured to acquire an image to be recognized; and
a processing module configured to input the image to be recognized into a pre-trained image recognition model to obtain a recognition result of the image to be recognized, where the image recognition model is trained by the training apparatus of the third aspect.
In a fifth aspect, the present disclosure provides a computer-readable medium having a computer program stored thereon, the program, when executed by a processing device, implementing the steps of the method of the first or second aspect of the present disclosure.
In a sixth aspect, the present disclosure provides an electronic device, including:
a storage device having a computer program stored thereon; and
a processing device configured to execute the computer program in the storage device to implement the steps of the method of the first or second aspect of the present disclosure.
In a seventh aspect, the present disclosure provides a computer program product, including a computer program that, when executed by a processing device, implements the steps of the method of the first or second aspect of the present disclosure.
In an eighth aspect, the present disclosure provides a computer program that, when executed by a processing device, implements the steps of the method of the first or second aspect of the present disclosure.
Other features and advantages of the present disclosure will be described in detail in the Detailed Description that follows.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that parts and elements are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flowchart of a training method for an image recognition model according to an exemplary embodiment;
FIG. 2 is a flowchart of step 102 according to the embodiment shown in FIG. 1;
FIG. 3 is a flowchart of step 103 according to the embodiment shown in FIG. 1;
FIG. 4 is a flowchart of step 104 according to the embodiment shown in FIG. 1;
FIG. 5 is a flowchart of an image recognition method according to an exemplary embodiment;
FIG. 6 is a block diagram of a training apparatus for an image recognition model according to an exemplary embodiment;
FIG. 7 is a block diagram of a determination module according to the embodiment shown in FIG. 6;
FIG. 8 is a block diagram of an image recognition apparatus according to an exemplary embodiment;
FIG. 9 is a block diagram of an electronic device according to an exemplary embodiment.
DETAILED DESCRIPTION
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit its scope of protection.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variations are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are used only to distinguish different apparatuses, modules, or units, and are not intended to limit the order or interdependence of the functions they perform.
It should be noted that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
FIG. 1 is a flowchart of a training method for an image recognition model according to an exemplary embodiment. As shown in FIG. 1, the method may include the following steps:
Step 101: acquire multiple training sample sets, where a training sample set includes training images and training recognition results corresponding to the training images, and the data distributions of the training sample sets are not fully consistent.
By way of example, deep learning methods perform excellently in image recognition, but this relies on the training data and test data having consistent distributions. When a neural network model for image recognition is trained, the model tends to find shortcuts during optimization and to rely on simple features in the training data; that is, it preferentially memorizes simple biased information in the training data. For example, in the scenario of identifying the ileocecal region in endoscopic images, the model will preferentially memorize simple-featured information such as the machine model of the image acquisition device or the body position at capture time. However, the distributions of training data from multiple data centers may not be fully consistent (i.e., there is a multi-center data distribution shift), so at test time the model's generalization ability drops sharply because the test data of a new center carries different biased information. To improve generalization, training data from multiple data centers can be used to train one image recognition model, so that the model learns center-invariant image features, captures discriminative information relevant to image recognition, and becomes less sensitive to the data distribution of any particular data center, thereby ensuring the model's generalization performance on new data centers.
Specifically, multiple training sample sets collected by different data centers can first be acquired, one training sample set per data center, with data distributions that are not fully consistent. A training sample set may include training images and training recognition results corresponding to the training images. Taking ileocecal recognition in endoscopic images as an example, a data center may be a medical center, a training image may be an endoscopic image captured by the medical center's image acquisition device during endoscopy over a historical time period, and a training recognition result may be a manually annotated classification result for the endoscopic image (for example, the classification result may indicate that the endoscopic image is, or is not, an ileocecal image); the training sample sets collected by different medical centers have data distributions that are not fully consistent. That is, the phrase "the data distribution of each training sample set is not fully consistent" means that different training sample sets have distributions that are not fully consistent with one another: for any two training sample sets, the data distribution of one training sample set is not fully consistent with that of the other.
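The per-center data layout described in step 101 can be sketched as a small container type; names such as `CenterSampleSet` and the file paths are purely illustrative and not part of the disclosure:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CenterSampleSet:
    """One training sample set, collected by a single data (medical) center."""
    center_id: str
    samples: List[Tuple[str, int]] = field(default_factory=list)  # (image path, label)

    def add(self, image_path: str, label: int) -> None:
        # label: 1 if the endoscopic image shows the ileocecal region, else 0
        self.samples.append((image_path, label))

# One training sample set per center, as in step 101; two toy centers
centers = [CenterSampleSet("center_e"), CenterSampleSet("center_f")]
centers[0].add("e/img_001.png", 1)
centers[1].add("f/img_042.png", 0)
```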
Step 102: for each training image, determine the gradient of the training image according to the training image and the training recognition result corresponding to the training image.
Step 103: determine, from the gradient of each training image, a first statistic and a second statistic of each training sample set, where the first statistic characterizes the mean vector corresponding to the training sample set and the second statistic characterizes the covariance matrix corresponding to the training sample set.
For example, a preset model for recognizing images can be constructed in advance; after the multiple training sample sets are acquired, each training image in all the training sample sets is input into the preset model to obtain a predicted recognition result for each training image. The gradient of each training image can then be computed from its predicted recognition result and its training recognition result. The gradient of a training image can be understood as a deep feature of the training image that jointly takes the image and the training recognition result into account.
Then, for each training sample set, the first statistic characterizing the mean vector corresponding to the training sample set can be computed from the gradients of its training images, and the second statistic characterizing the covariance matrix corresponding to the training sample set can likewise be computed from those gradients. The data distribution shift between a training sample set and a test set generally includes diversity shift and correlation shift. Diversity shift means that the data used for training and for testing come from different data centers and therefore have different characteristics (for example, two medical centers use different image acquisition devices, causing differences in the resolution and color appearance of endoscopic imaging), while correlation shift means that the correlation information among the data in the test set differs from that among the data in the training sample set. The first statistic essentially measures the diversity shift corresponding to a training sample set, while the second statistic measures its correlation shift.
Step 104: determine a statistic loss function from the first statistic and the second statistic.
Step 105: update the preset model according to the statistic loss function to obtain the image recognition model.
For example, after the first statistics and second statistics of the training sample sets are determined, the statistic loss function corresponding to every two training sample sets can be determined from the first statistics and second statistics of those two sets. The statistic loss function may include a first statistic loss function, which characterizes the difference between the first statistics of the two training sample sets, and a second statistic loss function, which characterizes the difference between their second statistics. Then, the first statistic loss function and second statistic loss function corresponding to every two training sample sets and the initial loss function of the preset model can be minimized simultaneously to update the model parameters of the preset model, thereby obtaining the image recognition model.
It should be noted that the first statistic may be a first-order statistic and the second statistic a second-order statistic; first-order and second-order statistics can summarize most characteristics of a data distribution. Therefore, by using first-order and second-order statistics in the gradient space, the present disclosure can explicitly measure the gradient distribution distance between two data centers and minimize the gradient distribution differences between the data of different centers, making their gradient distributions as close as possible. This removes the dependence on the data distribution of any particular data center and forces the model, during training, to learn and capture cross-center-invariant discriminative information (i.e., center-invariant image features) from the data of multiple data centers, improving the model's generalization ability on new data centers.
In summary, the present disclosure first acquires multiple training sample sets including training images and training recognition results; then, for each training image, determines the gradient of the training image from the training image and its corresponding training recognition result; determines, from the gradient of each training image, a first statistic and a second statistic of each training sample set; and finally determines a statistic loss function from the first statistic and the second statistic and updates the preset model according to it to obtain the image recognition model. The present disclosure can determine a statistic loss function from the first and second statistics and use it to update the preset model, enabling the preset model to use the training images of multiple training sample sets to learn center-invariant image features, capture discriminative information relevant to image recognition, and ignore the noise of particular training sample sets, thereby obtaining an image recognition model with high generalization performance that ensures the accuracy of recognizing images to be recognized, without additional fine-tuning of the image recognition model, avoiding overfitting and improving the model's recognition accuracy.
FIG. 2 is a flowchart of step 102 according to the embodiment shown in FIG. 1. As shown in FIG. 2, the preset model may include a feature extraction network and a classifier, and step 102 may include the following steps:
Step 1021: preprocess the training image to obtain a preprocessed training image.
For example, in the process of training the image recognition model, each training image may also be preprocessed in advance; for instance, random data augmentation may be applied to each training image to obtain the preprocessed training image. The random data augmentation may include at least one of random scaling, random cropping, random flipping (including random horizontal/vertical flipping), and random color jitter (including brightness, contrast, etc.).
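The listed augmentations can be illustrated with a toy, dependency-free sketch; a real pipeline would use a library such as torchvision, and the probabilities and jitter range here are placeholder choices rather than values from the disclosure:

```python
import random

def random_augment(img, rng=None):
    """Toy sketch of the augmentations listed above (random flip, brightness
    jitter, crop) on an image given as a list of rows of pixel values in [0, 1]."""
    rng = rng or random.Random(0)
    out = [row[:] for row in img]
    if rng.random() < 0.5:  # random horizontal flip
        out = [row[::-1] for row in out]
    if rng.random() < 0.5:  # random brightness jitter, clamped to [0, 1]
        delta = rng.uniform(-0.1, 0.1)
        out = [[min(1.0, max(0.0, p + delta)) for p in row] for row in out]
    if rng.random() < 0.5:  # random crop: keep a sub-window of the image
        h, w = len(out), len(out[0])
        top = rng.randrange(max(1, h // 4))
        left = rng.randrange(max(1, w // 4))
        out = [row[left:left + w - w // 4] for row in out[top:top + h - h // 4]]
    return out

img = [[0.1, 0.5, 0.9], [0.2, 0.6, 1.0], [0.0, 0.4, 0.8]]
aug = random_augment(img)
```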
Step 1022: input the preprocessed training image into the feature extraction network to obtain the image features of the training image.
Step 1023: input the image features of the training image into the classifier to obtain the predicted recognition result of the training image.
In one scenario, the preset model may include a feature extraction network f_θ and a classifier W. A training sample set can be denoted D_e = {(x^(i), y^(i)) | 1 ≤ i ≤ N_e}, where e ∈ E, e is the data center corresponding to the training sample set, E is the set of all data centers, x^(i) is the i-th training image in the training sample set, y^(i) is the manually annotated training recognition result corresponding to x^(i) (for example, a one-hot classification label), and N_e is the number of training images in the training sample set. After the multiple training sample sets are acquired, the preprocessed version of each training image can be input into the feature extraction network f_θ to obtain the image features of each training image, denoted z^(i) = f_θ(x^(i)). The image features of each training image can then be input into the classifier W to obtain the predicted recognition result of each training image. A softmax-activated fully connected layer can be used as the classifier W, whose parameters can be written w = [w_1, w_2, ..., w_C] ∈ R^{K×C}; the predicted recognition result of each training image, i.e., the classification probability predicted by the classifier W, can then be expressed as p^(i) = σ(w^T z^(i)), where C is the number of classes, K is the feature dimension, and σ is the softmax operation.
Step 1024: determine the gradient of the training image from its predicted recognition result, training recognition result, and image features.
In this step, the gradient of each training image can be determined from its predicted recognition result, training recognition result, and image features. The gradient of the i-th training image x^(i) from a data center's training sample set can be understood as the gradient of the classification loss with respect to the classifier parameters w when x^(i) and its corresponding training recognition result y^(i) are taken as input; this is the gradient used when optimizing the network parameters (gradient descent). For example, when the classifier W is a softmax classifier using a cross-entropy loss, the gradient for the i-th training image from data center e can be expressed as g^(i) = z^(i) (p^(i) − y^(i))^T.
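The per-sample gradient for a softmax classifier with cross-entropy loss, g = z (p − y)^T flattened to length K·C, can be checked with a minimal plain-Python sketch; the feature and weight values are arbitrary toy numbers:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [v / s for v in exps]

def sample_gradient(z, w, y_onehot):
    """Per-sample gradient of the cross-entropy loss w.r.t. the classifier
    weights w (K x C): g = z (p - y)^T, flattened row-major to length K*C."""
    K, C = len(w), len(w[0])
    logits = [sum(z[k] * w[k][c] for k in range(K)) for c in range(C)]
    p = softmax(logits)
    return [z[k] * (p[c] - y_onehot[c]) for k in range(K) for c in range(C)]

z = [1.0, -0.5]                # image feature z^(i), K = 2
w = [[0.1, -0.2], [0.3, 0.0]]  # classifier weights, C = 2
g = sample_gradient(z, w, [1.0, 0.0])  # flattened gradient, length K*C = 4
```

A quick sanity check: for each fixed feature dimension k, the C gradient entries sum to z_k((Σ_c p_c) − 1) = 0, since both p and y sum to one.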
FIG. 3 is a flowchart of step 103 according to the embodiment shown in FIG. 1. As shown in FIG. 3, step 103 may include the following steps:
Step 1031: determine the first statistic of each training sample set from the gradients of all the training images it includes.
Step 1032: determine the second statistic of each training sample set from the gradients of all its training images and its first statistic.
For example, for the training sample set from data center e, the gradients of all its training images can be written as a matrix G_e = {g^(i) | 1 ≤ i ≤ N_e}; for convenience of notation, each gradient can be flattened into a vector, so that G_e ∈ R^{N_e×KC}. The first statistic of the training sample set can then be expressed as mean(G_e) = (1/N_e) Σ_{i=1}^{N_e} g^(i), i.e., a vector of length KC. The second statistic of the training sample set can be expressed as cov(G_e) = (1/(N_e − 1)) Σ_{i=1}^{N_e} (g^(i) − mean(G_e))(g^(i) − mean(G_e))^T, i.e., a matrix of size KC×KC.
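The two statistics can be sketched directly from their definitions; whether the original formula normalizes the covariance by N_e or N_e − 1 is not visible in this extraction, so the sketch assumes the unbiased N_e − 1:

```python
def mean_vector(G):
    """First statistic: mean of a sample set's flattened gradient vectors."""
    n, d = len(G), len(G[0])
    return [sum(g[j] for g in G) / n for j in range(d)]

def covariance(G):
    """Second statistic: d x d covariance matrix of the gradient vectors
    (unbiased estimate, dividing by n - 1)."""
    n, d = len(G), len(G[0])
    mu = mean_vector(G)
    return [[sum((g[i] - mu[i]) * (g[j] - mu[j]) for g in G) / (n - 1)
             for j in range(d)] for i in range(d)]

G_e = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy gradients of one center, d = 2
mu_e = mean_vector(G_e)   # [3.0, 4.0]
cov_e = covariance(G_e)   # [[4.0, 4.0], [4.0, 4.0]]
```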
FIG. 4 is a flowchart of step 104 according to the embodiment shown in FIG. 1. As shown in FIG. 4, the statistic loss function includes a first statistic loss function and a second statistic loss function, and step 104 may include the following steps:
Step 1041: determine, from the first statistics of every two training sample sets, the first statistic loss function corresponding to those two training sample sets.
Step 1042: determine, from the second statistics of every two training sample sets, the second statistic loss function corresponding to those two training sample sets.
For example, after the first statistics and second statistics of the training sample sets are determined, the first statistic loss function corresponding to every two training sample sets and the second statistic loss function corresponding to every two training sample sets can further be determined. For instance, for training sample sets from two different data centers e and f, the corresponding first statistic loss function L_1st can be expressed as L_1st = Σ_{e,f∈E, e≠f} ||mean(G_e) − mean(G_f)||, and the corresponding second statistic loss function L_2nd can be expressed as L_2nd = Σ_{e,f∈E, e≠f} ||cov(G_e) − cov(G_f)||, where ||·|| denotes a vector norm.
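A minimal sketch of the two pairwise statistic losses, summing over all ordered center pairs e ≠ f; the Euclidean and Frobenius norms are an assumption, since the text only specifies a generic norm ||·||:

```python
def l2_norm(v):
    return sum(x * x for x in v) ** 0.5

def frobenius_norm(M):
    return sum(x * x for row in M for x in row) ** 0.5

def statistic_losses(stats):
    """Pairwise losses over all ordered center pairs e != f:
    L1st sums ||mean(Ge) - mean(Gf)||, L2nd sums ||cov(Ge) - cov(Gf)||.
    `stats` maps center id -> (mean vector, covariance matrix)."""
    centers = list(stats)
    l1st = l2nd = 0.0
    for e in centers:
        for f in centers:
            if e == f:
                continue
            mu_e, cov_e = stats[e]
            mu_f, cov_f = stats[f]
            l1st += l2_norm([a - b for a, b in zip(mu_e, mu_f)])
            l2nd += frobenius_norm([[a - b for a, b in zip(re, rf)]
                                    for re, rf in zip(cov_e, cov_f)])
    return l1st, l2nd

stats = {"e": ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]),
         "f": ([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])}
l1st, l2nd = statistic_losses(stats)  # 10.0 (each pair counted both ways), 0.0
```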
Optionally, step 105 can be implemented in the following way:
minimizing the first statistic loss function and second statistic loss function corresponding to every two training sample sets, together with the initial loss function of the preset model, to obtain the image recognition model.
For example, for every two training sample sets, each training image in the two sets can be taken as the input of the preset model and its corresponding training recognition result as the output, to train the preset model. During training, the three loss functions (the first statistic loss function and second statistic loss function corresponding to every two training sample sets, and the initial loss function of the preset model) are minimized simultaneously to update the model parameters of the preset model, thereby obtaining the image recognition model.
Minimizing the first statistic loss function, the second statistic loss function, and the initial loss function simultaneously is in fact equivalent to minimizing a single objective loss function. For example, when the preset model includes a feature extraction network and a classifier, the initial loss function may be the classification loss function of the classifier, and the objective loss function can then be expressed as L = L_cls + λ_1st · L_1st + λ_2nd · L_2nd, where L_cls is the classification loss function and λ_1st and λ_2nd are preset hyperparameters used to balance the proportions of the terms.
It should be noted that training of the image recognition model can be implemented in PyTorch, and the training parameters can be set as: 1) learning rate: 5e-5; 2) batch size: 256; optimizer: AdamW; epochs: 100 for the first training cycle and 20 for the second; input image size: 448x448.
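The objective L = L_cls + λ_1st·L_1st + λ_2nd·L_2nd and the hyperparameters quoted above can be collected as follows; the λ values and the config key names are placeholders, not values given in the disclosure:

```python
def total_loss(l_cls, l_1st, l_2nd, lam_1st=0.1, lam_2nd=0.1):
    """Objective minimized during training: L = Lcls + lam_1st*L1st + lam_2nd*L2nd.
    The lambda values here are placeholder hyperparameters balancing the terms."""
    return l_cls + lam_1st * l_1st + lam_2nd * l_2nd

# Hyperparameters quoted in the text, gathered into an illustrative config dict
train_cfg = {
    "lr": 5e-5,
    "batch_size": 256,
    "optimizer": "AdamW",
    "epochs": (100, 20),       # first and second training cycles
    "input_size": (448, 448),
}
loss = total_loss(1.0, 2.0, 3.0)  # 1.0 + 0.1*2.0 + 0.1*3.0 = 1.5
```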
In summary, the present disclosure first acquires multiple training sample sets including training images and training recognition results; then, for each training image, determines the gradient of the training image from the training image and its corresponding training recognition result; determines, from the gradient of each training image, a first statistic and a second statistic of each training sample set; and finally determines a statistic loss function from the first statistic and the second statistic and updates the preset model according to it to obtain the image recognition model. The present disclosure can determine a statistic loss function from the first and second statistics and use it to update the preset model, enabling the preset model to use the training images of multiple training sample sets to learn center-invariant image features, capture discriminative information relevant to image recognition, and ignore the noise of particular training sample sets, thereby obtaining an image recognition model with high generalization performance that ensures the accuracy of recognizing images to be recognized, without additional fine-tuning of the image recognition model, avoiding overfitting and improving the model's recognition accuracy.
FIG. 5 is a flowchart of an image recognition method according to an exemplary embodiment. As shown in FIG. 5, the method may include the following steps:
Step 201: acquire an image to be recognized.
Step 202: input the image to be recognized into a pre-trained image recognition model to obtain the recognition result of the image to be recognized, where the image recognition model is trained by the training method for an image recognition model shown in any of the above embodiments.
For example, after training of the image recognition model is completed, the trained model can be deployed to a designated data center for use. An image to be recognized collected by the designated data center can then be acquired and input into the trained image recognition model to obtain the recognition result output by the model. Taking an image recognition model for recognizing the ileocecal region in endoscopic images as an example, when the image to be recognized is an endoscopic image, the endoscopic image can be input into the image recognition model to obtain a recognition result indicating whether the endoscopic image shows the ileocecal region.
It should be noted that the image recognition model in the present disclosure is not limited to recognizing the ileocecal region in endoscopic images; it can also be applied to any image recognition scenario (for example, recognizing people or objects in images), and the present disclosure does not specifically limit this.
In summary, the present disclosure first acquires an image to be recognized and inputs it into a pre-trained image recognition model to obtain the recognition result of the image. By performing image recognition with a pre-trained image recognition model of high generalization performance and high recognition accuracy, the present disclosure can ensure the accuracy of recognizing the image to be recognized.
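The deployment-time flow of steps 201 and 202 can be sketched with a stand-in model; the callable interface and the 0.5 threshold are assumptions made for illustration, not details from the disclosure:

```python
def recognize(image, model):
    """Run the trained model on an image to be recognized and report whether
    it shows the ileocecal region. `model` is any callable returning class
    probabilities, with index 0 taken as the ileocecal class here."""
    probs = model(image)
    return {"ileocecal": probs[0] >= 0.5, "confidence": max(probs)}

# toy stand-in for the trained image recognition model
fake_model = lambda img: [0.9, 0.1]
result = recognize("endoscope_frame.png", fake_model)
```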
FIG. 6 is a block diagram of a training apparatus for an image recognition model according to an exemplary embodiment. As shown in FIG. 6, the training apparatus 300 for an image recognition model includes:
a first acquisition module 301 configured to acquire multiple training sample sets, each training sample set including training images and training recognition results corresponding to the training images, the data distributions of the training sample sets being not fully consistent;
a determination module 302 configured to determine, for each training image, the gradient of the training image according to the training image and its corresponding training recognition result;
the determination module 302 further configured to determine, from the gradient of each training image, a first statistic and a second statistic of each training sample set, where the first statistic characterizes the mean vector corresponding to the training sample set and the second statistic characterizes the covariance matrix corresponding to the training sample set;
the determination module 302 further configured to determine a statistic loss function from the first statistic and the second statistic; and
an update module 303 configured to update a preset model according to the statistic loss function to obtain the image recognition model.
FIG. 7 is a block diagram of a determination module according to the embodiment shown in FIG. 6. The preset model includes a feature extraction network and a classifier. As shown in FIG. 7, the determination module 302 includes:
a processing submodule 3021 configured to preprocess the training image to obtain a preprocessed training image;
an extraction submodule 3022 configured to input the preprocessed training image into the feature extraction network to obtain the image features of the training image;
a classification submodule 3023 configured to input the image features of the training image into the classifier to obtain the predicted recognition result of the training image; and
a gradient determination submodule 3024 configured to determine the gradient of the training image from its predicted recognition result, training recognition result, and image features.
Optionally, the processing submodule 3021 is configured to:
perform random data augmentation on the training image to obtain the preprocessed training image, where the random data augmentation includes at least one of random scaling, random cropping, random flipping, and random color jitter.
Optionally, the determination module 302 is configured to:
determine the first statistic of each training sample set from the gradients of all the training images it includes; and
determine the second statistic of each training sample set from the gradients of all its training images and its first statistic.
Optionally, the statistic loss function includes a first statistic loss function and a second statistic loss function, and the determination module 302 is configured to:
determine, from the first statistics of every two training sample sets, the first statistic loss function corresponding to those two training sample sets; and
determine, from the second statistics of every two training sample sets, the second statistic loss function corresponding to those two training sample sets.
Optionally, the processing module 302 is configured to minimize the first statistic loss function and second statistic loss function corresponding to every two training sample sets, together with the initial loss function of the preset model, to obtain the image recognition model.
FIG. 8 is a block diagram of an image recognition apparatus according to an exemplary embodiment. As shown in FIG. 8, the image recognition apparatus 400 includes:
a second acquisition module 401 configured to acquire an image to be recognized; and
a processing module 402 configured to input the image to be recognized into a pre-trained image recognition model to obtain the recognition result of the image to be recognized, where the image recognition model is trained by the above training apparatus 300 for an image recognition model.
Optionally, the processing module 402 is configured to, when the image to be recognized is an endoscopic image, input the endoscopic image into the image recognition model to obtain a recognition result indicating whether the endoscopic image shows the ileocecal region.
综上所述,本公开首先获取待识别图像,并将待识别图像输入图像识别模型,得到待识别图像的识别结果,其中,图像识别模型是通过以下方式训练得到的:获取多个包括训练图像和训练识别结果的训练样本集,再针对每个训练图像,根据该训练图像和该训练图像对应的训练识别结果,确定该训练图像的梯度,并根据每个训练图像的梯度,确定每个训练样本集的第一统计量和每个训练样本集的第二统计量,最后根据第一统计量和第二统计量,确定统计量损失函数,并根据统计量损失函数,对预设模型进行更新,得到图像识别模型。本公开可以根据第一统计量和第二统计量,确定统计量损失函数,并利用统计量损失函数更新预设模型,使预设模型能够使用多个训练样本集的训练图像学习具有中心不变性的图像特征,并捕捉与图像识别相关的具有判别性的信息,忽略特定训练样本集的噪声,从而得到泛化性能高的图像识别模型,能够确保对待识别图像进行识别的准确度,并且无需对图像识别模型进行额外的微调,避免造成过拟合问题,提高图像识别模型的识别准确度。
下面参考图9,其示出了适于用来实现本公开实施例的电子设备(其例如可以是上述实施例中的执行主体,可以为终端设备或服务器)600的结构示意图。本公开实施例中的终端设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图9示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图9所示,电子设备600可以包括处理装置(例如中央处理器、图形处理器等)601,其可以根据存储在只读存储器(ROM)602中的程序或者从存储装置608加载到随机访问存储器(RAM) 603中的程序而执行各种适当的动作和处理。在RAM 603中,还存储有电子设备600操作所需的各种程序和数据。处理装置601、ROM 602以及RAM 603通过总线604彼此相连。输入/输出(I/O)接口605也连接至总线604。
通常,以下装置可以连接至I/O接口605:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置606;包括例如液晶显示器(LCD)、扬声器、振动器等的输出装置607;包括例如磁带、硬盘等的存储装置608;以及通信装置609。通信装置609可以允许电子设备600与其他设备进行无线或有线通信以交换数据。虽然图9示出了具有各种装置的电子设备600,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置609从网络上被下载和安装,或者从存储装置608被安装,或者从ROM 602被安装。在该计算机程序被处理装置601执行时,执行本公开实施例的方法中限定的上述功能。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任意合适的组合。
在一些实施方式中,客户端、服务器可以利用诸如HTTP(HyperText Transfer Protocol,超文本传输协议)之类的任何当前已知或未来研发的网络协议进行通信,并且可以与任意形式或介质的数字数据通信(例如,通信网络)互连。通信网络的示例包括局域网(“LAN”),广域网(“WAN”),网际网(例如,互联网)以及端对端网络(例如,ad hoc端对端网络),以及任何当前已知或未来研发的网络。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:获取多个训练样本集;所述训练样本集包括训练图像以及所述训练图像对应的训练识别结果,每个所述训练样本集的数据分布不完全一致;针对每个所述训练图像,根据该训练图像和该训练图像对应的训练识别结果,确定该训练图像的梯度;根据每个所述训练图像 的梯度,确定每个所述训练样本集的第一统计量和每个所述训练样本集的第二统计量;所述第一统计量用于表征所述训练样本集对应的均值向量,所述第二统计量用于表征所述训练样本集对应的协方差矩阵;根据所述第一统计量和所述第二统计量,确定统计量损失函数;根据所述统计量损失函数,对预设模型进行更新,得到图像识别模型。
或者,上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:获取待识别图像;将所述待识别图像输入预先训练好的图像识别模型,得到所述待识别图像的识别结果;其中,所述图像识别模型是通过上述图像识别模型的训练方法训练得到的。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括但不限于面向对象的程序设计语言——诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言——诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)——连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的模块可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,模块的名称在某种情况下并不构成对该模块本身的限定,例如,获取模块还可以被描述为“获取待识别图像的模块”。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上系统(SOC)、复杂可编程逻辑设备(CPLD)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
According to one or more embodiments of the present disclosure, Example 1 provides a training method for an image recognition model, the method including: acquiring a plurality of training sample sets, each training sample set including training images and training recognition results corresponding to the training images, where the data distributions of the training sample sets are not completely identical; for each training image, determining a gradient of the training image according to the training image and its corresponding training recognition result; determining, according to the gradient of each training image, a first statistic of each training sample set and a second statistic of each training sample set, where the first statistic characterizes the mean vector corresponding to the training sample set and the second statistic characterizes the covariance matrix corresponding to the training sample set; determining a statistic loss function according to the first statistic and the second statistic; and updating a preset model according to the statistic loss function to obtain an image recognition model.
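The pipeline of Example 1 can be sketched end-to-end with toy components. Everything below is illustrative only: the squared-error `image_gradient` is a stand-in for backpropagating the preset model's loss, the data are synthetic, and the squared-distance form of the statistic loss is an assumed concrete choice, not one specified by the publication.

```python
import numpy as np

rng = np.random.default_rng(0)

def image_gradient(x, y, w):
    # Toy stand-in for the per-image gradient obtained by backpropagating
    # the preset model's loss (here: squared error of a linear scorer w).
    return 2.0 * (w @ x - y) * x

def set_statistics(grads):
    # First statistic: mean vector of the set's gradients.
    # Second statistic: covariance matrix of the set's gradients.
    mu = grads.mean(axis=0)
    centered = grads - mu
    return mu, centered.T @ centered / len(grads)

w = rng.normal(size=4)
# Two "training sample sets" whose data distributions differ (shifted means).
sample_sets = [(rng.normal(loc=shift, size=(8, 4)), rng.integers(0, 2, size=8))
               for shift in (0.0, 0.5)]

stats = []
for images, labels in sample_sets:
    grads = np.stack([image_gradient(x, y, w) for x, y in zip(images, labels)])
    stats.append(set_statistics(grads))

# Statistic loss between the two sets: squared distance of the mean vectors
# plus squared Frobenius distance of the covariance matrices (assumed form).
(mu1, s1), (mu2, s2) = stats
stat_loss = np.sum((mu1 - mu2) ** 2) + np.sum((s1 - s2) ** 2)
```

Minimizing `stat_loss` alongside the task loss pushes the gradient distributions of the different sample sets toward one another, which is the mechanism behind the generalization claim.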
According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, where the preset model includes a feature extraction network and a classifier, and determining the gradient of the training image according to the training image and its corresponding training recognition result includes: preprocessing the training image to obtain a preprocessed training image; inputting the preprocessed training image into the feature extraction network to obtain an image feature of the training image; inputting the image feature of the training image into the classifier to obtain a predicted recognition result of the training image; and determining the gradient of the training image according to the predicted recognition result, the training recognition result, and the image feature of the training image.
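Example 2 determines each image's gradient from exactly three quantities: the predicted recognition result, the training recognition result, and the image feature. One standard closed form consistent with those inputs is the cross-entropy gradient with respect to a linear classifier's weights, sketched below; that this is the exact parameterization used is an assumption, not something the publication states.

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def per_image_gradient(feature, label_onehot, W):
    # Cross-entropy gradient w.r.t. the classifier weights W for one image:
    # outer(softmax(W @ f) - y, f). It combines the predicted result,
    # the training result, and the image feature, as in Example 2.
    probs = softmax(W @ feature)
    return np.outer(probs - label_onehot, feature)

feature = np.array([0.2, -1.0, 0.5])   # image feature from the extractor
label = np.array([1.0, 0.0])           # training recognition result (one-hot)
W = np.zeros((2, 3))                   # untrained classifier weights
grad = per_image_gradient(feature, label, W)
```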
According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 2, where preprocessing the training image to obtain the preprocessed training image includes: performing random data augmentation on the training image to obtain the preprocessed training image, the random data augmentation including at least one of random scaling, random cropping, random flipping, and random color jittering.
According to one or more embodiments of the present disclosure, Example 4 provides the method of Example 1, where determining the first statistic and the second statistic of each training sample set according to the gradient of each training image includes: determining the first statistic of each training sample set according to the gradients of all the training images included in that training sample set; and determining the second statistic of each training sample set according to the gradients of all the training images included in that training sample set and the first statistic of that training sample set.
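Concretely, the two statistics of Example 4 are the sample mean and the covariance of a set's per-image gradients, with the covariance computed around the first statistic. A minimal sketch with a hand-checkable toy gradient matrix:

```python
import numpy as np

def first_statistic(grads):
    # Mean vector over the gradients of all training images in one set.
    return grads.mean(axis=0)

def second_statistic(grads, mu):
    # Covariance matrix of the set's gradients around its first statistic.
    centered = grads - mu
    return centered.T @ centered / len(grads)

# Gradients of three training images from one sample set (2-D for legibility).
grads = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
mu = first_statistic(grads)        # mean vector of the set
sigma = second_statistic(grads, mu)
```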
According to one or more embodiments of the present disclosure, Example 5 provides the method of Example 1, where the statistic loss function includes a first statistic loss function and a second statistic loss function, and determining the statistic loss function according to the first statistic and the second statistic includes: for every two training sample sets, determining the first statistic loss function corresponding to the two training sample sets according to their first statistics; and for every two training sample sets, determining the second statistic loss function corresponding to the two training sample sets according to their second statistics.
According to one or more embodiments of the present disclosure, Example 6 provides the method of Example 5, where updating the preset model according to the statistic loss function to obtain the image recognition model includes: minimizing the first statistic loss function and the second statistic loss function corresponding to every two training sample sets together with the initial loss function of the preset model to obtain the image recognition model.
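The joint minimization of Example 6 can be sketched as summing the pairwise statistic losses over every pair of sample sets and adding the preset model's initial (task) loss; the squared Frobenius distance used below is an assumed concrete choice, since the text leaves the exact distance unspecified.

```python
import itertools
import numpy as np

def statistic_losses(stats):
    # Sum the first- and second-statistic losses over every pair of sets.
    l1 = l2 = 0.0
    for (mu_a, cov_a), (mu_b, cov_b) in itertools.combinations(stats, 2):
        l1 += np.sum((mu_a - mu_b) ** 2)       # first-statistic loss
        l2 += np.sum((cov_a - cov_b) ** 2)     # second-statistic loss
    return l1, l2

# (mean vector, covariance matrix) for two toy sample sets.
stats = [(np.array([0.0, 0.0]), np.eye(2)),
         (np.array([1.0, 0.0]), 2.0 * np.eye(2))]
l1, l2 = statistic_losses(stats)
initial_loss = 0.7                    # placeholder for the model's task loss
total_loss = initial_loss + l1 + l2   # the quantity minimized during updates
```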
According to one or more embodiments of the present disclosure, Example 7 provides an image recognition method, the method including: acquiring an image to be recognized; and inputting the image to be recognized into a pre-trained image recognition model to obtain a recognition result of the image, where the image recognition model is trained by the training method for an image recognition model according to any one of Examples 1 to 6.
According to one or more embodiments of the present disclosure, Example 8 provides the method of Example 7, where inputting the image to be recognized into the pre-trained image recognition model to obtain the recognition result includes: when the image to be recognized is an endoscopic image, inputting the endoscopic image into the image recognition model to obtain a recognition result indicating whether the endoscopic image shows the ileocecal region.
According to one or more embodiments of the present disclosure, Example 9 provides a training apparatus for an image recognition model, the apparatus including: a first acquisition module configured to acquire a plurality of training sample sets, each training sample set including training images and training recognition results corresponding to the training images, where the data distributions of the training sample sets are not completely identical; a determination module configured to, for each training image, determine a gradient of the training image according to the training image and its corresponding training recognition result; the determination module being further configured to determine, according to the gradient of each training image, a first statistic of each training sample set and a second statistic of each training sample set, where the first statistic characterizes the mean vector corresponding to the training sample set and the second statistic characterizes the covariance matrix corresponding to the training sample set; the determination module being further configured to determine a statistic loss function according to the first statistic and the second statistic; and an update module configured to update a preset model according to the statistic loss function to obtain an image recognition model.
According to one or more embodiments of the present disclosure, Example 10 provides an image recognition apparatus, including: a second acquisition module configured to acquire an image to be recognized; and a processing module configured to input the image to be recognized into a pre-trained image recognition model to obtain a recognition result of the image, where the image recognition model is trained by the training apparatus for an image recognition model according to Example 9.
According to one or more embodiments of the present disclosure, Example 11 provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processing apparatus, implements the steps of the methods described in Examples 1 to 6 or Examples 7 to 8.
According to one or more embodiments of the present disclosure, Example 12 provides an electronic device, including: a storage apparatus on which a computer program is stored; and a processing apparatus configured to execute the computer program in the storage apparatus to implement the steps of the methods described in Examples 1 to 6 or Examples 7 to 8.
According to one or more embodiments of the present disclosure, Example 13 provides a computer program product including a computer program that, when executed by a processing apparatus, implements the steps of the methods described in Examples 1 to 6 or Examples 7 to 8.
According to one or more embodiments of the present disclosure, Example 14 provides a computer program that, when executed by a processing apparatus, implements the steps of the methods described in Examples 1 to 6 or Examples 7 to 8.
Through the above technical solution, the present disclosure first acquires a plurality of training sample sets including training images and training recognition results; then, for each training image, determines the gradient of the training image according to the training image and its corresponding training recognition result; determines, according to the gradient of each training image, the first statistic and the second statistic of each training sample set; and finally determines a statistic loss function according to the first statistic and the second statistic and updates a preset model according to the statistic loss function to obtain an image recognition model. By determining the statistic loss function from the first and second statistics and using it to update the preset model, the present disclosure enables the preset model to learn center-invariant image features from the training images of multiple training sample sets, to capture discriminative information relevant to image recognition, and to ignore the noise of any particular training sample set. The resulting image recognition model therefore generalizes well and recognizes images to be recognized accurately, requires no additional fine-tuning, avoids overfitting, and improves recognition accuracy.
The above description is merely a description of the preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combinations of the above technical features; it should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Furthermore, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims. Regarding the apparatus in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments relating to the method, and will not be elaborated here.

Claims (14)

  1. A training method for an image recognition model, wherein the method comprises:
    acquiring a plurality of training sample sets, each training sample set comprising training images and training recognition results corresponding to the training images, wherein the data distributions of the training sample sets are not completely identical;
    for each training image, determining a gradient of the training image according to the training image and the training recognition result corresponding to the training image;
    determining, according to the gradient of each training image, a first statistic of each training sample set and a second statistic of each training sample set, wherein the first statistic characterizes a mean vector corresponding to the training sample set and the second statistic characterizes a covariance matrix corresponding to the training sample set;
    determining a statistic loss function according to the first statistic and the second statistic; and
    updating a preset model according to the statistic loss function to obtain the image recognition model.
  2. The method according to claim 1, wherein the preset model comprises a feature extraction network and a classifier, and determining the gradient of the training image according to the training image and the training recognition result corresponding to the training image comprises:
    preprocessing the training image to obtain a preprocessed training image;
    inputting the preprocessed training image into the feature extraction network to obtain an image feature of the training image;
    inputting the image feature of the training image into the classifier to obtain a predicted recognition result of the training image; and
    determining the gradient of the training image according to the predicted recognition result, the training recognition result, and the image feature of the training image.
  3. The method according to claim 2, wherein preprocessing the training image to obtain the preprocessed training image comprises:
    performing random data augmentation on the training image to obtain the preprocessed training image, the random data augmentation comprising at least one of random scaling, random cropping, random flipping, and random color jittering.
  4. The method according to any one of claims 1-3, wherein determining, according to the gradient of each training image, the first statistic of each training sample set and the second statistic of each training sample set comprises:
    determining the first statistic of each training sample set according to the gradients of all the training images included in that training sample set; and
    determining the second statistic of each training sample set according to the gradients of all the training images included in that training sample set and the first statistic of that training sample set.
  5. The method according to any one of claims 1-4, wherein the statistic loss function comprises a first statistic loss function and a second statistic loss function, and determining the statistic loss function according to the first statistic and the second statistic comprises:
    determining, according to the first statistics of every two training sample sets, the first statistic loss function corresponding to the two training sample sets; and
    determining, according to the second statistics of every two training sample sets, the second statistic loss function corresponding to the two training sample sets.
  6. The method according to claim 5, wherein updating the preset model according to the statistic loss function to obtain the image recognition model comprises:
    minimizing the first statistic loss function and the second statistic loss function corresponding to every two training sample sets together with an initial loss function of the preset model to obtain the image recognition model.
  7. An image recognition method, wherein the method comprises:
    acquiring an image to be recognized; and
    inputting the image to be recognized into a pre-trained image recognition model to obtain a recognition result of the image to be recognized, wherein the image recognition model is trained by the training method for an image recognition model according to any one of claims 1-6.
  8. The method according to claim 7, wherein inputting the image to be recognized into the pre-trained image recognition model to obtain the recognition result of the image to be recognized comprises:
    when the image to be recognized is an endoscopic image, inputting the endoscopic image into the image recognition model to obtain a recognition result indicating whether the endoscopic image shows the ileocecal region.
  9. A training apparatus for an image recognition model, wherein the training apparatus comprises:
    a first acquisition module configured to acquire a plurality of training sample sets, each training sample set comprising training images and training recognition results corresponding to the training images, wherein the data distributions of the training sample sets are not completely identical;
    a determination module configured to, for each training image, determine a gradient of the training image according to the training image and the training recognition result corresponding to the training image;
    the determination module being further configured to determine, according to the gradient of each training image, a first statistic of each training sample set and a second statistic of each training sample set, wherein the first statistic characterizes a mean vector corresponding to the training sample set and the second statistic characterizes a covariance matrix corresponding to the training sample set;
    the determination module being further configured to determine a statistic loss function according to the first statistic and the second statistic; and
    an update module configured to update a preset model according to the statistic loss function to obtain the image recognition model.
  10. An image recognition apparatus, wherein the image recognition apparatus comprises:
    a second acquisition module configured to acquire an image to be recognized; and
    a processing module configured to input the image to be recognized into a pre-trained image recognition model to obtain a recognition result of the image to be recognized, wherein the image recognition model is trained by the training apparatus for an image recognition model according to claim 9.
  11. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1-6 or 7-8.
  12. An electronic device, comprising:
    a storage apparatus on which a computer program is stored; and
    a processing apparatus configured to execute the computer program in the storage apparatus to implement the steps of the method according to any one of claims 1-6 or 7-8.
  13. A computer program product comprising a computer program that, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1-6 or 7-8.
  14. A computer program that, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1-6 or 7-8.
PCT/CN2023/082355 2022-03-28 2023-03-17 Training method, recognition method, apparatus, medium and device for image recognition model WO2023185516A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210309902.8A CN114419400B (zh) 2022-03-28 2022-03-28 Training method, recognition method, apparatus, medium and device for image recognition model
CN202210309902.8 2022-03-28

Publications (1)

Publication Number Publication Date
WO2023185516A1 true WO2023185516A1 (zh) 2023-10-05

Family

ID=81264319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/082355 WO2023185516A1 (zh) 2022-03-28 2023-03-17 Training method, recognition method, apparatus, medium and device for image recognition model

Country Status (2)

Country Link
CN (1) CN114419400B (zh)
WO (1) WO2023185516A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419400B (zh) * 2022-03-28 2022-07-29 Beijing ByteDance Network Technology Co., Ltd. Training method, recognition method, apparatus, medium and device for image recognition model
CN116051486B (zh) * 2022-12-29 2024-07-02 Douyin Vision Co., Ltd. Training method for endoscopic image recognition model, image recognition method and apparatus

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160210535A1 (en) * 2015-01-16 2016-07-21 Canon Kabushiki Kaisha Image processing apparatus, image processing method, program, and storage medium
CN111476309A (zh) * 2020-04-13 2020-07-31 Beijing ByteDance Network Technology Co., Ltd. Image processing method, model training method, apparatus, device and readable medium
CN111695209A (zh) * 2020-05-13 2020-09-22 Southeast University Meta-deep-learning-driven small-sample health assessment method for rotating machinery
CN112801054A (zh) * 2021-04-01 2021-05-14 Tencent Technology (Shenzhen) Co., Ltd. Face recognition model processing method, face recognition method and apparatus
CN113505820A (zh) * 2021-06-23 2021-10-15 Beijing Yueshi Intelligent Technology Co., Ltd. Image recognition model training method, apparatus, device and medium
CN113706526A (zh) * 2021-10-26 2021-11-26 Beijing ByteDance Network Technology Co., Ltd. Training method and apparatus for endoscopic image feature learning model and classification model
CN114240867A (zh) * 2021-12-09 2022-03-25 Xiaohe Medical Instrument (Hainan) Co., Ltd. Training method for endoscopic image recognition model, endoscopic image recognition method and apparatus
CN114419400A (zh) * 2022-03-28 2022-04-29 Beijing ByteDance Network Technology Co., Ltd. Training method, recognition method, apparatus, medium and device for image recognition model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN204613978U (zh) * 2011-10-03 2015-09-02 Avocent Huntsville Corp. System for managing assets in a predetermined environment
CN112749663B (zh) * 2021-01-15 2023-07-07 Jinling Institute of Technology Agricultural fruit maturity detection system based on Internet of Things and CCNN model
CN113268833B (zh) * 2021-06-07 2023-07-04 Chongqing University Transfer fault diagnosis method based on deep joint distribution alignment


Also Published As

Publication number Publication date
CN114419400B (zh) 2022-07-29
CN114419400A (zh) 2022-04-29


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23777890

Country of ref document: EP

Kind code of ref document: A1