WO2022193973A1 - Image processing method, apparatus, electronic device, computer-readable storage medium, and computer program product - Google Patents

Image processing method, apparatus, electronic device, computer-readable storage medium, and computer program product

Info

Publication number
WO2022193973A1
Authority
WO
WIPO (PCT)
Prior art keywords
images
classification
image
loss value
pseudo
Prior art date
Application number
PCT/CN2022/079496
Other languages
English (en)
French (fr)
Inventor
柳露艳
马锴
郑冶枫
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2022193973A1
Priority to US18/071,106 (published as US20230097391A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to an image processing method, apparatus, electronic device, computer-readable storage medium, and computer program product.
  • Artificial intelligence comprises the theories, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence: to perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
  • In other words, artificial intelligence is a comprehensive discipline of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a manner similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision-making.
  • For example, an image classification model for classifying medical images can be trained, so that the attribute types of medical images are determined through the image classification model.
  • However, the annotation data used by image processing methods in the related art is obtained through manual annotation.
  • The manual labeling process consumes substantial labor, is inefficient, and is prone to errors.
  • Embodiments of the present application provide an image processing method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product, which can improve the efficiency and accuracy of image processing.
  • the technical solution is as follows:
  • An embodiment of the present application provides an image processing method. The method is executed by an electronic device and includes: classifying at least two first images and at least two second images by using an image classification model to obtain predicted classification results of the at least two first images and the at least two second images, where the first images carry corresponding classification labels, the second images carry corresponding pseudo-classification labels, the first images and the second images are images including an object to be recognized, and the predicted classification results, the classification labels, and the pseudo-classification labels are used to indicate the attribute type of the object to be recognized; obtaining a first loss value based on the predicted classification results of the at least two first images and the corresponding classification labels, as well as the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, where the first loss value is used to indicate the accuracy of the predicted classification results; and obtaining a second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, where the second loss value is used to indicate the accuracy of the pseudo-classification labels.
  • An embodiment of the present application provides an image processing apparatus. The apparatus includes an acquisition module configured to perform classification processing on at least two first images and at least two second images through an image classification model to obtain predicted classification results of the at least two first images and the at least two second images, where the first images carry corresponding classification labels, the second images carry corresponding pseudo-classification labels, the first images and the second images are images including an object to be recognized, and the predicted classification results, the classification labels, and the pseudo-classification labels are used to indicate the attribute type of the object to be recognized. The acquisition module is further configured to obtain a first loss value based on the predicted classification results of the at least two first images and the corresponding classification labels, and the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, where the first loss value is used to indicate the accuracy of the predicted classification results. The acquisition module is further configured to obtain a second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, where the second loss value is used to indicate the accuracy of the pseudo-classification labels.
  • An embodiment of the present application provides an electronic device. The electronic device includes one or more processors and one or more memories; the one or more memories store at least one computer program, and the at least one computer program is loaded and executed by the one or more processors to implement the various optional implementations of the above image processing method.
  • An embodiment of the present application provides a computer-readable storage medium. At least one computer program is stored in the storage medium, and the at least one computer program is loaded and executed by a processor to implement the various optional implementations of the above image processing method.
  • Embodiments of the present application provide a computer program product or computer program. The computer program product or computer program includes one or more pieces of program code, and the one or more pieces of program code are stored in a computer-readable storage medium. One or more processors of an electronic device can read the one or more pieces of program code from the computer-readable storage medium and execute them, so that the electronic device performs the image processing method of any one of the possible implementations described above.
  • In the embodiments of the present application, pseudo-classification labels are used to label the second images when the image classification model is trained, and a corresponding loss value is designed based on the pseudo-classification labels to indicate whether they are accurate. As the model parameters are updated, the pseudo-classification labels are updated accordingly, so that only some images need to carry classification labels while pseudo-classification labels are generated for the other images during model training; that is, not all images need classification labels, which greatly reduces the labor cost of manual labeling and improves training efficiency.
  • Moreover, the pseudo-classification labels are continuously updated during model training, and pseudo-classification labels whose accuracy is comparable to that of the classification labels can finally be determined, so as to increase the number of training samples and thereby improve the accuracy of the image classification model.
  • FIG. 1 is a schematic diagram of an implementation environment of an image processing method provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a training system framework provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 8 is a structural block diagram of a terminal provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • The terms "first", "second", and the like are used to distinguish identical or similar items whose functions are basically the same. It should be understood that there are no logical or temporal dependencies among "first", "second", and "nth", and that these terms place no restriction on number or execution order. It will also be understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms; the terms are only used to distinguish one element from another. For example, a first image could be termed a second image and, similarly, a second image could be termed a first image without departing from the scope of the various examples. Both the first image and the second image can be images and, in some cases, separate and distinct images.
  • In the embodiments of the present application, the sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and the sequence numbers constitute no limitation on the implementation of the embodiments of the present application.
  • In addition, determining B based on A does not mean determining B based only on A; B may also be determined based on A and/or other information.
  • Pseudo-classification label: an approximate classification label given to data that has no classification label. That is, for data without classification labels (such as medical images), a classification label is determined for that data according to other data that carries classification labels. Because this label is calculated by technical means rather than manually annotated, it is called a pseudo-classification label.
  • The pseudo-classification labeling algorithm is a kind of self-learning method and is widely used in various classification tasks in the field of computer vision.
  • Medical images are images related to lesions or diseased areas acquired by means such as radiography and computed tomography. By processing medical images, the attribute types in them can be analyzed; for example, whether there are lesions, where the lesions are located, and the type of the lesions.
  • Lesion: a diseased part of the body; a limited area of diseased tissue containing pathogenic microorganisms. For example, when a part of the lung is destroyed by tuberculosis bacteria, that part is a pulmonary tuberculosis lesion.
  • the otoscope data can be analyzed to determine whether there is a lesion in the tympanic membrane and the type of the lesion.
  • The image classification model can process images in place of manual analysis to obtain the type of the object to be recognized in an image.
  • Images are acquired of disease-related parts of the human body; for example, the interior of the ear is imaged to obtain ear images, which may also be referred to as otoscope data.
  • Image processing of an ear image can determine the location of the eardrum in the image and analyze the type of the eardrum, such as a normal eardrum or a pathological eardrum.
  • A pathological eardrum may further be classified as tympanic sclerosis, tympanitis, and so on. However, the annotation data used in such image processing is obtained through manual annotation, and the manual annotation process consumes substantial labor, is inefficient, and is prone to errors.
  • Embodiments of the present application provide an artificial intelligence-based image processing method, apparatus, electronic device, and computer-readable storage medium that use pseudo-classification labels to label the second images when training an image classification model and generate those pseudo-classification labels for the second images during model training, which can greatly reduce the labor cost of manual labeling and improve training efficiency.
  • the present application relates to artificial intelligence technology.
  • For example, the trained image classification model can process images of human tissue in place of manual analysis.
  • Artificial intelligence technology is a comprehensive discipline, involving a wide range of fields, including both hardware-level technology and software-level technology.
  • the basic technologies of artificial intelligence generally include technologies such as sensors, special artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Computer vision (CV) technology: computer vision is a science that studies how to make machines "see"; it refers to using cameras and computers in place of human eyes to identify, track, and measure targets, and to further perform graphics processing so that the processed images are more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can obtain information from images or multidimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric identification technologies such as face recognition and fingerprint recognition.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in how computers simulate or realize human learning behaviors to acquire new knowledge or skills, and to reorganize existing knowledge structures to continuously improve their performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications pervade all fields of artificial intelligence.
  • Machine learning and deep learning usually include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.
  • FIG. 1 is a schematic diagram of an implementation environment of an image processing method provided by an embodiment of the present application.
  • the implementation environment includes the terminal 101 , or the implementation environment includes the terminal 101 and the image processing platform 102 .
  • the terminal 101 is connected to the image processing platform 102 through a wireless network or a wired network.
  • The terminal 101 (running a client, such as a medical client) can be used to acquire medical images; for example, images are acquired through an image acquisition device of the terminal 101, or images acquired by other image acquisition devices are transmitted to the terminal 101.
  • the terminal 101 receives an image including an object to be recognized.
  • An image processing plug-in may be embedded in the client running on the terminal to implement the image processing method locally on the client. For example, after obtaining a request to train the image classification model, the terminal 101 invokes the image processing plug-in to implement the image processing method, that is, to train an image classification model that classifies the object to be recognized and obtains its attribute type.
  • the image may be an image in various application scenarios.
  • the image is a medical image.
  • For example, when the image is a medical image, the object to be identified is image content related to a lesion, such as parts of the ear or shadowed regions in a B-mode ultrasound image.
  • the attribute type is the lesion type.
  • the object to be recognized is a road element
  • the attribute type of the object to be recognized is the attribute type of the road element.
  • the object to be recognized is the face in the face image
  • the attribute type of the object to be recognized is the age type and gender type of the face
  • In some embodiments, after the terminal 101 obtains a request for training an image classification model, it calls the image processing interface of the image processing platform 102 (which can be provided in the form of a cloud service, that is, an image processing service), and the image classification model is trained through this interface. For example, after patients, doctors, or researchers input medical images into a medical application, the medical application calls the image processing interface of the image processing platform 102 to train the image classification model, so that the image classification model gains the ability to distinguish the attribute types of medical images.
  • It should be noted that the image processing method provided by the embodiments of this application is not aimed at living human or animal bodies and does not have the direct purpose of obtaining disease diagnosis results or health conditions, nor can it directly obtain them; that is, the attribute types are not directly used for disease diagnosis but serve only as intermediate data to assist patients in disease prediction and to assist doctors and researchers in disease diagnosis, follow-up visits, and research on treatment methods.
  • the terminal 101 can have an image acquisition function and an image processing function, can process the acquired images, and execute corresponding functions according to the processing results.
  • the terminal 101 may be an otoscope device.
  • the terminal 101 may be a portable otoscope device.
  • the image processing platform 102 can also have an image acquisition function and an image processing function, can process the acquired images, and execute corresponding functions according to the processing results.
  • The terminal 101 can complete the work independently, or the image processing platform 102 can provide data services for it; this is not limited in this embodiment of the present application.
  • the terminal sends the collected and labeled medical images to the image processing platform 102, and the image processing platform 102 executes the image processing method according to the received first and second images.
  • the image processing platform 102 includes at least one of a server, multiple servers, a cloud computing platform and a virtualization center.
  • the image processing platform 102 is used to provide background services for applications supporting image processing.
  • the image processing platform 102 undertakes the main processing work, and the terminal 101 undertakes the secondary processing work; alternatively, the image processing platform 102 undertakes the secondary processing work, and the terminal 101 undertakes the main processing work; or, the image processing platform 102 or the terminal 101 can respectively independently undertake processing.
  • a distributed computing architecture is used for collaborative computing between the image processing platform 102 and the terminal 101 .
  • the image processing platform 102 includes at least one server 1021 and a database 1022.
  • the database 1022 is used to store data.
  • For example, the database 1022 can store sample medical images or trained image classification models, and it provides data services for the at least one server 1021.
  • the server 1021 in the image processing platform 102 may be a node device in the blockchain system, and the server 1021 may store sample medical images or image classification models based on image training in the blockchain system on the blockchain.
  • the blockchain system can provide image processing services, and when the electronic device needs to perform image classification, an image classification request can be sent to the blockchain system, and the node device of the blockchain system can respond to the The image classification request, to classify the image.
  • The electronic device used for image processing may be any of various types of terminals or servers. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. The terminal can be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, audio playback device, video playback device, medical device, or vehicle. The terminal supports the installation and operation of an image processing application; for example, the application can be a system application or an image processing application.
  • the terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
  • In some embodiments, the server can be a server cluster deployed in the cloud that opens artificial intelligence cloud services (AIaaS, AI as a Service) to users. The AIaaS platform splits several types of common AI services and provides them as independent or packaged services; this service model is similar to an AI-themed mall, and all users can access one or more of the artificial intelligence services provided by the AIaaS platform through application programming interfaces.
  • one of the artificial intelligence cloud services may be an image processing service, that is, a server in the cloud encapsulates the image processing program provided by the embodiment of the present application.
  • the user invokes the image processing service in the cloud service through a terminal (running a client, such as a map positioning client, a medical client, etc.), so that the server deployed in the cloud invokes the encapsulated image processing program.
  • Likewise, the image processing method is not aimed at living human or animal bodies and does not have the direct purpose of obtaining disease diagnosis results or health conditions; it cannot directly obtain disease diagnosis results or health conditions, and its outputs serve only to assist doctors and researchers in the diagnosis, follow-up, and treatment research of diseases.
  • The number of the above-mentioned terminals 101 and servers 1021 can be greater or smaller.
  • For example, there may be only one terminal 101 and one server 1021, or there may be dozens, hundreds, or more.
  • the embodiment of this application does not limit the number of terminals or servers and device types.
  • FIG. 2 is a flowchart of an image processing method provided by an embodiment of the present application.
  • the method is applied to an electronic device, and the electronic device is a terminal or a server.
  • the method includes the following steps.
  • The terminal performs classification processing on at least two first images and at least two second images by using an image classification model to obtain predicted classification results of the at least two first images and the at least two second images. The first images carry corresponding classification labels, the second images carry corresponding pseudo-classification labels, the first images and the second images are images including the object to be recognized, and the predicted classification results, the classification labels, and the pseudo-classification labels are used to indicate the attribute type of the object to be recognized.
  • the image can be an image in various application scenarios, for example, the image is a medical image, a traffic image, etc.
  • For example, when the image is a medical image, the object to be recognized is image content related to a lesion, such as parts of the ear or shadowed regions in a B-mode ultrasound image. The following explains in detail by taking the image being a medical image as an example.
  • the difference between the first image and the second image is that the labels of the two are different.
  • The first image carries a classification label, which is used to indicate the correct or true classification result of the image.
  • Classification labels can be obtained by manual annotation. For example, the number of classification categories may be two or more.
  • Each classification label corresponds to a classification category; the correct classification result of an image carrying a classification label can be obtained through that label, that is, the classification category to which the image should belong after classification.
  • The second image carries a pseudo-classification label. In contrast with the classification label above, the pseudo-classification label is not obtained by manual labeling; instead, a label similar to a classification label is assigned by processing the image.
  • the pseudo-classification labels may be generated during the training process of the image classification model, and the model parameters of the image classification model are continuously optimized during the training process. Therefore, the pseudo-classification labels are also continuously optimized during the training process.
  • the first image and the second image are sample images, which may also be referred to as sample medical images.
  • the predicted classification result is a prediction result obtained by the image classification model, and the classification label and the pseudo-classification label are used to indicate the true value, indicating the correct or real result.
  • These sample images are processed by the image classification model, and whether the predicted classification results obtained by the image classification model are accurate is analyzed based on those results and the classification labels or pseudo-classification labels carried by the images.
  • the model parameters of the image classification model are optimized to improve the processing performance of the image classification model.
  • predicted classification results, classification labels, and pseudo-classification labels are used to indicate the type of attributes in the image.
  • the attribute type can be the type of some attributes in the image.
  • the attribute can be the lesion in the image
  • For example, the medical image can be an ear image, which can also be called an otoscope image.
  • The above predicted classification results, classification labels, and pseudo-classification labels can be used to indicate the type of the tympanic membrane in the ear image: the type can be normal or abnormal; a normal type can further include completely healthy, healthy, or cured/recovered, and an abnormal type can further include tympanitis, tympanic sclerosis, and so on.
  • the attributes may also be other attributes in the image, such as image sharpness, image collection distance, color tone, image style, and the like. This is only an exemplary description, and those skilled in the art can set the above types according to requirements, which are not limited in this embodiment of the present application.
  • The terminal obtains a first loss value based on the predicted classification results of the at least two first images and the corresponding classification labels, and the predicted classification results of the at least two second images and the corresponding pseudo-classification labels; the first loss value is used to indicate the accuracy of the predicted classification results.
  • After the terminal determines the predicted classification results, it can evaluate whether they are accurate according to the labels carried by the images, so as to measure the processing performance of the image classification model.
  • The first loss value serves as this measure of processing performance: through it, the accuracy of the predicted results can be determined. Understandably, the more accurate the predicted classification results, the better the processing performance (e.g., accuracy) of the image classification model, and the purpose of model training is to improve this processing performance.
  • the terminal obtains a second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, where the second loss value is used to indicate the accuracy of the pseudo-classification labels.
  • In the embodiment of the present application, not only is the accuracy of the predicted classification results determined from the predicted classification results and the classification labels or pseudo-classification labels in order to improve the classification accuracy of the model, but the accuracy of the pseudo-classification labels can also be estimated to further determine the classification accuracy of the image classification model. In the next stage of training, the pseudo-classification labels are updated based on the predicted classification results, so that the pseudo-classification labels become more and more accurate and the classification of the image classification model likewise becomes more and more accurate.
  • The terminal updates the model parameters of the image classification model based on the first loss value and the second loss value, updates the corresponding pseudo-classification labels based on the predicted classification results of the at least two second images, and then continues to perform the classification processing and the acquisition of the first loss value and the second loss value, until the target image classification model is obtained when the target condition is met.
  • That is, the following processing is performed iteratively: the corresponding pseudo-classification labels are updated based on the predicted classification results of the at least two second images; the at least two first images and the at least two second images are classified to obtain updated predicted classification results of the at least two first images and the at least two second images; the model parameters of the updated image classification model are updated again based on the updated predicted classification results of the at least two first images and the at least two second images, the classification labels corresponding to the at least two first images, and the updated pseudo-classification labels corresponding to the at least two second images; and when the target condition is met, the image classification model obtained from the updated model parameters is determined as the target image classification model.
  • the above step 201 is an iterative process. After the terminal obtains the required loss value, it can update the model parameters based on the loss value, and then can update the pseudo-classification label used in the next stage. Through multiple iterations, the classification accuracy of the image classification model is gradually improved, and a qualified target image classification model can be obtained.
  • the target image classification model is a trained model.
  • In the embodiment of the present application, pseudo-classification labels are used to label the second images when training the image classification model, and a corresponding loss value is designed based on the pseudo-classification labels to indicate whether they are accurate.
  • As the model parameters are updated, the pseudo-classification labels are updated accordingly, so that only some images need to carry classification labels while pseudo-classification labels are generated for the other images during model training; that is, not all images need classification labels, which greatly reduces the labor cost of manual labeling and improves training efficiency.
  • Moreover, the pseudo-classification labels are continuously updated during model training, and pseudo-classification labels with accuracy similar to that of the classification labels can finally be determined, so as to increase the number of training samples and thereby improve the accuracy of the image classification model.
  • FIG. 3 is a flowchart of an image processing method provided by an embodiment of the present application. Referring to FIG. 3 , the method includes the following steps.
  • the terminal acquires at least two first images and at least two second images, the first images carry corresponding classification labels, and the first images and the second images are medical images.
  • The first images and the second images are sample images used in training the image classification model.
  • The two kinds of images are distinguished by the presence or absence of classification labels: an image that carries a classification label is called a first image, and an image without one is called a second image.
  • Classification labels are also referred to as true classification labels.
  • the terminal acquires initial prediction classification results of at least two first images and at least two second images based on the image classification model.
  • the terminal can call the image classification model, the image classification model can be an initial model, and the model parameters of the image classification model need to be optimized.
  • When the terminal obtains the predicted classification results based on the image classification model, it may first perform feature extraction on the images and then classify the images based on the image features. That is, in step 302, the electronic device may input the at least two first images and the at least two second images into the image classification model; the image classification model performs feature extraction on the at least two first images and the at least two second images to obtain their image features, and then classifies the first images and the second images respectively based on those image features to obtain the predicted classification results of the first images and the second images.
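  • As an illustrative sketch only, and not the patent's actual implementation, the classification step above can be expressed in PyTorch roughly as follows; the networks, shapes, and variable names are assumptions:

```python
import torch
import torch.nn as nn

# Stand-ins for the feature extraction module F and the classifier module C.
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
classifier = nn.Linear(16, 2)  # e.g. two attribute types

first_images = torch.randn(4, 3, 224, 224)   # labeled batch (X_S)
second_images = torch.randn(4, 3, 224, 224)  # unlabeled batch (X_T)

f_s = feature_extractor(first_images)    # image features F_S
f_t = feature_extractor(second_images)   # image features F_T
p_s = classifier(f_s)                    # predicted classification results P_S
p_t = classifier(f_t)                    # predicted classification results P_T
```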
  • At this stage, the first images carry corresponding classification labels while the second images do not. Therefore, it is also necessary to compute pseudo-classification labels for the second images from the predicted classification results at this stage, so that in the next stage the second images carry pseudo-classification labels and can participate in the classification loss.
  • the training system framework of the image classification model is shown in Figure 4.
  • The system framework is mainly composed of three modules: a feature extraction module (F, Feature Extractor) 401, a Gaussian mixture module (G, Gaussian Mixture) 402, and a classifier module (C, Classifier) 403.
  • the feature extraction module 401 is for extracting a learnable deep feature representation, and the prediction result corresponding to the input data can be learned by inputting the representation into the classifier. That is, the feature extraction module 401 is used to extract the image features of the image.
  • The at least two first images and the classification labels corresponding to the first images may be called the source-domain dataset, with (X_S, Y_S) representing the images and ground-truth classification labels of the source-domain dataset (i.e., the training set).
  • The at least two second images and the pseudo-classification labels corresponding to the second images may be called the target-domain dataset, with (X_T, Y_T) representing the images and pseudo-classification labels of the target-domain dataset (i.e., the validation set).
  • The feature extraction module 401 (which can be a feature extraction network F) extracts the feature representations (that is, the image features) from the source-domain and target-domain data; the image features (F_S, F_T) of the source domain and the target domain are then input into the classifier module 403 (which can be the classifier C), and the classifier module 403 outputs the prediction results (P_S, P_T) of the source domain and the target domain, that is, the predicted classification results.
  • The prediction results of the target domain are input into the Gaussian mixture module 402 (which can be a Gaussian mixture model) to calculate importance weights (Weights) for the prediction results and the estimated pseudo-classification labels (Pseudo-labels); the Gaussian mixture module 402 is used to calculate these weights.
  • The weights participate, together with the pseudo-classification labels, in the calculation of the robust pseudo-label loss (Robust Pseudo-label Loss) function, so as to learn more accurate results on the target set; this is the process of obtaining the second loss value.
  • the Gaussian mixture module 402 refer to the following step 307 for details. Only the process of feature extraction and classification of the image classification model is described here.
  • the feature extraction module 401 and the classifier module 403 can be any deep learning network structure.
  • For example, the feature extraction module and the classifier module can use a residual network (ResNet50) or other network structures, such as a high-efficiency network (EfficientNet), a mobile network (MobileNet), a dense network (DenseNet), or any other deep learning network.
  • For example, the above-mentioned feature extraction module 401 may use the part of ResNet50 before the fully connected layer, and the classifier module 403 may use the last fully connected layer.
  • the pseudo-classification label is obtained by calculation based on the prediction result of the target domain output by the fully connected layer.
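  • For instance, the F/C split described above can be sketched with torchvision's ResNet50 as follows; this is a hedged example, and the class count K = 2 is an assumption:

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=None)
# Feature extraction module 401: everything before the final fully connected layer.
feature_extractor = nn.Sequential(*list(backbone.children())[:-1], nn.Flatten())
# Classifier module 403: a last fully connected layer.
classifier = nn.Linear(backbone.fc.in_features, 2)  # assumed K = 2 categories

x = torch.randn(1, 3, 224, 224)
logits = classifier(feature_extractor(x))  # prediction from which pseudo-labels are computed
print(logits.shape)  # torch.Size([1, 2])
```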
  • the terminal obtains an initial first loss value based on the classification label carried by the first image and the predicted classification result of the first image.
  • The predicted classification result is a predicted value. Since the first image also carries a real label, that is, a true value, the terminal can use the error between the predicted value and the true value as the basis for model training.
  • the first loss value may be used to indicate the accuracy of the predicted classification result of the first image.
  • the first loss value is used to determine the accuracy of the classification result of the image classification model, which may be called classification loss.
  • the classification process of medical images can also be understood as a segmentation process. Therefore, the classification process can also be called a segmentation process, and a classification loss can also be called a segmentation loss.
  • the first loss value may be a value of a cross-entropy loss function.
  • Through the cross-entropy loss function, the error between the classification label and the predicted classification result can be calculated to measure the classification ability of the image classification model. Understandably, if the first loss value is a cross-entropy loss value, then the larger the first loss value, the larger the error between the predicted classification result and the classification label and the worse the classification ability of the image classification model; the smaller the first loss value, the smaller the error and the better the classification ability.
  • the first loss value may also be obtained by using other loss functions, for example, L1 or L2, etc., which is not limited in this embodiment of the present application.
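  • A minimal sketch of the first loss value as a cross-entropy loss, assuming PyTorch and illustrative names:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 2)                    # predicted classification results (logits)
labels = torch.tensor([0, 1, 1, 0])           # classification labels Y_S
first_loss = F.cross_entropy(logits, labels)  # smaller value -> better classification ability
```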
  • the terminal updates the model parameters of the image classification model based on the first loss value.
  • the terminal may update the model parameters based on the first loss value.
  • the model parameter update process can be implemented by any model update algorithm.
  • For example, the update process may be implemented by the stochastic gradient descent (SGD) algorithm; this is not limited in this embodiment of the present application.
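  • A self-contained sketch of one SGD parameter update, assuming a PyTorch model; the linear model here is only a stand-in for the image classification model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(16, 2)  # stand-in for the image classification model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

logits = model(torch.randn(4, 16))
loss = F.cross_entropy(logits, torch.tensor([0, 1, 1, 0]))  # first loss value
optimizer.zero_grad()
loss.backward()    # gradients of the loss w.r.t. the model parameters
optimizer.step()   # one update of the model parameters
```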
  • the terminal acquires pseudo-classification labels corresponding to the at least two second images based on the predicted classification results of the at least two second images.
  • the second image does not originally carry the corresponding classification label, so it needs to be calculated based on the predicted classification result, so that in the next stage, the second image can carry the corresponding pseudo-classification label, and then participate in the calculation of the classification loss.
  • the process can also calculate a pseudo-classification label with good accuracy for the second image.
  • When acquiring the pseudo-classification labels, the terminal may cluster the images and determine the pseudo-classification labels based on the clustering results. Specifically, the terminal may perform clustering on the at least two first images and the at least two second images to obtain a clustering result, where the clustering result includes at least two cluster centers and the images corresponding to each cluster center; the weight of each second image is then obtained from the distance between that second image and the cluster center to which it belongs.
  • In some embodiments, the pseudo-classification label (Pseudo-label) can be obtained by the following formula:

    Ŷ_T = argmax_k [C(F(X_T))]_k

    where argmax_k is the function that returns the class index k at which C(F(X_T)) achieves its maximum value; [·]_k denotes the k-th component, with cosine similarity used to calculate the k-th cluster center; C(·) is the classification process; and F(X_T) represents the feature extraction of the second images X_T.
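  • In code form, this argmax step reduces to the following sketch (PyTorch assumed):

```python
import torch

p_t = torch.softmax(torch.randn(4, 3), dim=1)  # C(F(X_T)) for 4 second images, K = 3
pseudo_labels = torch.argmax(p_t, dim=1)       # Ŷ_T = argmax_k [C(F(X_T))]_k
```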
  • the terminal obtains the predicted classification results of at least two first images and at least two second images based on the image classification model, the first images carry corresponding classification labels, and the second images carry corresponding pseudo classification labels.
  • Step 306 is the same as the above-mentioned step 302, and details are not repeated here.
  • The terminal obtains a first loss value based on the predicted classification results of the at least two first images and the corresponding classification labels, and the predicted classification results of the at least two second images and the corresponding pseudo-classification labels; the first loss value is used to indicate the accuracy of the predicted classification results of the images.
  • At this stage, both the first images and the second images have classification labels or pseudo-classification labels, so both kinds of images can participate in the calculation of the classification loss (or segmentation loss); the terminal can therefore calculate the first loss value based on the predicted classification results and the classification labels or pseudo-classification labels of both kinds of images.
  • Step 307 is the same as step 303 above, except that in step 307 each second image corresponds to a pseudo-classification label and can therefore also participate in the calculation of the classification loss or segmentation loss. Accordingly, when the terminal obtains the first loss value, it also takes into account the error between the predicted classification results of the second images and the pseudo-classification labels.
  • the embodiments of the present application will not be described in detail here.
  • the terminal obtains a second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, where the second loss value is used to indicate the accuracy of the pseudo-classification labels.
  • the terminal can also evaluate whether the pseudo-classification label is accurate, whether it is a real classification label, or whether it is relatively close to the real classification label. Therefore, the terminal can also obtain the second loss value according to the predicted classification result of the second image and the pseudo-classification label, so as to judge whether the pseudo-classification label is accurate.
  • The terminal may obtain the errors between the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, and then weight the errors corresponding to the at least two second images according to the weights of the at least two second images to obtain the second loss value.
  • the weight value of the second image is adaptively adjusted according to whether the pseudo-classification label is accurate, thereby ensuring the accuracy of the pseudo-classification label.
  • The weights can be obtained by clustering. For images of the same category, the distributions of image features are relatively similar, so the distance between a second image and a cluster center reflects the attribute type in that image and, naturally, the likelihood that its pseudo-classification label is the correct classification label. A higher weight can be set for an accurate classification and a lower weight for a poor one. That is, the weight may correspond to whether the pseudo-classification label of the second image is a correct classification label, or to the probability that it is.
  • The terminal may cluster the at least two first images and the at least two second images to obtain at least two cluster centers, and obtain the weight of each second image according to the distance between that second image and the cluster center to which it belongs.
  • obtaining the weights of each second image based on the distance between each second image and the cluster center to which it belongs can effectively improve the accuracy of the weights, thereby improving the accuracy of the subsequent second loss values.
  • The above-mentioned clustering groups images with similar image features together, so as to determine the distribution of different types of images at the feature level; the types can then be modeled by feature distances to the category centers in the feature space to obtain accurate weights.
  • The terminal can obtain a probability for each second image according to the distance between it and the cluster center to which it belongs, where the probability is the probability that its pseudo-classification label is the correct classification label, and then obtain the weight of each second image based on that probability. Describing the weight with this probability can effectively improve the accuracy and effectiveness of the second loss value, thereby improving the efficiency of model training.
  • the terminal may obtain the probability as the weight of the second image in response to the probability corresponding to any second image being greater than or equal to the probability threshold.
  • the terminal may acquire a zero value as the weight of the second image in response to the probability corresponding to any second image being smaller than the probability threshold.
  • the probability threshold may be set by relevant technical personnel according to requirements.
  • the probability threshold may be 0.5, which is not limited in this embodiment of the present application.
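  • For example, with a probability threshold of 0.5, the weighting rule above can be sketched as follows (the probability values are hypothetical):

```python
import torch

probs = torch.tensor([0.90, 0.30, 0.55, 0.10])  # probability each pseudo-label is correct
threshold = 0.5
weights = torch.where(probs >= threshold, probs, torch.zeros_like(probs))
# weights -> tensor([0.9000, 0.0000, 0.5500, 0.0000])
```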
  • In some embodiments, step 308 may be implemented by a Gaussian mixture model. Specifically, based on the Gaussian mixture model, the at least two second images, their predicted classification results, and the corresponding pseudo-classification labels are processed to obtain the second loss value. That is, the electronic device can input the predicted classification results of the second images and the pseudo-classification labels into the Gaussian mixture model, which processes the input data and outputs the second loss value.
  • A Gaussian mixture model uses Gaussian probability density functions (normal distribution curves) to quantify things precisely; it is a model that decomposes a distribution into several components, each based on a Gaussian probability density function.
  • the electronic device may further update the model parameters of the Gaussian mixture model based on the second loss value.
  • The outputs of the classification network, P_S and P_T, represent the predicted classification results of the source-domain and target-domain data respectively, and Ŷ_T represents the pseudo-classification labels of the target-domain data, where K represents the number of categories to be classified, and N_S and N_T represent the numbers of source-domain and target-domain images, respectively.
  • the weight can be realized by the following formula 2:
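  • As a hedged sketch under the assumption that formula 2 computes the weights from a two-component Gaussian mixture fitted to per-image errors, one possible realization with scikit-learn looks like this:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Per-second-image error between prediction and pseudo-label (illustrative values).
errors = np.array([[0.05], [0.90], [0.12], [0.75]])
gmm = GaussianMixture(n_components=2, random_state=0).fit(errors)
clean = int(np.argmin(gmm.means_.ravel()))   # component with the lower mean error
probs = gmm.predict_proba(errors)[:, clean]  # probability each pseudo-label is correct
```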
  • The terminal updates the model parameters of the image classification model based on the first loss value and the second loss value, updates the corresponding pseudo-classification labels based on the predicted classification results of the at least two second images, and then continues to perform the classification and loss-value acquisition steps until the target image classification model is obtained when the target condition is met.
  • the terminal integrates the two loss values and updates the model parameters.
  • the target loss value can be obtained based on the first loss value and the second loss value, and the model parameters can be updated based on the target loss value.
  • the terminal may weight the first loss value and the second loss value to obtain the target loss value, and weight the first loss value and the second loss value to obtain the target loss value, so that when training is performed based on the target loss value Can improve training efficiency.
  • when the weights are 1, the terminal can use the sum of the first loss value and the second loss value as the target loss value, denoted $\mathcal{L}_{total}$, as shown in the following Formula 3 (reconstructed in consistent notation from the surrounding definitions):

$$\mathcal{L}_{total} = \mathcal{L}_{ce}(X_S, Y_S) + \mathcal{L}_{rpl}(X_T, \hat{Y}_T) \quad \text{(Formula 3)}$$

$$\mathcal{L}_{ce} = -\frac{1}{N_S} \sum_{i=1}^{N_S} \sum_{k=1}^{K} Y_{ik}^{S} \log P_{ik}^{S}, \qquad \mathcal{L}_{rpl} = \frac{1}{N_T} \sum_{j=1}^{N_T} w_j \, \mathrm{MAE}\!\left(P_j^{T}, \hat{Y}_j^{T}\right)$$

where $(X_S, Y_S)$ represents the source-domain data, $X_S$ refers to the first images and $Y_S$ to the classification labels carried by the first images; $X_T$ refers to the second images and $\hat{Y}_T$ to the pseudo-classification labels they carry; $\mathcal{L}_{ce}$ is the cross-entropy loss of the first images and their classification labels; $\mathcal{L}_{rpl}$ is the robust loss of the second images and their predicted pseudo-classification labels, i.e., the second loss value; $w$ is the weight that the pseudo-classification label is the correct classification label; $\mathrm{MAE}(\cdot)$ represents the mean absolute error (Mean Absolute Error); $F$ refers to the feature extraction module and $C$ to the classifier module; and the outputs of the classification network, $P_S = C(F(X_S)) \in \mathbb{R}^{N_S \times K}$ and $P_T = C(F(X_T)) \in \mathbb{R}^{N_T \times K}$, represent the predicted classification results of the source-domain and target-domain data, respectively, where $K$ represents the number of categories to be classified and $N_S$, $N_T$ represent the numbers of first and second images, respectively. A PyTorch sketch of this combined loss follows.
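  • A minimal PyTorch sketch of the reconstructed Formula 3; the tensor names and the one-hot treatment of the pseudo-labels are assumptions made for illustration:

```python
import torch
import torch.nn.functional as nnf

def target_loss(logits_s, labels_s, logits_t, pseudo_t, weights_t):
    """First loss value: cross-entropy on labeled first images.
    Second loss value: per-image weighted mean absolute error between
    target predictions and one-hot pseudo-labels. Their sum is the
    target loss value of Formula 3."""
    l_ce = nnf.cross_entropy(logits_s, labels_s)
    probs_t = logits_t.softmax(dim=1)
    onehot_t = nnf.one_hot(pseudo_t, probs_t.size(1)).float()
    mae = (probs_t - onehot_t).abs().mean(dim=1)
    l_rpl = (weights_t * mae).mean()
    return l_ce + l_rpl
```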
  • in this example, the first loss value includes only the cross-entropy loss calculated from the first images and does not take into account a cross-entropy loss calculated from the second images.
  • the terminal may determine, according to its settings, whether to include the cross-entropy loss calculated from the second images in the first loss value. In the initial iteration, when the second images have not yet obtained pseudo-classification labels, this cross-entropy loss cannot be calculated and is therefore not counted in the first loss value; if the second images already carry pseudo-classification labels, the calculated cross-entropy loss can also be included in the first loss value.
  • the terminal may further acquire the prediction types of the at least two first images and the at least two second images according to the at least two first images and the at least two second images and the corresponding predicted classification results, where the prediction type is used to indicate whether an image is a first image or a second image, and obtain a third loss value according to the prediction types of the at least two first images and the at least two second images, where the third loss value is used to indicate the accuracy of the prediction types.
  • the terminal may update the model parameters of the image classification model based on the first loss value, the second loss value and the third loss value.
  • the terminal may, based on a discriminant network, discriminate the types of the at least two first images and the at least two second images according to the images and the corresponding predicted classification results to obtain the prediction types of the at least two first images and the at least two second images, and then update the network parameters of the discriminant network according to the third loss value, so that the segmentation network and the discriminant network are trained alternately, which effectively improves model training efficiency.
  • a staged training method is adopted: pseudo-classification labels are generated using the training results of the previous stage and applied during the current stage of segmentation-network training, and in each stage the segmentation network and the discriminative network are trained together in an alternately updated manner.
  • the image data is first input into the segmentation network, and the segmentation loss, i.e., the above-mentioned classification loss or first loss value, is calculated using the true classification labels of the source-domain data and the pseudo-classification labels of the target-domain data.
  • the parameters of the segmentation network are then updated by minimizing the segmentation loss.
  • after segmentation, the source-domain and target-domain data can also be discriminated through the discriminant network, which makes the classifications of the data from the two domains more similar.
  • the SGD algorithm may be used to optimize and train the segmentation network (image classification model), and the Adam algorithm may be used to optimize and train the discriminant network.
  • the target condition can be that the loss values converge or that the number of iterations reaches a target number; the target condition can also be related to the learning rate.
  • the initial learning rates of the segmentation network and the discriminant network are 2.5×10⁻⁴ and 1×10⁻⁴, respectively; the alternating update is sketched below.
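  • The alternating update could look like the following sketch; seg_net, disc_net, the data loader, and the two adversarial loss helpers are assumed to exist elsewhere and are named here only for illustration:

```python
import torch

opt_seg = torch.optim.SGD(seg_net.parameters(), lr=2.5e-4, momentum=0.9)
opt_disc = torch.optim.Adam(disc_net.parameters(), lr=1e-4)

for x_s, y_s, x_t, pseudo_t, w_t in loader:
    # 1) update the segmentation network on both loss terms
    p_s, p_t = seg_net(x_s), seg_net(x_t)
    loss_seg = target_loss(p_s, y_s, p_t, pseudo_t, w_t)
    loss_adv = adversarial_loss(disc_net, p_t)   # entropy-based term
    opt_seg.zero_grad()
    (loss_seg + loss_adv).backward()
    opt_seg.step()

    # 2) update the discriminant network to separate the two domains
    loss_disc = discriminator_loss(disc_net, p_s.detach(), p_t.detach())
    opt_disc.zero_grad()
    loss_disc.backward()
    opt_disc.step()
```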
  • the learning rate can affect how fast the model converges.
  • after the image classification model is trained on the terminal, it can be applied to image classification scenes.
  • the image classification scene can be segmenting otoscope data.
  • the terminal may, in response to an image processing instruction, acquire a third image, where the third image is a medical image, input the third image into the target image classification model, perform feature extraction on the third image through the target image classification model, and classify the third image based on the extracted image features to obtain the attribute type of the object to be recognized in the third image.
  • the attribute type can be determined according to the image classification scene. For example, in the otoscope-data segmentation scene, the object to be identified is an ear image, and the attribute type can be the type of the eardrum.
  • in a brain-image segmentation scene, the attribute type may refer to whether the human tissue is a brain tumor or brain cancer.
  • in the scene of classifying the image acquisition distance, the attribute type may be a far distance or a near distance.
  • the process can also be regarded as the testing process of the model, in which the acquired images are accurately classified through the trained model.
  • for the test set, it can be input into the trained image classification model, which includes a feature extraction module 501 and a classifier module 502; finally, the classifier module 502 outputs the prediction result of the test set, that is, the test prediction (Test Prediction).
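  • At test time, applying the trained model could be as simple as the following sketch (preprocessing and model loading are assumed to have happened elsewhere):

```python
import torch

@torch.no_grad()
def classify_image(model, image):
    """Classify one preprocessed medical image tensor (C, H, W) and
    return the index of the predicted attribute type."""
    model.eval()
    logits = model(image.unsqueeze(0))  # add a batch dimension
    return logits.argmax(dim=1).item()
```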
  • pseudo-classification labels are used to label the second images when the image classification model is trained, and a corresponding loss value is designed based on the pseudo-classification labels to indicate whether they are accurate; the model parameters are updated on this basis, and the pseudo-classification labels are also updated accordingly during the update process. In this way, only some images need to have classification labels, and pseudo-classification labels are generated for the other images during model training; that is, not all images need to have classification labels, which can greatly reduce the labor cost of manual labeling and improve training efficiency.
  • the pseudo-classification labels are continuously updated during model training, and pseudo-classification labels with accuracy similar to that of the classification labels can finally be determined, which increases the number of training samples and thereby improves the accuracy of the image classification model.
  • 9804 pieces of otoscope data were used, of which 4175, 2281 and 3348 pieces were cholesteatoma, chronic suppurative otitis media and normal data, respectively.
  • the data were divided (proportionally by class) into a training set, a validation set and a test set with the patient as the unit (that is, data from the same patient are not split across sets); the training set contains 8710 images, the validation set and test set contain 548 and 546 images respectively, and all images are resized to 256×256; during testing, the 546-image set is used to test the results. A sketch of such a patient-level split follows.
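  • A patient-level split of this kind can be sketched with scikit-learn's GroupShuffleSplit; the variable names are hypothetical:

```python
from sklearn.model_selection import GroupShuffleSplit

def split_by_patient(images, labels, patient_ids, test_size=0.1):
    """Split so that all images from one patient fall into the same
    subset, mirroring the patient-as-unit division described above."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size,
                                 random_state=0)
    train_idx, test_idx = next(splitter.split(images, labels,
                                              groups=patient_ids))
    return train_idx, test_idx
```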
  • ResNet50 is used as the basic network, and the algorithm provided by the embodiment of the present application is added to ResNet50 for method verification. Extensive experiments were carried out with different training-data sizes (i.e., using 10% to 100% of the data, respectively), and the experimental results are shown in Table 1 and Table 2. It can be seen that the algorithm provided by the embodiments of the present application significantly improves the performance of classifying otitis-media lesions based on medical images on training data of all sample sizes.
  • FIG. 6 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • the apparatus includes: an acquisition module 601, configured to perform classification processing on at least two first images and at least two second images through an image classification model to obtain predicted classification results of the at least two first images and the at least two second images, where the first images carry corresponding classification labels, the second images carry corresponding pseudo-classification labels, the first images and the second images are images including an object to be recognized, and the predicted classification results, the classification labels and the pseudo-classification labels are used to indicate the attribute type of the object to be recognized; the acquisition module 601 is further configured to obtain a first loss value based on the predicted classification results of the at least two first images and the corresponding classification labels, and on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, where the first loss value is used to indicate the accuracy of the predicted classification results; the acquisition module 601 is further configured to obtain a second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, where the second loss value is used to indicate the accuracy of the pseudo-classification labels; and an updating module 602, configured to update the model parameters of the image classification model based on the first loss value and the second loss value, update the corresponding pseudo-classification labels based on the predicted classification results of the at least two second images, and then continue to perform the classification processing and the acquisition of the first loss value and the second loss value, until the target image classification model is obtained when the target condition is met.
  • the obtaining module 601 is configured to: obtain the errors between the predicted classification results of the at least two second images and the corresponding pseudo-classification labels; and weight the errors corresponding to the at least two second images according to the weights of the at least two second images to obtain the second loss value.
  • the obtaining module 601 is configured to: cluster the at least two first images and the at least two second images to obtain a clustering result, where the clustering result includes at least two cluster centers and the images corresponding to each cluster center; and obtain the weight of each second image according to the distance between each second image and the cluster center to which it belongs.
  • the obtaining module 601 is configured to: obtain the probabilities corresponding to the at least two second images according to the distance between each second image and the cluster center to which it belongs, the probability being the probability that the pseudo-classification label is the correct classification label; and obtain the weights of the at least two second images based on the corresponding probabilities.
  • the obtaining module 601 is configured to: in response to the probability corresponding to any second image being greater than or equal to the probability threshold, take the probability as the weight of that second image; and in response to the probability corresponding to any second image being smaller than the probability threshold, take a zero value as the weight of that second image.
  • the updating module 602 is configured to obtain a pseudo-classification label corresponding to the predicted classification result of each second image according to the predicted classification result and the clustering result of the at least two second images.
  • the obtaining module 601 is configured to: process the at least two second images, the predicted classification results of the at least two second images, and the corresponding pseudo-classification labels based on a Gaussian mixture model to obtain the second loss value;
  • the update module 602 is further configured to update the model parameters of the Gaussian mixture model based on the second loss value.
  • the obtaining module 601 is further configured to obtain the predictions of the at least two first images and the at least two second images according to the at least two first images and the at least two second images and the corresponding prediction classification results Type, the prediction type is used to indicate that the image is the first image or the second image;
  • the obtaining module 601 is further configured to obtain a third loss value according to the prediction types of the at least two first images and the at least two second images, where the third loss value is used to indicate the accuracy of the prediction types;
  • the update module 602 is configured to update the model parameters of the image classification model based on the first loss value, the second loss value, and the third loss value.
  • the acquiring module 601 is configured to perform discrimination processing on the at least two first images and the at least two second images and the corresponding predicted classification results based on a discriminant network, to obtain the prediction types of the at least two first images and the at least two second images; the updating module 602 is further configured to update the network parameters of the discriminant network according to the third loss value.
  • the obtaining module 601 is further configured to obtain a third image in response to an image processing instruction, the third image being a medical image; the apparatus further includes a classification module configured to input the third image into the target image classification model, perform feature extraction on the third image through the target image classification model, and classify the third image based on the extracted image features to obtain the type of the lesion in the third image.
  • with the apparatus, pseudo-classification labels are used to label the second images when the image classification model is trained, and a corresponding loss value is designed based on the pseudo-classification labels to indicate whether they are accurate; the model parameters are updated on this basis, and the pseudo-classification labels are updated accordingly during the update process, so that only some images need corresponding classification labels and pseudo-classification labels are generated for the other images during model training. That is, not all images need corresponding classification labels, which greatly reduces the labor cost of manual annotation and improves training efficiency.
  • the pseudo-classification labels are continuously updated during model training, and pseudo-classification labels with accuracy close to that of the classification labels can finally be determined, which increases the number of training samples and thereby improves the accuracy of the image classification model.
  • when the image processing apparatus provided by the above embodiments performs image processing based on artificial intelligence, the division into the above functional modules is only used as an example for illustration; in practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the image processing apparatus is divided into different functional modules to complete all or part of the functions described above.
  • the image processing apparatus provided by the above embodiments and the image processing method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which will not be repeated here.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 700 may have relatively large differences due to different configurations or performances, and may include one or more processors (CPU, Central Processing Units) 701 and one or more memories 702, wherein at least one computer program is stored in the memory 702, and the at least one computer program is loaded and executed by the processor 701 to implement the image processing methods provided by the above method embodiments.
  • the electronic device can also include other components for realizing the functions of the device.
  • the electronic device can also have components such as a wired or wireless network interface and an input/output interface for input and output, which are not described in detail here in this embodiment of the present application.
  • FIG. 8 is a structural block diagram of a terminal provided by an embodiment of the present application.
  • the terminal 800 may be a portable mobile terminal, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, an audio playback device, a video playback device, a medical device, and the like.
  • Terminal 800 may also be called user equipment, portable terminal, laptop terminal, desktop terminal, and the like by other names.
  • the terminal 800 includes: a processor 801 and a memory 802 .
  • the processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 801 may be implemented in at least one hardware form among a digital signal processor (DSP, Digital Signal Processing), a field-programmable gate array (FPGA, Field-Programmable Gate Array), and a programmable logic array (PLA, Programmable Logic Array).
  • the processor 801 may also include a main processor and a co-processor.
  • the main processor is a processor used to process data in the awake state, also called a central processing unit; the coprocessor is a low-power processor used to process data in the standby state.
  • the processor 801 may be integrated with a graphics processor (GPU, Graphics Processing Unit), and the GPU is used for rendering and drawing the content that needs to be displayed on the display screen.
  • the processor 801 may also include an AI processor for processing computing operations related to machine learning.
  • Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high-speed random access memory, as well as non-volatile memory, such as one or more disk storage devices, flash storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 802 is used to store at least one instruction, and the at least one instruction is used to be executed by the processor 801 to implement the image processing provided by the method embodiments in this application. method.
  • the terminal 800 may optionally further include: a peripheral device interface 803 and at least one peripheral device.
  • the processor 801, the memory 802 and the peripheral device interface 803 may be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 803 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 804 , a display screen 805 , a camera assembly 806 , an audio circuit 807 , a positioning assembly 808 and a power source 809 .
  • the peripheral device interface 803 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 801 and the memory 802 .
  • in some embodiments, the processor 801, the memory 802 and the peripheral device interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802 and the peripheral device interface 803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 804 is used for receiving and transmitting radio frequency (RF, Radio Frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 804 communicates with the communication network and other communication devices via electromagnetic signals.
  • the radio frequency circuit 804 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • radio frequency circuitry 804 includes: an antenna system, an RF transceiver, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and the like.
  • the radio frequency circuit 804 may communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes but is not limited to: World Wide Web, Metropolitan Area Network, Intranet, various generations of mobile communication networks (2G, 3G, 4G and 5G), wireless local area network and/or Wireless Fidelity (WiFi, Wireless Fidelity) network.
  • the radio frequency circuit 804 may further include a circuit related to Near Field Communication (NFC, Near Field Communication), which is not limited in this application.
  • the display screen 805 is used to display a user interface (UI, User Interface).
  • the UI can include graphics, text, icons, video, and any combination thereof.
  • the display screen 805 also has the ability to acquire touch signals on or above the surface of the display screen 805 .
  • the touch signal can be input to the processor 801 as a control signal for processing.
  • the display screen 805 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
  • in some embodiments, there may be one display screen 805, arranged on the front panel of the terminal 800; in other embodiments, there may be at least two display screens 805, respectively arranged on different surfaces of the terminal 800 or in a folded design; in still other embodiments, the display screen 805 may be a flexible display screen disposed on a curved surface or a folding surface of the terminal 800. The display screen 805 can even be set as a non-rectangular irregular figure, that is, a special-shaped screen.
  • the display screen 805 can be made of materials such as a liquid crystal display (LCD, Liquid Crystal Display), an organic light-emitting diode (OLED, Organic Light-Emitting Diode).
  • the camera assembly 806 is used to capture images or video.
  • camera assembly 806 includes a front-facing camera and a rear-facing camera.
  • the front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal.
  • in some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background-blur function, and the main camera and the wide-angle camera are fused to realize panoramic shooting, virtual reality (VR, Virtual Reality) shooting, or other fused shooting functions.
  • the camera assembly 806 may also include a flash.
  • the flash can be a single color temperature flash or a dual color temperature flash. Dual color temperature flash refers to the combination of warm light flash and cold light flash, which can be used for light compensation under different color temperatures.
  • Audio circuitry 807 may include a microphone and speakers.
  • the microphone is used to collect the sound waves of the user and the environment, convert the sound waves into electrical signals, and input them to the processor 801 for processing, or to the radio frequency circuit 804 to realize voice communication.
  • the microphone may also be an array microphone or an omnidirectional collection microphone.
  • the speaker is used to convert the electrical signal from the processor 801 or the radio frequency circuit 804 into sound waves.
  • the loudspeaker can be a traditional thin-film loudspeaker or a piezoelectric ceramic loudspeaker.
  • audio circuitry 807 may also include a headphone jack.
  • the positioning component 808 is used to locate the current geographic location of the terminal 800 to implement navigation or a Location Based Service (LBS, Location Based Service).
  • the positioning component 808 may be a positioning component based on the Global Positioning System (GPS, Global Positioning System) of the United States, the BeiDou system of China, or the GLONASS system of Russia.
  • the power supply 809 is used to power various components in the terminal 800 .
  • the power source 809 may be alternating current, direct current, disposable batteries or rechargeable batteries.
  • the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. Wired rechargeable batteries are batteries that are charged through wired lines, and wireless rechargeable batteries are batteries that are charged through wireless coils.
  • the rechargeable battery can also be used to support fast charging technology.
  • terminal 800 also includes one or more sensors 810 .
  • the one or more sensors 810 include, but are not limited to, an acceleration sensor 811 , a gyro sensor 812 , a pressure sensor 813 , a fingerprint sensor 814 , an optical sensor 815 , and a proximity sensor 816 .
  • the acceleration sensor 811 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 800 .
  • the acceleration sensor 811 can be used to detect the components of the gravitational acceleration on the three coordinate axes.
  • the processor 801 can control the display screen 805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 811 .
  • the acceleration sensor 811 can also be used for game or user movement data collection.
  • the gyroscope sensor 812 can detect the body direction and rotation angle of the terminal 800 , and the gyroscope sensor 812 can cooperate with the acceleration sensor 811 to collect 3D actions of the user on the terminal 800 .
  • the processor 801 can implement the following functions according to the data collected by the gyro sensor 812: motion sensing (such as changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 813 may be disposed on the side frame of the terminal 800 and/or the lower layer of the display screen 805.
  • when the pressure sensor 813 is disposed on the side frame of the terminal 800, the user's holding signal on the terminal 800 can be detected, and the processor 801 performs left/right-hand recognition or shortcut operations according to the holding signal collected by the pressure sensor 813.
  • when the pressure sensor 813 is disposed on the lower layer of the display screen 805, the processor 801 controls the operability controls on the UI according to the user's pressure operation on the display screen 805.
  • the operability controls include at least one of button controls, scroll bar controls, icon controls, and menu controls.
  • the fingerprint sensor 814 is used to collect the user's fingerprint, and the processor 801 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 814 , or the fingerprint sensor 814 identifies the user's identity according to the collected fingerprint. When the user's identity is identified as a trusted identity, the processor 801 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings.
  • the fingerprint sensor 814 may be disposed on the front, back or side of the terminal 800 . When the terminal 800 is provided with physical buttons or a manufacturer's logo, the fingerprint sensor 814 may be integrated with the physical buttons or the manufacturer's logo.
  • Optical sensor 815 is used to collect ambient light intensity.
  • the processor 801 may control the display brightness of the display screen 805 according to the ambient light intensity collected by the optical sensor 815 . Specifically, when the ambient light intensity is high, the display brightness of the display screen 805 is increased; when the ambient light intensity is low, the display brightness of the display screen 805 is decreased.
  • the processor 801 may also dynamically adjust the shooting parameters of the camera assembly 806 according to the ambient light intensity collected by the optical sensor 815 .
  • the proximity sensor 816, also called a distance sensor, is usually provided on the front panel of the terminal 800.
  • the proximity sensor 816 is used to collect the distance between the user and the front of the terminal 800 .
  • when the proximity sensor 816 detects that the distance between the user and the front of the terminal 800 gradually decreases, the processor 801 controls the display screen 805 to switch from the bright-screen state to the off-screen state; when the proximity sensor 816 detects that the distance between the user and the front of the terminal 800 gradually increases, the processor 801 controls the display screen 805 to switch from the off-screen state to the bright-screen state.
  • the structure shown in FIG. 8 does not constitute a limitation on the terminal 800, which may include more or fewer components than shown, combine some components, or adopt a different component arrangement.
  • FIG. 9 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the server 900 may vary greatly due to different configurations or performance, and may include one or more processors 901 and one or more memories 902, wherein at least one computer program is stored in the memory 902, and the at least one computer program is loaded and executed by the processor 901 to implement the image processing methods provided by the above method embodiments.
  • the server can also have components such as wired or wireless network interfaces and input and output interfaces for input and output, and the server can also include other components for implementing device functions, which are not described here.
  • a computer-readable storage medium such as a memory including at least one computer program, the at least one computer program being executable by a processor to complete the image processing method in the above-mentioned embodiment.
  • the computer-readable storage medium can be a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
  • a computer program product or computer program is provided, comprising one or more pieces of program code stored in a computer-readable storage medium.
  • one or more processors of the electronic device can read the one or more pieces of program code from the computer-readable storage medium and execute them, so that the electronic device can execute the above image processing method.
  • the size of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • determining B according to A does not mean that B is determined only according to A; B can also be determined according to A and/or other information.

Abstract

An image processing method, apparatus, device, and storage medium. The image processing method includes labeling second images with pseudo-classification labels when an image classification model is trained, indicating whether the pseudo-classification labels are accurate through a loss value corresponding to the pseudo-classification labels, updating the model parameters on this basis, and updating the pseudo-classification labels accordingly during the update process. The method requires classification labels for only some of the images and generates pseudo-classification labels for the other images during model training, which can reduce the labor cost of manual labeling; moreover, the pseudo-classification labels are continuously updated during model training, and pseudo-classification labels with accuracy close to that of the classification labels can finally be determined, which increases the number of training samples and thereby improves the accuracy of the image classification model.

Description

Image processing method and apparatus, electronic device, computer-readable storage medium, and computer program product

Cross-reference to related applications

The embodiments of this application are based on, and claim priority to, the Chinese patent application with application No. 202110286366.X filed on March 17, 2021, the entire contents of which are incorporated into the embodiments of this application by reference.

Technical field

This application relates to the field of artificial intelligence technology, and in particular to an image processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.

Background

Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine capable of responding in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.

With the development of artificial intelligence technology, models are trained through machine learning in more and more fields to help people complete complex computations. In artificial-intelligence-based image processing technology, an image classification model for classifying medical images can be trained, so that the attribute type of a medical image is determined through the image classification model. However, the labeled data used by image processing methods in the related art are obtained through manual labeling, and the manual labeling process consumes a large amount of labor, is inefficient, and is prone to errors.

Summary

The embodiments of this application provide an image processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product, which can improve image processing efficiency and accuracy. The technical solutions are as follows:

An embodiment of this application provides an image processing method, the method being executed by an electronic device and including: performing classification processing on at least two first images and at least two second images through an image classification model to obtain predicted classification results of the at least two first images and the at least two second images, where the first images carry corresponding classification labels, the second images carry corresponding pseudo-classification labels, the first images and the second images are images including an object to be recognized, and the predicted classification results, the classification labels, and the pseudo-classification labels are used to indicate the attribute type of the object to be recognized; obtaining a first loss value based on the predicted classification results of the at least two first images and the corresponding classification labels, and on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, the first loss value being used to indicate the accuracy of the predicted classification results; obtaining a second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, the second loss value being used to indicate the accuracy of the pseudo-classification labels; and updating the model parameters of the image classification model based on the first loss value and the second loss value, updating the corresponding pseudo-classification labels based on the predicted classification results of the at least two second images, and then continuing to perform the classification processing and the acquisition of the first loss value and the second loss value, until a target image classification model is obtained when a target condition is met.

An embodiment of this application provides an image processing apparatus, including: an acquisition module configured to perform classification processing on at least two first images and at least two second images through an image classification model to obtain predicted classification results of the at least two first images and the at least two second images, where the first images carry corresponding classification labels, the second images carry corresponding pseudo-classification labels, the first images and the second images are images including an object to be recognized, and the predicted classification results, the classification labels, and the pseudo-classification labels are used to indicate the attribute type of the object to be recognized; the acquisition module being further configured to obtain a first loss value based on the predicted classification results of the at least two first images and the corresponding classification labels, and on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, the first loss value being used to indicate the accuracy of the predicted classification results; the acquisition module being further configured to obtain a second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, the second loss value being used to indicate the accuracy of the pseudo-classification labels; and an updating module configured to update the model parameters of the image classification model based on the first loss value and the second loss value, update the corresponding pseudo-classification labels based on the predicted classification results of the at least two second images, and then continue to perform the classification processing and the acquisition of the first loss value and the second loss value, until a target image classification model is obtained when a target condition is met.

An embodiment of this application provides an electronic device, the electronic device including one or more processors and one or more memories, where at least one computer program is stored in the one or more memories and is loaded and executed by the one or more processors to implement the various optional implementations of the above image processing method.

An embodiment of this application provides a computer-readable storage medium, where at least one computer program is stored in the storage medium and is loaded and executed by a processor to implement the various optional implementations of the above image processing method.

An embodiment of this application provides a computer program product or computer program, the computer program product or the computer program including one or more pieces of program code stored in a computer-readable storage medium. One or more processors of an electronic device can read the one or more pieces of program code from the computer-readable storage medium and execute them, so that the electronic device can execute the image processing method of any one of the above possible implementations.

In the embodiments of this application, pseudo-classification labels are used to label the second images when the image classification model is trained, and a corresponding loss value is designed based on the pseudo-classification labels to indicate whether they are accurate; the model parameters are updated on this basis, and the pseudo-classification labels are updated accordingly during the update process. In this way, only some of the images need corresponding classification labels, and pseudo-classification labels are generated for the other images during model training; that is, not all images need corresponding classification labels, which greatly reduces the labor cost of manual labeling and improves training efficiency. Moreover, the pseudo-classification labels are continuously updated during model training, and pseudo-classification labels whose accuracy is close to that of the classification labels can finally be determined, which increases the number of training samples and thereby improves the accuracy of the image classification model.
Brief description of the drawings

To describe the technical solutions in the embodiments of this application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of an implementation environment of an image processing method provided by an embodiment of this application;

FIG. 2 is a flowchart of an image processing method provided by an embodiment of this application;

FIG. 3 is a flowchart of an image processing method provided by an embodiment of this application;

FIG. 4 is a schematic diagram of a training system framework provided by an embodiment of this application;

FIG. 5 is a flowchart of an image processing method provided by an embodiment of this application;

FIG. 6 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application;

FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of this application;

FIG. 8 is a structural block diagram of a terminal provided by an embodiment of this application;

FIG. 9 is a schematic structural diagram of a server provided by an embodiment of this application.
Detailed description

To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the drawings.

In this application, the terms "first" and "second" are used to distinguish identical or similar items whose roles and functions are basically the same. It should be understood that there is no logical or temporal dependency among "first", "second", and "nth", and that neither the number nor the execution order is limited. It should also be understood that although the following description uses the terms first, second, and the like to describe various elements, these elements should not be limited by the terms; the terms are only used to distinguish one element from another. For example, without departing from the scope of the various examples, a first image could be called a second image, and similarly a second image could be called a first image; both the first image and the second image can be images and, in some cases, can be separate and different images.

In this application, the term "at least one" means one or more, and the term "multiple" means two or more; for example, multiple data packets means two or more data packets.

It should be understood that the terms used in the description of the various examples herein are only for describing specific examples and are not intended to be limiting.

It should also be understood that, in the various embodiments of this application, the size of the sequence number of each process does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

It should also be understood that determining B according to A does not mean determining B only according to A; B can also be determined according to A and/or other information.

The terms involved in this application are described below.

Pseudo-classification label (Pseudo Label): an approximate classification label given to unlabeled data according to labeled data. That is, for data without classification labels (such as medical images), a classification label is determined for the unlabeled data according to other labeled data; this label is not obtained by manual labeling but is computed through technical means, and is therefore called a pseudo-classification label. The pseudo-label algorithm is a kind of self-learning method and is also widely applied to various classification tasks in the field of computer vision.

Medical image: an image related to a lesion or a diseased region acquired through radiomics or computed tomography. By processing a medical image, the attribute types in it can be analyzed, for example, whether there is a lesion, where the lesion is located, and the type of the lesion.

Lesion: the part of the body where a pathological change occurs; a localized diseased tissue bearing pathogenic microorganisms. For example, if a certain part of the lung is destroyed by tubercle bacilli, that part is a tuberculosis lesion. As another example, otoscope data can be analyzed to determine whether there is a lesion at the eardrum and the lesion type.
In the related art, an image classification model is trained so that the model has the ability to process images in place of manual computation and obtain the type of the object to be recognized in an image. As an example, in the related art a terminal (for example, a medical device) can acquire images of disease-related body parts; for example, the inside of the ear can be imaged to obtain ear images, also called otoscope data. By processing an ear image, the position of the eardrum in the image can be determined and the type of the eardrum can be analyzed, for example, normal or pathological, where pathological eardrums may further include tympanosclerosis, myringitis, and the like. However, the labeled data used in image processing are obtained through manual labeling, and the manual labeling process consumes a large amount of labor, is inefficient, and is prone to errors.

The embodiments of this application provide an artificial-intelligence-based image processing method and apparatus, an electronic device, and a computer-readable storage medium, which can label the second images with pseudo-classification labels when the image classification model is trained and generate pseudo-classification labels for the second images during model training, greatly reducing the labor cost of manual labeling and improving training efficiency.

This application relates to artificial intelligence technology; by training an image classification model, the model is given the ability to process human-tissue images in place of manual computation. Artificial intelligence is introduced below. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big-data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.

Computer vision (CV) is a science that studies how to make machines "see"; more specifically, it refers to machine vision in which cameras and computers replace human eyes to recognize, track, and measure targets, with further graphics processing so that the computer produces images more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric technologies such as face recognition and fingerprint recognition.

Machine learning (ML) is a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

The solutions provided by the embodiments of this application relate to image processing and machine learning technologies in the computer vision field of artificial intelligence, and are specifically described through the following embodiments.
The implementation environment of this application is described below.

FIG. 1 is a schematic diagram of an implementation environment of an image processing method provided by an embodiment of this application. The implementation environment includes a terminal 101, or includes a terminal 101 and an image processing platform 102; the terminal 101 is connected to the image processing platform 102 through a wireless or wired network.

The terminal 101 (running a client, for example a medical client) can be used to acquire medical images; for example, images are captured through an image acquisition apparatus of the terminal 101, or captured images are sent to the terminal 101 by another image acquisition device, and the terminal 101 receives images including the object to be recognized. An image processing plug-in may be embedded in the client running on the terminal to implement the image processing method locally in the client. For example, after obtaining a request to train the image classification model, the terminal 101 invokes the image processing plug-in to implement the image processing method, that is, to train the image classification model used to classify the object to be recognized and obtain its attribute type.

As an example, the images may be images in various application scenes. In a medical scene, when the image is a medical image, the object to be recognized is lesion-related image content, for example the inside of the ear or the shadow region in a B-mode ultrasound image, and the attribute type is the lesion type. In an autonomous-driving scene, when the image is a traffic image, the object to be recognized is a road element and the attribute type of the object is the attribute type of the road element, for example road guide lines and signal lights. In a face-recognition scene, when the image is a face image, the object to be recognized is the face in the image, and the attribute types of the object are the age-range type and gender type of the face. The following takes medical images as an example for specific description.

In some embodiments, after obtaining a request to train the image classification model, the terminal 101 invokes an image processing interface of the image processing platform 102 (which may be provided in the form of a cloud service, i.e., an image processing service), and the image processing platform 102 trains the image classification model according to medical images. For example, after a patient, doctor, or researcher inputs medical images into a medical application, the medical application invokes the image processing interface of the image processing platform 102 to train the image classification model, so that the model has the ability to distinguish the attribute types of medical images. The image processing method provided by the embodiments of this application does not take living bodies or animal bodies as its object, does not take obtaining disease diagnosis results or health conditions as its direct purpose, and cannot directly obtain disease diagnosis results or health conditions; that is, the attribute type is not directly used for disease diagnosis but serves only as intermediate data to assist patients in disease prediction and to assist doctors and researchers in diagnosis, reexamination, and research on treatment methods.

Exemplarily, the terminal 101 can have image acquisition and image processing functions, can process acquired images, and can execute corresponding functions according to the processing results. For example, the terminal 101 may be an otoscope device; in some embodiments, a portable otoscope device. Exemplarily, the image processing platform 102 can also have image acquisition and image processing functions, process acquired images, and execute corresponding functions according to the processing results. Exemplarily, the terminal 101 can complete this work independently, or the image processing platform 102 can provide data services for it, which is not limited in the embodiments of this application; for example, the terminal sends acquired labeled medical images to the image processing platform 102, and the image processing platform 102 executes the image processing method according to the received first images and second images.

The image processing platform 102 includes at least one of a server, multiple servers, a cloud computing platform, and a virtualization center, and is used to provide background services for applications supporting image processing. For example, the image processing platform 102 undertakes the primary processing work and the terminal 101 the secondary processing work; or the image processing platform 102 undertakes the secondary processing work and the terminal 101 the primary processing work; or each of the image processing platform 102 and the terminal 101 can undertake the processing work alone; or the two perform collaborative computing using a distributed computing architecture.

As an example, the image processing platform 102 includes at least one server 1021 and a database 1022, where the database 1022 is used to store data. In this embodiment of this application, the database 1022 can store sample medical images or trained image classification models to provide data services for the at least one server 1021.

In some embodiments, the server 1021 in the image processing platform 102 may be a node device in a blockchain system, and the server 1021 may store the sample medical images, or the image classification models trained based on the images, on the blockchain of the blockchain system. In some embodiments, the blockchain system can provide an image processing service: when an electronic device needs to classify images, it can send an image classification request to the blockchain system, and a node device of the blockchain system responds to the image classification request and classifies the images.

The electronic device for image processing provided by the embodiments of this application may be various types of terminals or servers. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big-data and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, an audio playback device, a video playback device, a medical device, or a vehicle; the terminal has installed and runs an application supporting image processing, for example a system application or an image processing application. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application.

Taking the server as an example, it may be a server cluster deployed in the cloud that opens artificial intelligence cloud services (AIaaS, AI as a Service) to users. The AIaaS platform splits several types of common AI services and provides independent or packaged services in the cloud. This service model is similar to an AI-themed mall: all users can access one or more of the artificial intelligence services provided by the AIaaS platform through application programming interfaces.

For example, one of the artificial intelligence cloud services may be an image processing service; that is, a cloud server encapsulates the image processing program provided by the embodiments of this application. A user invokes the image processing service in the cloud service through a terminal (running a client, such as a map-and-positioning client or a medical client), so that the server deployed in the cloud invokes the encapsulated image processing program. The image processing method provided by the embodiments of this application does not take living bodies or animal bodies as its object, does not take obtaining disease diagnosis results or health conditions as its direct purpose, and cannot directly obtain disease diagnosis results or health conditions from the event development information of a disease type; it is only used to assist doctors and researchers in diagnosis, reexamination, and research on treatment methods.

Those skilled in the art can know that the numbers of the above terminals 101 and servers 1021 can be larger or smaller; for example, there may be only one terminal 101 or server 1021, or dozens, hundreds, or more. The embodiments of this application do not limit the number or device types of terminals or servers.
FIG. 2 is a flowchart of an image processing method provided by an embodiment of this application. The method is applied to an electronic device, which is a terminal or a server. Referring to FIG. 2, and taking application to a terminal as an example, the method includes the following steps.

201. The terminal performs classification processing on at least two first images and at least two second images through an image classification model to obtain predicted classification results of the at least two first images and the at least two second images, where the first images carry corresponding classification labels, the second images carry corresponding pseudo-classification labels, the first images and the second images are images including an object to be recognized, and the predicted classification results, the classification labels, and the pseudo-classification labels are used to indicate the attribute type of the object to be recognized.

The images may be images in various application scenes, for example medical images or traffic images. When the image is a medical image, the object to be recognized is lesion-related image content, such as the inside of the ear or the shadow region in a B-mode ultrasound image; the following description takes medical images as an example. The difference between the first images and the second images lies in their labeling: a first image carries a classification label, which represents the correct or true classification result of the image and can be obtained by manual labeling. For example, the number of classification categories may be two or more; if there are 5 categories, five classification labels 0, 1, 2, 3, and 4 can be set to identify the 5 categories. Each classification label corresponds to one category, and through the classification label the correct classification result of the image carrying it, i.e., the category to which the image should belong after classification, can be known. A second image carries a pseudo-classification label which, unlike the above classification label, is not obtained by manual labeling but is a label similar to a classification label assigned to the image by processing it.

In this embodiment of this application, the pseudo-classification labels can be generated during the training of the image classification model, and since the model parameters of the image classification model are continuously optimized during training, the pseudo-classification labels are also continuously optimized during training.

The first images and the second images are sample images, which may also be called sample medical images. The predicted classification result is a prediction obtained by the image classification model, while the classification labels and pseudo-classification labels indicate the ground truth, i.e., the correct or true results. These sample images are processed by the image classification model, and based on the predicted classification results obtained by the model and the classification labels or pseudo-classification labels carried by the images, whether the predicted classification results are accurate is analyzed, so as to optimize the model parameters of the image classification model and improve its processing performance.

As an example, the predicted classification results, the classification labels, and the pseudo-classification labels are used to indicate the attribute type in the image. The attribute type may be the type of some attribute in the image; for example, the attribute may be a lesion in the image. The medical image may be an ear image, also called an otoscope image, and the above predicted classification results, classification labels, and pseudo-classification labels may be used to indicate the type of the eardrum in the ear image. The type may be whether it is normal; if normal, the normal types may further include completely healthy, healthy, or cured; if abnormal, the abnormal types may further include myringitis, tympanosclerosis, and the like. As another example, the attribute may also be another attribute of the image, such as image definition, image acquisition distance, hue, or image style. This is only an exemplary description; those skilled in the art can set the above types as required, which is not limited in the embodiments of this application.

202. The terminal obtains a first loss value based on the predicted classification results of the at least two first images and the corresponding classification labels, and on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels; the first loss value is used to indicate the accuracy of the predicted classification results.

After determining the predicted classification results, the terminal can evaluate whether they are accurate according to the labels carried by the images, so as to measure the processing performance of the image classification model.

Here the first loss value serves as a measure of processing performance. By comparing the predicted results (predicted classification results) with the correct results (classification labels or pseudo-classification labels), the accuracy of the predictions can be determined. Understandably, the more accurate the predicted classification results, the better the processing performance (e.g., accuracy) of the image classification model; the purpose of model training is precisely to improve this performance.

203. The terminal obtains a second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels; the second loss value is used to indicate the accuracy of the pseudo-classification labels.

In this embodiment of this application, not only is the accuracy of the predicted classification results determined through the predicted classification results and the classification labels or pseudo-classification labels to improve the classification accuracy of the model, but the accuracy of the pseudo-classification labels can also be estimated to further determine the classification accuracy of the image classification model. In this way, the pseudo-classification labels are updated again based on the predicted classification results in the next training stage; through multiple training stages, the pseudo-classification labels become more and more accurate, and the classification of the image classification model becomes more and more accurate.

204. The terminal updates the model parameters of the image classification model based on the first loss value and the second loss value, updates the corresponding pseudo-classification labels based on the predicted classification results of the at least two second images, and then continues to perform the classification processing and the acquisition of the first loss value and the second loss value, until the target image classification model is obtained when the target condition is met.

Specifically, when the target condition is not met, the following processing is performed: updating the corresponding pseudo-classification labels based on the predicted classification results of the at least two second images; performing classification processing on the at least two first images and the at least two second images through the image classification model obtained by updating the model parameters to obtain updated predicted classification results of the at least two first images and the at least two second images; and updating the model parameters of the updated image classification model again based on the updated predicted classification results of the at least two first images and the at least two second images, the classification labels corresponding to the at least two first images, and the updated pseudo-classification labels corresponding to the at least two second images. When the target condition is met, the image classification model obtained by updating the model parameters is determined as the target image classification model.

The above steps constitute one iteration. After obtaining the required loss values, the terminal can update the model parameters based on the loss values and then update the pseudo-classification labels used in the next stage. Through multiple iterations, the classification accuracy of the image classification model gradually improves, and a target image classification model meeting the condition, i.e., a trained model, can be obtained.

In the embodiments of this application, pseudo-classification labels are used to label the second images when the image classification model is trained, and a corresponding loss value is designed based on the pseudo-classification labels to indicate whether they are accurate; the model parameters are updated on this basis, and the pseudo-classification labels are updated accordingly during the update process. In this way, only some images need corresponding classification labels, and pseudo-classification labels are generated for the other images during model training; that is, not all images need corresponding classification labels, which greatly reduces the labor cost of manual labeling and improves training efficiency. Moreover, the pseudo-classification labels are continuously updated during model training, and pseudo-classification labels whose accuracy is close to that of the classification labels can finally be determined, which increases the number of training samples and thereby improves the accuracy of the image classification model.
FIG. 3 is a flowchart of an image processing method provided by an embodiment of this application. Referring to FIG. 3, the method includes the following steps.

301. The terminal acquires at least two first images and at least two second images, where the first images carry corresponding classification labels and the first images and the second images are medical images.

In this embodiment of this application, two kinds of images are involved: first images and second images, both of which are sample images used in training the image classification model. The two kinds are distinguished by whether they have classification labels: those with classification labels are called first images, and those without are called second images. The classification labels are the true classification labels.

302. The terminal obtains initial predicted classification results of the at least two first images and the at least two second images based on the image classification model.

The terminal can invoke the image classification model, which may be an initial model whose model parameters are yet to be optimized.

When obtaining predicted classification results based on the image classification model, the terminal can first perform feature extraction on the images and then classify the images based on the image features. That is, in step 302 the electronic device can input the at least two first images and the at least two second images into the image classification model, which performs feature extraction on them to obtain the image features of the at least two first images and the at least two second images, and then classifies the first images and the second images based on their respective image features to obtain the predicted classification results of the first images and the second images.

At this stage, the first images carry corresponding classification labels while the second images do not; therefore, the pseudo-classification labels of the second images still need to be computed from the predicted classification results of this stage. In the next stage, the second images can then carry pseudo-classification labels and participate in the classification loss.

The system framework of the method provided by this application is described below through a specific example. The training system framework of the image classification model is shown in FIG. 4. The framework may consist mainly of three modules: a feature extraction module (F, Feature Extractor) 401, a Gaussian mixture module (G, Gaussian Mixture) 402, and a classifier module (C, Classifier) 403.

The feature extraction module 401 extracts learnable deep feature representations; inputting the representations into the classifier yields the prediction results corresponding to the input data. That is, the feature extraction module 401 is used to extract the image features of the images.

For the at least two first images and the at least two second images, the at least two first images and their corresponding classification labels may be called the source-domain dataset, with (X_S, Y_S) denoting the images and true classification labels of the source-domain dataset (i.e., the training set); the at least two second images and their corresponding pseudo-classification labels may be called the target-domain dataset, with (X_T, Y_T) denoting the images and pseudo-classification labels of the target-domain dataset (i.e., the validation set).

For the source-domain and target-domain datasets, the feature extraction module 401 (which may be a feature extraction network F) first extracts feature representations (i.e., image features) from the source-domain and target-domain data; the image features (F_S, F_T) of the two domains are then input into the classifier module 403 (which may be a classifier C), and the classifier module 403 outputs the prediction results (P_S, P_T) of the source domain and the target domain, i.e., the predicted classification results. The prediction results of the target domain are input into the Gaussian mixture module 402 (which may be a Gaussian mixture model) to compute the importance weights (Weights) of the prediction results and the estimated pseudo-classification labels (Pseudo-label); the Gaussian mixture module 402 is used to compute the above weights. The weights participate, together with the pseudo-classification labels, in the computation of the robust pseudo-label loss (Robust Pseudo-label Loss), so that more accurate target-set test results are learned. This is the acquisition process of the second loss value described above. For the description of the Gaussian mixture module 402, see step 307 below; here only the feature extraction and classification process of the image classification model is described.

The feature extraction module 401 and the classifier module 403 can be any deep-learning network structure; for example, they can adopt a residual network (ResNet50) or other network structures such as EfficientNet, MobileNet, DenseNet, or any other deep-learning network. In some embodiments, the feature extraction module 401 can adopt the part of ResNet50 before the fully connected layer, and the classifier module 403 can adopt the final fully connected layer; the pseudo-classification labels are then computed from the target-domain prediction results output by the fully connected layer. A minimal sketch of such a split is given below.
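The following is a minimal, non-authoritative sketch of how the ResNet50 backbone could be split into the feature extraction module F and the classifier module C in PyTorch; the use of torchvision and the three-class output are illustrative assumptions, not part of the embodiment:

```python
import torch.nn as nn
from torchvision.models import resnet50

# Assumed setup: F = all ResNet50 layers before the final fully
# connected layer, C = a fresh fully connected layer for K classes.
backbone = resnet50()
feature_extractor = nn.Sequential(*list(backbone.children())[:-1],
                                  nn.Flatten())          # module F
classifier = nn.Linear(backbone.fc.in_features, 3)       # module C, K = 3
```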
303. The terminal obtains an initial first loss value based on the classification labels carried by the first images and the predicted classification results of the first images.

After the image classification model has predicted the classification results of the first images and the second images, the predicted classification result is a predicted value. Since the first images also carry true labels, i.e., ground truth, the terminal can use the error between the predicted values and the ground truth as the basis for model training. The first loss value can be used to indicate the accuracy of the predicted classification results of the first images; it is used to determine the accuracy of the classification results of the image classification model and may be called the classification loss. The classification of medical images can also be understood as a segmentation process; therefore, the classification process may also be called a segmentation process, and the classification loss may also be called a segmentation loss.

In some embodiments, the first loss value may be the value of a cross-entropy loss function, through which the error between the classification labels and the predicted classification results can be computed to measure the classification ability of the image classification model. Understandably, if the first loss value is a cross-entropy loss value, a larger first loss value indicates a larger error between the predicted classification results and the classification labels and a poorer classification ability of the image classification model, while a smaller first loss value indicates a smaller error and a better classification ability.

Of course, the first loss value can also be obtained with other loss functions, for example L1 or L2, which is not limited in the embodiments of this application.

304. The terminal updates the model parameters of the image classification model based on the first loss value.

After obtaining the first loss value, the terminal can update the model parameters based on it. The parameter update process can be implemented by any model-update algorithm; in some embodiments, it can be implemented by the stochastic gradient descent (SGD) algorithm, which is not limited in the embodiments of this application.

305. The terminal obtains the pseudo-classification labels corresponding to the at least two second images based on the predicted classification results of the at least two second images.

The second images originally carry no corresponding classification labels, so the labels need to be computed from the predicted classification results. In the next stage the second images can then carry corresponding pseudo-classification labels and participate in the computation of the classification loss, so as to expand the sample images and further improve the accuracy of the image classification model; this process can also compute pseudo-classification labels of good accuracy for the second images.
In some embodiments, when obtaining the pseudo-classification labels, the terminal can cluster the images and determine the pseudo-classification labels based on the clustering result. Specifically, the terminal can cluster the at least two first images and the at least two second images to obtain a clustering result, where the clustering result includes at least two cluster centers and the images corresponding to each cluster center, and obtain the weight of each second image according to the distance between each second image and the cluster center to which it belongs.

For example, in a specific example, the pseudo-classification label (Pseudo-label) can be obtained through the following formula (reconstructed here in consistent notation from the surrounding definitions):

$$\hat{Y}_T = \operatorname{argmax}_k \, [C(F(X_T))]_k \quad \text{(Formula 1)}$$

where $\hat{Y}_T$ denotes the pseudo-classification label; $\operatorname{argmax}_k$ is a function that returns the argument at which its operand attains the maximum, i.e., the index $k$ at which $C(F(X_T))$ attains its maximum value; $[\cdot]_k$ denotes the $k$-th cluster center computed using the cosine similarity $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$; $C(\cdot)$ is the classification process; and $F(X_T)$ denotes feature extraction on the second images $X_T$. A sketch of this cluster-center assignment follows.
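As a minimal sketch of one reading of the reconstructed Formula 1, under the assumptions that per-class mean features serve as the cluster centers and that the function names are purely illustrative, the pseudo-label of each target image can be taken as the class of its most cosine-similar center:

```python
import torch
import torch.nn.functional as nnf

def class_centers(feat_s, labels_s, num_classes):
    """Mean source-domain feature per class, used as cluster centers."""
    return torch.stack([feat_s[labels_s == k].mean(dim=0)
                        for k in range(num_classes)])

def cosine_pseudo_labels(feat_t, centers):
    """Assign each target-domain feature the class of the cluster
    center with the highest cosine similarity (Formula 1 sketch)."""
    sims = nnf.normalize(feat_t, dim=1) @ nnf.normalize(centers, dim=1).T
    return sims.argmax(dim=1)
```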
306. The terminal obtains predicted classification results of the at least two first images and the at least two second images based on the image classification model, where the first images carry corresponding classification labels and the second images carry corresponding pseudo-classification labels.

Step 306 is the same as step 302 above and is not elaborated here.

307. The terminal obtains a first loss value based on the predicted classification results of the at least two first images and the corresponding classification labels, and on the at least two second images and the corresponding pseudo-classification labels; the first loss value is used to indicate the accuracy of the predicted classification results of the images.

At this stage, both the first images and the second images correspond to classification labels or pseudo-classification labels, so both kinds of images can participate in the computation of the classification loss or segmentation loss; the terminal can therefore compute the first loss value based on the predicted classification results and the classification labels or pseudo-classification labels of both kinds of images.

Step 307 is the same as step 303 above, except that in step 307 the second images correspond to pseudo-classification labels and can also participate in the computation of the classification or segmentation loss; accordingly, when obtaining the first loss value the terminal also counts the errors between the predicted classification results of the second images and their pseudo-classification labels. This is not elaborated further in the embodiments of this application.

308. The terminal obtains a second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels; the second loss value is used to indicate the accuracy of the pseudo-classification labels.

For the pseudo-classification labels, the terminal can further evaluate whether they are accurate, i.e., whether they are the true classification labels or close to the true classification labels. Therefore, the terminal can obtain a second loss value according to the predicted classification results and the pseudo-classification labels of the second images, so as to judge whether the pseudo-classification labels are accurate.

Specifically, the terminal can obtain the errors between the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, and then weight the errors corresponding to the at least two second images according to the weights of the at least two second images to obtain the second loss value. By setting a weight for each second image and adaptively adjusting it according to whether the pseudo-classification label is accurate, the accuracy of the pseudo-classification labels is guaranteed.

In some embodiments, the weights can be obtained through clustering. Images of the same category have similar distributions of image features; therefore, by clustering the images and using the distances to the different cluster centers, the attribute type in a second image can be analyzed, which naturally reflects the possibility that the pseudo-classification label is the correct classification label. Images that are classified accurately can be given higher weights, and those classified poorly lower weights. That is, the above weight can correspond to whether the pseudo-classification label of the second image is the correct classification label, or to the probability that the pseudo-classification label of the second image is the correct classification label.

Specifically, the terminal can cluster the at least two first images and the at least two second images to obtain at least two cluster centers, and obtain the weight of each second image according to the distance between each second image and the cluster center to which it belongs. Obtaining the weight of each second image from this distance can effectively improve the correctness of the weights and thereby the accuracy of the subsequent second loss value.

In some embodiments, the above clustering is used to cluster images with similar image features together, so as to determine the distribution of different types of images at the feature level and model it in the feature space through the feature distances to the category centers, obtaining accurate weights. Specifically, the terminal can obtain, according to the distance between each second image and the cluster center to which it belongs, the probability corresponding to each image, the probability being the probability that the pseudo-classification label is the correct classification label, and then obtain the weight of each image based on the probability corresponding to each second image. Describing the weight with a probability can effectively improve the accuracy and effectiveness of the second loss value and thereby the efficiency of model training.

In some embodiments, as to the relationship between the probability and the weight, the terminal can take the probability as the weight of a second image in response to the probability corresponding to that second image being greater than or equal to a probability threshold, and take a zero value as the weight in response to the probability corresponding to that second image being smaller than the probability threshold.

The probability threshold can be set by the relevant technical personnel as required; for example, it may be 0.5, which is not limited in the embodiments of this application. Through the above method, higher weights are set for accurately classified images and lower weights for inaccurately classified images, further improving the accuracy of the weights and of the pseudo-classification labels.
In some embodiments, step 308 can be implemented by a Gaussian mixture model. Specifically, the at least two second images, the predicted classification results of the at least two second images, and the corresponding pseudo-classification labels are processed based on the Gaussian mixture model to obtain the second loss value. That is, the electronic device can input the predicted classification results and pseudo-classification labels of the second images into the Gaussian mixture model, which processes the input data and outputs the second loss value.

A Gaussian mixture model quantifies data precisely with Gaussian probability density functions (normal distribution curves); it is a model that decomposes the data into several components, each based on a Gaussian probability density function (normal distribution curve).

In some embodiments, the electronic device can further update the model parameters of the Gaussian mixture model based on the second loss value.

For example, for source-domain images $X_S$ and target-domain images $X_T$, the outputs of the classification network, $P_S = C(F(X_S)) \in \mathbb{R}^{N_S \times K}$ and $P_T = C(F(X_T)) \in \mathbb{R}^{N_T \times K}$, represent the predicted classification results of the source-domain and target-domain data, respectively, and $\hat{Y}_T$ represents the pseudo-classification labels of the target-domain data, where $K$ denotes the number of categories to be classified and $N_S$, $N_T$ denote the numbers of source-domain and target-domain images, respectively. The weight can be obtained through the following Formula 2 (reconstructed here in consistent notation from the surrounding definitions):

$$w_j = \begin{cases} p(z_j = 1 \mid x_j^T), & p(z_j = 1 \mid x_j^T) \geq 0.5 \\ 0, & \text{otherwise} \end{cases} \quad \text{(Formula 2)}$$

where $p(z_j = 1 \mid x_j^T)$ denotes the probability, estimated by the model, that the pseudo-classification label is the correct classification label, and $z_j \in \{0, 1\}$ is an intermediate variable introduced for the pseudo-classification label of each target-domain sample, indicating whether the predicted pseudo-classification label is correct ($z_j = 1$) or wrong ($z_j = 0$). It can be seen from Formula 2 that if the estimated probability that a pseudo-classification label is the correct classification label is less than 0.5, the pseudo-classification label is removed. Here a Gaussian mixture model is used, and the distances between the data features and the centers of the different categories are used to compute the probability that a pseudo-classification label is the correct classification label.
309. The terminal updates the model parameters of the image classification model based on the first loss value and the second loss value, updates the corresponding pseudo-classification labels based on the predicted classification results of the at least two second images, and then continues to perform the classification and loss-value acquisition steps, until the target image classification model is obtained when the target condition is met.

The terminal integrates the two loss values to update the model parameters. In the integration, a target loss value can be obtained based on the first loss value and the second loss value, and the model parameters are updated based on the target loss value.

In some embodiments, the terminal can weight the first loss value and the second loss value to obtain the target loss value, so that training based on the target loss value improves training efficiency. When the weights are 1, the terminal can take the sum of the first loss value and the second loss value as the target loss value, denoted for example $\mathcal{L}_{total}$, as shown in the following Formula 3 (reconstructed here in consistent notation from the surrounding definitions):

$$\mathcal{L}_{total} = \mathcal{L}_{ce}(X_S, Y_S) + \mathcal{L}_{rpl}(X_T, \hat{Y}_T) \quad \text{(Formula 3)}$$

$$\mathcal{L}_{ce} = -\frac{1}{N_S} \sum_{i=1}^{N_S} \sum_{k=1}^{K} Y_{ik}^{S} \log P_{ik}^{S}, \qquad \mathcal{L}_{rpl} = \frac{1}{N_T} \sum_{j=1}^{N_T} w_j \, \mathrm{MAE}\!\left(P_j^{T}, \hat{Y}_j^{T}\right)$$

where $(X_S, Y_S)$ denotes the source-domain data, $X_S$ refers to the first images, and $Y_S$ refers to the classification labels carried by the first images; $(X_T, \hat{Y}_T)$ denotes the target-domain data, $X_T$ refers to the second images, and $\hat{Y}_T$ refers to the pseudo-classification labels carried by the second images. $\mathcal{L}_{ce}$ is the cross-entropy loss function of the first images $X_S$ and their carried classification labels $Y_S$ in the source-domain data; $\mathcal{L}_{rpl}$ is the robust loss function of the second images and their predicted pseudo-classification labels in the target-domain data, i.e., the second loss value; $w$ is the weight that the pseudo-classification label is the correct classification label; $\mathrm{MAE}(\cdot)$ denotes the mean absolute error (Mean Absolute Error); $F$ refers to the feature extraction module and $C$ to the classifier module; and the outputs of the classification network, $P_S = C(F(X_S)) \in \mathbb{R}^{N_S \times K}$ and $P_T = C(F(X_T)) \in \mathbb{R}^{N_T \times K}$, denote the predicted classification results of the source-domain and target-domain data, respectively, where $K$ denotes the number of categories to be classified and $N_S$, $N_T$ denote the numbers of first images and second images, respectively.
In the example, the first loss value includes only the cross-entropy loss computed from the first images and does not count the cross-entropy loss computed from the second images. It should be noted that the terminal can determine, according to its settings, whether to count the cross-entropy loss computed from the second images into the first loss value. In the very first iteration, when the second images have not yet obtained pseudo-classification labels, this cross-entropy loss cannot be computed and can be left out of the first loss value; if the second images already carry pseudo-classification labels, the computed cross-entropy loss can also be counted into the first loss value.

In some embodiments, the terminal can further obtain the prediction types of the at least two first images and the at least two second images according to those images and the corresponding predicted classification results, the prediction type indicating whether an image is a first image or a second image, and obtain a third loss value according to the prediction types of the at least two first images and the at least two second images, the third loss value indicating the accuracy of the prediction types. Correspondingly, in step 309 the terminal can update the model parameters of the image classification model based on the first loss value, the second loss value, and the third loss value.

In some embodiments, the terminal can, based on a discriminant network, discriminate the types of the at least two first images and the at least two second images according to the images and the corresponding predicted classification results to obtain the prediction types of the at least two first images and the at least two second images, and then update the network parameters of the discriminant network according to the third loss value, so that the segmentation network and the discriminant network are trained in an alternately updated manner, which effectively improves model training efficiency.

That is, a staged training method is adopted during training: pseudo-classification labels are generated using the training results of the previous stage and applied in the segmentation-network training of the current stage, and in each stage the segmentation network and the discriminant network are trained together in an alternately updated manner.

During training, the image data are first input into the segmentation network, and the segmentation loss $\mathcal{L}_{seg}$, i.e., the above classification loss or first loss value, is computed using the true classification labels of the source-domain data and the pseudo-classification labels of the target-domain data. The parameters of the segmentation network are then updated by minimizing the segmentation loss. After segmentation, the source-domain and target-domain data can also be discriminated through the discriminant network, which makes the data classification of the two domains more similar. Specifically, the segmentation results $P_S$ and $P_T$ output by the segmentation network are both input into the discriminant network, and the adversarial loss $\mathcal{L}_{adv}$ is computed using the information-entropy result generated from $P_T$; this adversarial loss is the third loss value of the prediction types obtained by the discrimination described above. The parameters of the discriminant network can then be updated by maximizing the adversarial loss. Subsequently, the error produced by the adversarial loss function is also backpropagated to the segmentation network, whose parameters are updated by minimizing the adversarial loss, so that the segmentation results predicted by the segmentation network for source-domain and target-domain images become more and more similar, achieving domain adaptation. In optimizing the network parameters, the embodiments of this application can use the SGD algorithm to optimize and train the segmentation network (the image classification model) and the Adam algorithm to optimize and train the discriminant network.

As for the target condition, it can be that the loss values converge or that the number of iterations reaches a target number; the target condition can also be related to the learning rate. In a specific example, the initial learning rates of the segmentation network and the discriminant network are 2.5×10⁻⁴ and 1×10⁻⁴, respectively. The learning rate can affect how fast the model converges.

After the terminal has trained the image classification model, the model can be applied to image classification scenes; for example, the image classification scene may be segmenting otoscope data. In response to an image processing instruction, the terminal can acquire a third image, which is a medical image, input the third image into the target image classification model, perform feature extraction on it through the target image classification model, and classify the third image based on the extracted image features to obtain the attribute type of the object to be recognized in the third image. Similarly, the attribute type can be determined according to the image classification scene: for example, in the otoscope-data segmentation scene, the object to be recognized is an ear image and the attribute type can be the type of the eardrum; in a brain-image segmentation scene, the attribute type may indicate whether the human tissue is a brain tumor or brain cancer; and in a scene of classifying the image acquisition distance, the attribute type may be a far distance or a near distance. This process can also be regarded as the testing process of the model, in which the acquired images are accurately classified through the trained model. For example, as shown in FIG. 5, the test set can be input into the trained image classification model, which includes a feature extraction module 501 and a classifier module 502; finally, the classifier module 502 outputs the prediction result of the test set, i.e., the test prediction (Test Prediction).

In the embodiments of this application, pseudo-classification labels are used to label the second images when the image classification model is trained, and a corresponding loss value is designed based on the pseudo-classification labels to indicate whether they are accurate; the model parameters are updated on this basis, and the pseudo-classification labels are updated accordingly during the update process. In this way, only some images need corresponding classification labels, and pseudo-classification labels are generated for the other images during model training; that is, not all images need corresponding classification labels, which greatly reduces the labor cost of manual labeling and improves training efficiency. Moreover, the pseudo-classification labels are continuously updated during model training, and pseudo-classification labels whose accuracy is close to that of the classification labels can finally be determined, which increases the number of training samples and thereby improves the accuracy of the image classification model.

All of the above optional technical solutions can be combined arbitrarily to form optional embodiments of this application, and are not described one by one here.
In implementing the embodiments of this application, 9804 otoscope images were used, of which 4175, 2281, and 3348 were cholesteatoma, chronic suppurative otitis media, and normal data, respectively. The data were divided (proportionally by class) into a training set, a validation set, and a test set with the patient as the unit (i.e., data from the same patient are not split across sets); the training set contains 8710 images, and the validation set and test set contain 548 and 546 images, respectively. All images were resized to 256×256; during testing, the 546-image set was used to test the results.

ResNet50 was used as the basic network, and the algorithm provided by the embodiments of this application was added to ResNet50 for method verification. Extensive experiments were carried out with different training-data sizes (i.e., using 10% to 100% of the data, respectively); the experimental results are shown in Table 1 and Table 2. It can be seen that the algorithm provided by the embodiments of this application significantly improves the performance of classifying otitis-media lesions based on medical images on training data of all sample sizes.
Table 1: Experimental results of ResNet50 on training data of different sample sizes

Training data | Precision | Accuracy | Recall | Score
train10%_val | 0.424908 | 0.7606 | 0.4249 | 0.5166
Train20%_val | 0.554945 | 0.7601 | 0.5549 | 0.6086
Train30%_val | 0.622711 | 0.8773 | 0.6227 | 0.6904
Train40%_val | 0.589744 | 0.8077 | 0.5897 | 0.6531
Train50%_val | 0.745421 | 0.9560 | 0.7454 | 0.8023
Train60%_val | 0.771062 | 0.9414 | 0.7711 | 0.8216
Train70%_val | 0.787546 | 0.9766 | 0.7875 | 0.8512
Train80%_val | 0.785714 | 0.9432 | 0.7857 | 0.8361
Train90%_val | 0.776557 | 0.9799 | 0.7766 | 0.8325
Train100%_val | 0.695971 | 0.7970 | 0.6960 | 0.7157
Table 2: Experimental results of the algorithm proposed by the embodiments of this application

Training data | Precision | Accuracy | Recall | Score
train10%_val | 0.84065 | 0.847825 | 0.84065 | 0.83403
Train20%_val | 0.80952 | 0.83458 | 0.80952 | 0.79803
Train30%_val | 0.92307 | 0.92425 | 0.92307 | 0.92213
Train40%_val | 0.86080 | 0.87482 | 0.86080 | 0.85966
Train50%_val | 0.92124 | 0.92331 | 0.92124 | 0.92048
Train60%_val | 0.82051 | 0.82056 | 0.82051 | 0.81508
Train70%_val | 0.88095 | 0.88030 | 0.88095 | 0.87860
Train80%_val | 0.91941 | 0.91879 | 0.91941 | 0.91872
Train90%_val | 0.92307 | 0.92485 | 0.92307 | 0.92217
Train100%_val | 0.90659 | 0.90885 | 0.90659 | 0.90574
It can be understood that the embodiments of this application involve data related to user information, user images, and the like; when the embodiments of this application are applied to specific products or technologies, the user's permission or consent needs to be obtained, and the collection, use, and processing of the relevant data need to comply with the relevant laws, regulations, and standards of the relevant countries and regions.
FIG. 6 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application. Referring to FIG. 6, the apparatus includes: an acquisition module 601, configured to perform classification processing on at least two first images and at least two second images through an image classification model to obtain predicted classification results of the at least two first images and the at least two second images, where the first images carry corresponding classification labels, the second images carry corresponding pseudo-classification labels, the first images and the second images are images including an object to be recognized, and the predicted classification results, the classification labels, and the pseudo-classification labels are used to indicate the attribute type of the object to be recognized; the acquisition module 601 is further configured to obtain a first loss value based on the predicted classification results of the at least two first images and the corresponding classification labels, and on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, the first loss value being used to indicate the accuracy of the predicted classification results; the acquisition module 601 is further configured to obtain a second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo-classification labels, the second loss value being used to indicate the accuracy of the pseudo-classification labels; and an updating module 602, configured to update the model parameters of the image classification model based on the first loss value and the second loss value, update the corresponding pseudo-classification labels based on the predicted classification results of the at least two second images, and then continue to perform the classification processing and the acquisition of the first loss value and the second loss value, until the target image classification model is obtained when the target condition is met.

In some embodiments, the acquisition module 601 is configured to: obtain the errors between the predicted classification results of the at least two second images and the corresponding pseudo-classification labels; and weight the errors corresponding to the at least two second images according to the weights of the at least two second images to obtain the second loss value.

In some embodiments, the acquisition module 601 is configured to: cluster the at least two first images and the at least two second images to obtain a clustering result, where the clustering result includes at least two cluster centers and the images corresponding to each cluster center; and obtain the weight of each second image according to the distance between each second image and the cluster center to which it belongs.

In some embodiments, the acquisition module 601 is configured to: obtain the probabilities corresponding to the at least two second images according to the distance between each second image and the cluster center to which it belongs, the probability being the probability that the pseudo-classification label is the correct classification label; and obtain the weights of the at least two second images based on the probabilities corresponding to the at least two second images.

In some embodiments, the acquisition module 601 is configured to: in response to the probability corresponding to any second image being greater than or equal to the probability threshold, take the probability as the weight of that second image; and in response to the probability corresponding to any second image being smaller than the probability threshold, take a zero value as the weight of that second image.

In some embodiments, the updating module 602 is configured to obtain the pseudo-classification label corresponding to the predicted classification result of each second image according to the predicted classification results and the clustering result of the at least two second images.

In some embodiments, the acquisition module 601 is configured to process the at least two second images, the predicted classification results of the at least two second images, and the corresponding pseudo-classification labels based on a Gaussian mixture model to obtain the second loss value; the updating module 602 is further configured to update the model parameters of the Gaussian mixture model based on the second loss value.

In some embodiments, the acquisition module 601 is further configured to obtain the prediction types of the at least two first images and the at least two second images according to the at least two first images and the at least two second images and the corresponding predicted classification results, the prediction type being used to indicate whether an image is a first image or a second image;

the acquisition module 601 is further configured to obtain a third loss value according to the prediction types of the at least two first images and the at least two second images, the third loss value being used to indicate the accuracy of the prediction types;

in some embodiments, the updating module 602 is configured to update the model parameters of the image classification model based on the first loss value, the second loss value, and the third loss value.

In some embodiments, the acquisition module 601 is configured to perform discrimination processing on the at least two first images and the at least two second images and the corresponding predicted classification results based on a discriminant network to obtain the prediction types of the at least two first images and the at least two second images; the updating module 602 is further configured to update the network parameters of the discriminant network according to the third loss value.

In some embodiments, the acquisition module 601 is further configured to acquire a third image in response to an image processing instruction, the third image being a medical image; the apparatus further includes a classification module configured to input the third image into the target image classification model, perform feature extraction on the third image through the target image classification model, and classify the third image based on the extracted image features to obtain the type of the lesion in the third image.

With the apparatus provided by the embodiments of this application, pseudo-classification labels are used to label the second images when the image classification model is trained, and a corresponding loss value is designed based on the pseudo-classification labels to indicate whether they are accurate; the model parameters are updated on this basis, and the pseudo-classification labels are updated accordingly during the update process. In this way, only some images need corresponding classification labels, and pseudo-classification labels are generated for the other images during model training; that is, not all images need corresponding classification labels, which greatly reduces the labor cost of manual labeling and improves training efficiency. Moreover, the pseudo-classification labels are continuously updated during model training, and pseudo-classification labels whose accuracy is close to that of the classification labels can finally be determined, which increases the number of training samples and thereby improves the accuracy of the image classification model.

It should be noted that, when the image processing apparatus provided by the above embodiments performs image processing based on artificial intelligence, the division into the above functional modules is only used as an example; in practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the image processing apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the image processing apparatus provided by the above embodiments and the image processing method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
图7是本申请实施例提供的一种电子设备的结构示意图,该电子设备700可因配置或性能不同而产生比较大的差异,能够包括一个或一个以上处理器(CPU,Central Processing Units)701和一个或一个以上的存储器702,其中,该存储器702中存储有至少一条计算机程序,该至少一条计算机程序由该处理器701加载并执行以实现上述各个方法实施例提供的图像处理方法。该电子设备还能够包括其他用于实现设备功能的部件,例如,该电子设备还能够具有有线或无线网络接口以及输入输出接口等部件,以便进行输入输出。本申请实施例在此不做赘述。
上述方法实施例中的电子设备能够实现为终端。例如,图8是本申请实施例提供的一种终端的结构框图。该终端800可以是便携式移动终端,比如:智能手机、平板电脑、笔记本 电脑、台式计算机、智能音箱、智能手表、音频播放设备、视频播放设备、医疗设备等。终端800还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
通常,终端800包括有:处理器801和存储器802。
处理器801可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器801可以采用数字信号处理器(DSP,Digital Signal Processing)、现场可编程门阵列(FPGA,Field-Programmable Gate Array)、可编程逻辑阵列(PLA,Programmable Logic Array)中的至少一种硬件形式来实现。处理器801也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称中央处理器;协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器801可以集成有图像处理器(GPU,Graphics Processing Unit),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器801还可以包括AI处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器802可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器802还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器802中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器801所执行以实现本申请中方法实施例提供的图像处理方法。
在一些实施例中,终端800还可选包括有:外围设备接口803和至少一个外围设备。处理器801、存储器802和外围设备接口803之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口803相连。具体地,外围设备包括:射频电路804、显示屏805、摄像头组件806、音频电路807、定位组件808和电源809中的至少一种。
外围设备接口803可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器801和存储器802。在一些实施例中,处理器801、存储器802和外围设备接口803被集成在同一芯片或电路板上;在一些其他实施例中,处理器801、存储器802和外围设备接口803中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
射频电路804用于接收和发射射频(RF,RadioFrequency)信号,也称电磁信号。射频电路804通过电磁信号与通信网络以及其他通信设备进行通信。射频电路804将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。作为示例,射频电路804包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路804可以通过至少一种无线通信协议来与其它终端进行通信。该无线通信协议包括但不限于:万维网、城域网、内联网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或无线保真(WiFi,Wireless Fidelity)网络。在一些实施例中,射频电路804还可以包括近距离无线通信(NFC,Near Field Communication)有关的电路,本申请对此不加以限定。
显示屏805用于显示用户界面(UI,User Interface)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏805是触摸显示屏时,显示屏805还具有采集在显示屏805的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器 801进行处理。此时,显示屏805还可以用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,显示屏805可以为一个,设置在终端800的前面板;在另一些实施例中,显示屏805可以为至少两个,分别设置在终端800的不同表面或呈折叠设计;在另一些实施例中,显示屏805可以是柔性显示屏,设置在终端800的弯曲表面上或折叠面上。甚至,显示屏805还可以设置成非矩形的不规则图形,也即异形屏。显示屏805可以采用液晶显示屏(LCD,Liquid Crystal Display)、有机发光二极管(OLED,Organic Light-Emitting Diode)等材质制备。
摄像头组件806用于采集图像或视频。作为示例,摄像头组件806包括前置摄像头和后置摄像头。通常,前置摄像头设置在终端的前面板,后置摄像头设置在终端的背面。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头、长焦摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能、主摄像头和广角摄像头融合实现全景拍摄以及虚拟现实(VR,Virtual Reality)拍摄功能或者其它融合拍摄功能。在一些实施例中,摄像头组件806还可以包括闪光灯。闪光灯可以是单色温闪光灯,也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合,可以用于不同色温下的光线补偿。
音频电路807可以包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至处理器801进行处理,或者输入至射频电路804以实现语音通信。出于立体声采集或降噪的目的,麦克风可以为多个,分别设置在终端800的不同部位。麦克风还可以是阵列麦克风或全向采集型麦克风。扬声器则用于将来自处理器801或射频电路804的电信号转换为声波。扬声器可以是传统的薄膜扬声器,也可以是压电陶瓷扬声器。当扬声器是压电陶瓷扬声器时,不仅可以将电信号转换为人类可听见的声波,也可以将电信号转换为人类听不见的声波以进行测距等用途。在一些实施例中,音频电路807还可以包括耳机插孔。
The positioning component 808 is used to determine the current geographic location of the terminal 800 for navigation or location based services (LBS). The positioning component 808 may be based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
The power supply 809 is used to supply power to the components in the terminal 800. The power supply 809 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 809 includes a rechargeable battery, the battery may be a wired rechargeable battery or a wireless rechargeable battery: a wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast charging technology.
In some embodiments, the terminal 800 also includes one or more sensors 810, including but not limited to an acceleration sensor 811, a gyroscope sensor 812, a pressure sensor 813, a fingerprint sensor 814, an optical sensor 815, and a proximity sensor 816.
The acceleration sensor 811 can detect acceleration along the three coordinate axes of a coordinate system established with the terminal 800. For example, the acceleration sensor 811 can detect the components of gravitational acceleration along the three axes. The processor 801 can control the display screen 805 to display the user interface in landscape or portrait view based on the gravitational acceleration signals collected by the acceleration sensor 811. The acceleration sensor 811 can also be used to collect motion data for games or for the user.
The gyroscope sensor 812 can detect the body orientation and rotation angle of the terminal 800, and can cooperate with the acceleration sensor 811 to capture the user's 3D actions on the terminal 800. Based on the data collected by the gyroscope sensor 812, the processor 801 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 813 may be arranged on the side frame of the terminal 800 and/or under the display screen 805. When the pressure sensor 813 is arranged on the side frame, it can detect the user's grip signal on the terminal 800, and the processor 801 performs left/right hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 813. When the pressure sensor 813 is arranged under the display screen 805, the processor 801 controls operable controls on the UI according to the user's pressure operations on the display screen 805. The operable controls include at least one of button controls, scroll bar controls, icon controls, and menu controls.
The fingerprint sensor 814 is used to collect the user's fingerprint. The processor 801 identifies the user based on the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 itself identifies the user based on the collected fingerprint. When the user's identity is recognized as trusted, the processor 801 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 814 may be arranged on the front, back, or side of the terminal 800. When the terminal 800 has a physical button or a manufacturer logo, the fingerprint sensor 814 may be integrated with the physical button or the manufacturer logo.
The optical sensor 815 is used to collect ambient light intensity. In one embodiment, the processor 801 can control the display brightness of the display screen 805 according to the ambient light intensity collected by the optical sensor 815: when the ambient light intensity is high, the display brightness is increased; when it is low, the display brightness is decreased. In another embodiment, the processor 801 can also dynamically adjust the shooting parameters of the camera component 806 according to the ambient light intensity collected by the optical sensor 815.
The proximity sensor 816, also called a distance sensor, is usually arranged on the front panel of the terminal 800 and is used to measure the distance between the user and the front of the terminal 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front of the terminal 800 is gradually decreasing, the processor 801 controls the display screen 805 to switch from the screen-on state to the screen-off state; when the proximity sensor 816 detects that the distance is gradually increasing, the processor 801 controls the display screen 805 to switch from the screen-off state back to the screen-on state.
Those skilled in the art can understand that the structure shown in FIG. 8 does not constitute a limitation on the terminal 800, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
The electronic device in the above method embodiments can be implemented as a server. For example, FIG. 9 is a schematic structural diagram of a server provided by an embodiment of this application. The server 900 may vary considerably in configuration or performance, and can include one or more processors 901 and one or more memories 902, where at least one computer program is stored in the memory 902 and is loaded and executed by the processor 901 to implement the image processing methods provided by the above method embodiments. Of course, the server can also have a wired or wireless network interface and an input/output interface for input and output, and can include other components for implementing device functions, which are not detailed here.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example, a memory including at least one computer program, where the at least one computer program can be executed by a processor to complete the image processing methods in the above embodiments. For example, the computer-readable storage medium can be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product or computer program is also provided, including one or more pieces of program code stored in a computer-readable storage medium. One or more processors of an electronic device can read the program code from the computer-readable storage medium and execute it, enabling the electronic device to perform the above image processing method.
It should be understood that, in the various embodiments of this application, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
It should be understood that determining B according to A does not mean determining B solely according to A; B can also be determined according to A and/or other information.
Those of ordinary skill in the art can understand that all or part of the steps for implementing the above embodiments can be completed by hardware, or by a program instructing the relevant hardware, where the program can be stored in a computer-readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like.
The above descriptions are merely optional embodiments of this application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its scope of protection.

Claims (18)

  1. An image processing method, the method being executed by an electronic device, the method comprising:
    performing classification processing on at least two first images and at least two second images through an image classification model to obtain predicted classification results of the at least two first images and the at least two second images, wherein the first images carry corresponding classification labels, the second images carry corresponding pseudo classification labels, the first images and the second images are images including an object to be recognized, and the predicted classification results, the classification labels, and the pseudo classification labels are used to indicate an attribute type of the object to be recognized;
    acquiring a first loss value based on the predicted classification results of the at least two first images and the corresponding classification labels, and the predicted classification results of the at least two second images and the corresponding pseudo classification labels, wherein the first loss value is used to indicate the accuracy of the predicted classification results;
    acquiring a second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo classification labels, wherein the second loss value is used to indicate the accuracy of the pseudo classification labels;
    updating model parameters of the image classification model based on the first loss value and the second loss value, updating the corresponding pseudo classification labels based on the predicted classification results of the at least two second images, and then continuing to perform the classification processing and the acquisition of the first loss value and the second loss value, until a target image classification model is obtained when a target condition is met.
  2. The method according to claim 1, wherein acquiring the second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo classification labels comprises:
    acquiring errors between the predicted classification results of the at least two second images and the corresponding pseudo classification labels;
    weighting the errors corresponding to the at least two second images according to weights of the at least two second images to obtain the second loss value.
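Purely as an illustration of this weighting, and not part of the claims or a prescribed implementation, a PyTorch sketch with hypothetical names:

```python
import torch
import torch.nn.functional as F

def second_loss(logits_u: torch.Tensor, pseudo_labels: torch.Tensor,
                weights: torch.Tensor) -> torch.Tensor:
    # Per-image error between prediction and pseudo classification label.
    per_image_error = F.cross_entropy(logits_u, pseudo_labels, reduction="none")
    # Weight each second image's error by its weight, then aggregate.
    return (weights * per_image_error).mean()
```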
  3. The method according to claim 2, wherein weighting the errors corresponding to the at least two second images according to the weights of the at least two second images to obtain the second loss value comprises:
    clustering the at least two first images and the at least two second images to obtain a clustering result, wherein the clustering result includes at least two cluster centers and the images corresponding to each cluster center;
    acquiring the weight of each second image according to the distance between the second image and the cluster center to which it belongs.
  4. The method according to claim 3, wherein acquiring the weight of each second image according to the distance between the second image and the cluster center to which it belongs comprises:
    acquiring, according to the distance between each second image and the cluster center to which it belongs, a probability corresponding to the second image, the probability being the probability that the pseudo classification label is a correct classification label;
    acquiring the weight of each second image based on the probability corresponding to the second image.
  5. The method according to claim 4, wherein acquiring the weight of each second image based on the probability corresponding to the second image comprises:
    in response to the probability corresponding to any second image being greater than or equal to a probability threshold, taking the probability as the weight of that second image;
    in response to the probability corresponding to any second image being less than the probability threshold, taking zero as the weight of that second image.
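A minimal sketch, not part of the claims, of how claims 3 to 5 could fit together, assuming scikit-learn k-means on image feature vectors; the distance-to-probability mapping (exp of the negative distance) and the threshold value are assumptions, since the claims do not fix them:

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_label_weights(feats_all: np.ndarray, feats_second: np.ndarray,
                         n_clusters: int, threshold: float = 0.5) -> np.ndarray:
    # Claim 3: cluster the features of all first and second images.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(feats_all)
    # Distance of each second image to the center of its own cluster.
    centers = km.cluster_centers_[km.predict(feats_second)]
    dist = np.linalg.norm(feats_second - centers, axis=1)
    # Claim 4: map the distance to a probability that the pseudo-label
    # is correct (assumed mapping).
    prob = np.exp(-dist)
    # Claim 5: keep the probability as the weight if it reaches the
    # threshold, otherwise use zero.
    return np.where(prob >= threshold, prob, 0.0)
```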
  6. The method according to claim 3, wherein updating the corresponding pseudo classification labels based on the predicted classification results of the at least two second images comprises:
    acquiring, according to the predicted classification results of the at least two second images and the clustering result, the pseudo classification label corresponding to the predicted classification result of each second image.
  7. The method according to claim 1, wherein acquiring the second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo classification labels comprises:
    processing the predicted classification results of the at least two second images and the corresponding pseudo classification labels based on a Gaussian mixture model to obtain the second loss value;
    the method further comprising:
    updating model parameters of the Gaussian mixture model based on the second loss value.
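One possible reading of claim 7, sketched with scikit-learn and offered as an illustration only: fit a two-component Gaussian mixture to the per-image errors between predictions and pseudo-labels, treat the low-mean component as the "pseudo-label correct" mode, and use its posterior to form the second loss value; refitting the mixture each round stands in for updating its model parameters:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_second_loss(per_image_error: np.ndarray) -> float:
    errors = per_image_error.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2).fit(errors)  # (re)fit = parameter update
    clean = int(np.argmin(gmm.means_.ravel()))         # low-error component
    p_clean = gmm.predict_proba(errors)[:, clean]      # posterior of being "clean"
    return float(np.mean(p_clean * per_image_error))   # weighted second loss value
```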
  8. The method according to claim 1, wherein the method further comprises:
    acquiring, according to the at least two first images, the at least two second images, and the corresponding predicted classification results, prediction types of the at least two first images and the at least two second images, wherein the prediction type is used to indicate whether each of the first images and each of the second images is predicted to be a first image or a second image;
    acquiring a third loss value according to the prediction types of the at least two first images and the at least two second images, wherein the third loss value is used to indicate the accuracy of the prediction types;
    wherein updating the model parameters of the image classification model based on the first loss value and the second loss value comprises:
    updating the model parameters of the image classification model based on the first loss value, the second loss value, and the third loss value.
  9. The method according to claim 8, wherein acquiring, according to the at least two first images, the at least two second images, and the corresponding predicted classification results, the prediction types of the at least two first images and the at least two second images comprises:
    performing discrimination processing on the at least two first images, the at least two second images, and the corresponding predicted classification results based on a discriminative network, to obtain the prediction types of the at least two first images and the at least two second images;
    the method further comprising:
    updating network parameters of the discriminative network according to the third loss value.
  10. The method according to claim 1, wherein the method further comprises:
    acquiring a third image in response to an image processing instruction, the third image being a medical image;
    inputting the third image into the target image classification model, where the target image classification model performs feature extraction on the third image and classifies the third image based on the extracted image features, to obtain the attribute type in the third image.
  11. The method according to claim 1, wherein before the terminal acquires, based on the image classification model, the predicted classification results of the at least two first images and the at least two second images, the method further comprises:
    acquiring the at least two first images and the at least two second images, wherein the second images have neither corresponding classification labels nor corresponding pseudo classification labels;
    acquiring initial predicted classification results of the at least two first images and the at least two second images based on an initial model;
    acquiring an initial first loss value based on the classification labels carried by the first images and the initial predicted classification results of the first images;
    updating model parameters of the initial model based on the initial first loss value;
    acquiring the pseudo classification labels corresponding to the at least two second images based on the initial predicted classification results of the at least two second images.
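A sketch of the warm-up in claim 11, offered as an illustration only (PyTorch assumed, names hypothetical): the initial model is trained on the labeled first images alone, and its predictions then yield the first set of pseudo classification labels for the second images:

```python
import torch
import torch.nn.functional as F

def warm_up(initial_model, optimizer, x_labeled, y_labeled, x_unlabeled,
            steps: int = 100):
    for _ in range(steps):
        loss = F.cross_entropy(initial_model(x_labeled), y_labeled)  # initial first loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        # Initial pseudo classification labels for the second images.
        return initial_model(x_unlabeled).argmax(dim=1)
```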
  12. The method according to claim 1, wherein continuing to perform the classification processing and the acquisition of the first loss value and the second loss value until the target image classification model is obtained when the target condition is met comprises:
    when the target condition is not met, performing the following processing:
    updating the corresponding pseudo classification labels based on the predicted classification results of the at least two second images;
    performing classification processing on the at least two first images and the at least two second images through the image classification model obtained by updating the model parameters, to obtain updated predicted classification results of the at least two first images and the at least two second images;
    updating again the model parameters of the updated image classification model based on the updated predicted classification results of the at least two first images and the at least two second images, the classification labels corresponding to the at least two first images, and the updated pseudo classification labels corresponding to the at least two second images;
    when the target condition is met, determining the image classification model obtained by updating the model parameters as the target image classification model.
  13. An image processing apparatus, the apparatus comprising:
    an acquisition module configured to perform classification processing on at least two first images and at least two second images through an image classification model to obtain predicted classification results of the at least two first images and the at least two second images, wherein the first images carry corresponding classification labels, the second images carry corresponding pseudo classification labels, the first images and the second images are images including an object to be recognized, and the predicted classification results, the classification labels, and the pseudo classification labels are used to indicate an attribute type of the object to be recognized;
    the acquisition module being further configured to acquire a first loss value based on the predicted classification results of the at least two first images and the corresponding classification labels, and the predicted classification results of the at least two second images and the corresponding pseudo classification labels, wherein the first loss value is used to indicate the accuracy of the predicted classification results;
    the acquisition module being further configured to acquire a second loss value based on the predicted classification results of the at least two second images and the corresponding pseudo classification labels, wherein the second loss value is used to indicate the accuracy of the pseudo classification labels;
    an update module configured to update model parameters of the image classification model based on the first loss value and the second loss value, update the corresponding pseudo classification labels based on the predicted classification results of the at least two second images, and then continue to perform the classification processing and the acquisition of the first loss value and the second loss value, until a target image classification model is obtained when a target condition is met.
  14. The apparatus according to claim 13, wherein the acquisition module is configured to:
    acquire errors between the predicted classification results of the at least two second images and the corresponding pseudo classification labels;
    weight the errors corresponding to the at least two second images according to weights of the at least two second images to obtain the second loss value.
  15. The apparatus according to claim 14, wherein the acquisition module is configured to:
    cluster the at least two first images and the at least two second images to obtain a clustering result, wherein the clustering result includes at least two cluster centers and the images corresponding to each cluster center;
    acquire the weight of each second image according to the distance between the second image and the cluster center to which it belongs.
  16. An electronic device, comprising one or more processors and one or more memories, wherein at least one computer program is stored in the one or more memories, and the at least one computer program is loaded and executed by the one or more processors to implement the image processing method according to any one of claims 1 to 12.
  17. A computer-readable storage medium, wherein at least one computer program is stored in the storage medium, and the at least one computer program is loaded and executed by a processor to implement the image processing method according to any one of claims 1 to 12.
  18. A computer program product, comprising a computer program or instructions, which, when executed by a processor, implement the image processing method according to any one of claims 1 to 12.
PCT/CN2022/079496 2021-03-17 2022-03-07 Image processing method and apparatus, electronic device, computer-readable storage medium, and computer program product WO2022193973A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/071,106 US20230097391A1 (en) 2021-03-17 2022-11-29 Image processing method and apparatus, electronic device, computer-readable storage medium, and computer program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110286366.X 2021-03-17
CN202110286366.XA CN113724189A (zh) 2021-03-17 Image processing method and apparatus, device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/071,106 Continuation US20230097391A1 (en) 2021-03-17 2022-11-29 Image processing method and apparatus, electronic device, computer-readable storage medium, and computer program product

Publications (1)

Publication Number Publication Date
WO2022193973A1 (zh)

Family

ID=78672576

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/079496 WO2022193973A1 (zh) 2021-03-17 2022-03-07 Image processing method and apparatus, electronic device, computer-readable storage medium, and computer program product

Country Status (3)

Country Link
US (1) US20230097391A1 (zh)
CN (1) CN113724189A (zh)
WO (1) WO2022193973A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113724189A (zh) * 2021-03-17 2021-11-30 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及存储介质
CN116109877B (zh) * 2023-04-07 2023-06-20 University of Science and Technology of China Combined zero-shot image classification method, system, device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764281A (zh) * 2018-04-18 2018-11-06 South China University of Technology Image classification method based on a semi-supervised self-paced learning cross-task deep network
US20200042832A1 (en) * 2019-09-09 2020-02-06 Lg Electronics Inc. Artificial intelligence apparatus and method for updating artificial intelligence model
CN111222648A (zh) * 2020-01-15 2020-06-02 Shenzhen Qianhai WeBank Co., Ltd. Semi-supervised machine learning optimization method, apparatus, device, and storage medium
CN111709315A (zh) * 2020-05-27 2020-09-25 Xi'an Jiaotong University Underwater acoustic target radiated noise recognition method based on domain adaptation
CN112131961A (zh) * 2020-08-28 2020-12-25 Ocean University of China Single-sample-based semi-supervised person re-identification method
CN113724189A (zh) * 2021-03-17 2021-11-30 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus, device, and storage medium

Also Published As

Publication number Publication date
CN113724189A (zh) 2021-11-30
US20230097391A1 (en) 2023-03-30

Similar Documents

Publication Publication Date Title
CN111298445B (zh) Target account detection method and apparatus, electronic device, and storage medium
CN111325726A (zh) Model training method, image processing method, apparatus, device, and storage medium
CN111243668B (zh) Molecular binding site detection method and apparatus, electronic device, and storage medium
WO2022193973A1 (zh) Image processing method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN111598160B (zh) Image classification model training method and apparatus, computer device, and storage medium
CN111930964B (zh) Content processing method, apparatus, device, and storage medium
CN112733970B (zh) Image classification model processing method, image classification method, and apparatus
CN111104980B (zh) Method, apparatus, device, and storage medium for determining a classification result
CN112749728A (zh) Student model training method and apparatus, computer device, and storage medium
CN111091166A (zh) Image processing model training method, image processing method, device, and storage medium
CN113392180A (zh) Text processing method, apparatus, device, and storage medium
CN111598896B (zh) Image detection method, apparatus, device, and storage medium
CN114722937A (zh) Anomalous data detection method and apparatus, electronic device, and storage medium
CN111914180A (zh) Graph-structure-based user feature determination method, apparatus, device, and medium
CN113257412B (zh) Information processing method and apparatus, computer device, and storage medium
WO2022095640A1 (zh) Method, device, and storage medium for reconstructing tree-like tissue in an image
CN113505256B (zh) Feature extraction network training method, image processing method, and apparatus
CN113674856A (zh) Artificial-intelligence-based medical data processing method, apparatus, device, and medium
CN112163095A (zh) Data processing method, apparatus, device, and storage medium
CN111353513B (zh) Target population screening method and apparatus, terminal, and storage medium
CN112988984B (zh) Feature acquisition method and apparatus, computer device, and storage medium
CN113762585B (zh) Data processing method, account type recognition method, and apparatus
CN114328948A (zh) Text normalization model training method, text normalization method, and apparatus
CN114281937A (zh) Nested entity recognition model training method, nested entity recognition method, and apparatus
CN114333997A (zh) Data processing and data processing model training method, apparatus, device, and medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22770328

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE