CN116977775A - Image processing and model training method, device, equipment and storage medium

Image processing and model training method, device, equipment and storage medium

Info

Publication number
CN116977775A
CN116977775A
Authority
CN
China
Prior art keywords
image processing
training
processing model
image
similarity
Prior art date
Legal status
Pending
Application number
CN202310545122.8A
Other languages
Chinese (zh)
Inventor
张博深
王昌安
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202310545122.8A
Publication of CN116977775A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing and model training method, device, equipment and storage medium. The method can be applied to technical fields such as artificial intelligence, industrial quality inspection and image processing, and comprises the following steps: extracting features of a training image through a first image processing model and a second image processing model respectively to obtain first feature information and second feature information of the training image; determining C classification feature prototypes in the current training stage, and determining a first similarity between the first feature information and the C classification feature prototypes and a second similarity between the second feature information and the C classification feature prototypes; and training the first image processing model and the second image processing model by taking the first similarity as a pseudo tag of the second image processing model and the second similarity as a pseudo tag of the first image processing model. Training is supervised through the interaction information between the two models, so that the interference of noise labels is overcome, the robustness of the models is enhanced, and the image processing effect is improved.

Description

Image processing and model training method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to an image processing and model training method, device and equipment and a storage medium.
Background
With the rapid development of deep learning, deep learning technology has been widely applied in the field of image processing. For example, in industrial quality inspection scenarios, artificial intelligence (Artificial Intelligence, AI) quality inspection technology is used to inspect the quality of industrial products during production and manufacturing. In the industrial production and manufacturing process, an image of the industrial product is acquired and input into an image processing model for defect detection, which detects whether the industrial product has a defect and, if so, what type of defect it is; this improves quality inspection accuracy and saves labor cost.
The image processing effect of an image processing model is closely related to its training phase. At present, in the training process of an image processing model, manual annotation is generally used to determine the training labels of the model. However, manually annotated training labels are not accurate enough, which degrades the training performance of the model and reduces its image processing effect.
Disclosure of Invention
The application provides an image processing and model training method, an image processing and model training device, image processing equipment and a storage medium, which can improve the training performance of an image processing model and further improve the image processing effect.
In a first aspect, the present application provides an image processing model training method, including:
respectively extracting features of a training image through a first image processing model and a second image processing model to obtain first feature information and second feature information of the training image;
determining C classification feature prototypes in the current training stage, wherein the C classification feature prototypes are obtained by clustering feature information of each training image in a training image set, and C is a positive integer;
determining a first similarity between the first feature information and each of the C classification feature prototypes and a second similarity between the second feature information and each of the C classification feature prototypes;
and training the first image processing model and the second image processing model by taking the first similarity as a pseudo tag of the second image processing model and the second similarity as a pseudo tag of the first image processing model.
In a second aspect, the present application provides an image processing method, including:
acquiring a target image to be processed;
processing the target image through a target model to obtain a processing result of the target image;
The target model is a first image processing model or a second image processing model, the first image processing model and the second image processing model are obtained by taking a first similarity as a pseudo tag of the second image processing model and a second similarity as a pseudo tag of the first image processing model through training, the first similarity is the similarity between the first feature information and C classification feature prototypes in the current training stage, the second similarity is the similarity between the second feature information and the C classification feature prototypes, the first feature information and the second feature information are obtained by respectively extracting features of training images based on the first image processing model and the second image processing model, the C classification feature prototypes are obtained by clustering feature information of each training image in a training image set, and the C is a positive integer.
In a third aspect, the present application provides an image processing model training apparatus, including:
the feature extraction unit is used for extracting features of the training image through the first image processing model and the second image processing model respectively to obtain first feature information and second feature information of the training image;
The prototype determining unit is used for determining C classification feature prototypes in the current training stage, wherein the C classification feature prototypes are obtained by clustering feature information of each training image in a training image set, and C is a positive integer;
a similarity determining unit configured to determine a first similarity between the first feature information and each of the C classification feature prototypes, and a second similarity between the second feature information and each of the C classification feature prototypes;
the training unit is used for taking the first similarity as a pseudo tag of the second image processing model, taking the second similarity as a pseudo tag of the first image processing model, and training the first image processing model and the second image processing model.
In a fourth aspect, the present application provides an image processing apparatus comprising:
an acquisition unit configured to acquire a target image to be processed;
the processing unit is used for processing the target image through a target model to obtain a processing result of the target image;
the target model is a first image processing model or a second image processing model, the first image processing model and the second image processing model are obtained by taking a first similarity as a pseudo tag of the second image processing model and a second similarity as a pseudo tag of the first image processing model through training, the first similarity is the similarity between the first feature information and C classification feature prototypes in the current training stage, the second similarity is the similarity between the second feature information and the C classification feature prototypes, the first feature information and the second feature information are obtained by respectively extracting features of training images based on the first image processing model and the second image processing model, the C classification feature prototypes are obtained by clustering feature information of each training image in a training image set, and the C is a positive integer.
In a fifth aspect, an encoder is provided that includes a processor and a memory. The memory is configured to store a computer program, and the processor is configured to invoke and execute the computer program stored in the memory, so as to perform the method in the first aspect or each implementation manner thereof.
In a sixth aspect, a decoder is provided that includes a processor and a memory. The memory is used for storing a computer program, and the processor is used for calling and running the computer program stored in the memory to execute the method in the second aspect or various implementation manners thereof.
A seventh aspect provides a chip for implementing the method of any one of the first to second aspects or each implementation thereof. Specifically, the chip includes: a processor for calling and running a computer program from a memory, causing a device on which the chip is mounted to perform the method as in any one of the first to second aspects or implementations thereof described above.
In an eighth aspect, a computer-readable storage medium is provided for storing a computer program, the computer program causing a computer to perform the method of any one of the first to second aspects or each implementation thereof.
A ninth aspect provides a computer program product comprising computer program instructions for causing a computer to perform the method of any one of the first to second aspects or implementations thereof.
In a tenth aspect, there is provided a computer program which, when run on a computer, causes the computer to perform the method of any one of the first to second aspects or implementations thereof.
In summary, when training the image processing models, the embodiment of the application first initializes two image processing models, namely a first image processing model and a second image processing model. And respectively extracting the characteristics of the training image through the first image processing model and the second image processing model to obtain first characteristic information and second characteristic information of the training image. And then, C classification feature prototypes in the current training stage are determined, wherein the C classification feature prototypes are obtained by clustering feature information of each training image in the training image set. And further determining a first degree of similarity between the first feature information and each of the C classification feature prototypes, and determining a second degree of similarity between the second feature information and each of the C classification feature prototypes. The first similarity is used as a pseudo tag of the second image processing model, the second similarity is used as a pseudo tag of the first image processing model, and the first image processing model and the second image processing model are trained. That is, in the embodiment of the present application, the first similarity determined based on the first image processing model is used as the pseudo tag of the second image processing model, and the second similarity determined based on the second image processing model is used as the pseudo tag of the first image processing model, so that training is supervised by interaction information to overcome the interference of noise tags, improve the training effect of the first image processing model and the second image processing model, and enhance the robustness of the models. In the subsequent actual prediction stage, any one image processing model of the first image processing model and the second image processing model can be selected for prediction, and the image processing effect is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an industrial defect;
fig. 2 is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 3 is a flowchart of an image processing model training method according to an embodiment of the present application;
FIGS. 4 to 9 are schematic diagrams illustrating training process data processing according to embodiments of the present application;
FIG. 10 is a flowchart of an image processing model training method according to an embodiment of the present application;
FIG. 11 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 12 is a schematic diagram of an image processing process;
FIG. 13A is a schematic diagram of a predicted defect image;
FIG. 13B is a schematic diagram of a predicted non-defective image;
FIG. 14 is a schematic block diagram of an image processing model training apparatus provided in an embodiment of the present application;
Fig. 15 is a schematic block diagram of an image processing apparatus provided by an embodiment of the present application;
FIG. 16 is a schematic block diagram of a computing device provided by an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the application described herein may be capable of operation in sequences other than those illustrated or otherwise described. In the embodiment of the application, "B corresponding to A" means that B is associated with A. In one implementation, B may be determined from a. It should also be understood that determining B from a does not mean determining B from a alone, but may also determine B from a and/or other information. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. In the description of the present application, unless otherwise indicated, "a plurality" means two or more than two.
The technical scheme provided by the application can be applied to the technical fields of artificial intelligence, industrial quality inspection, image processing and the like, and is used for improving the training effect of the image processing model, so that the processing effect of the image can be improved when the image processing model after efficient training is used for carrying out subsequent image processing, for example, the accuracy of industrial quality inspection is improved.
Related concepts related to the embodiments of the present application are described below.
Artificial intelligence (Artificial Intelligence, AI) is a theory, method, technique, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include directions such as computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine learning (Machine Learning, ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout all fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With research and advances in artificial intelligence technology, it is being researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, driverless and autonomous driving, unmanned aerial vehicles, robots, smart medical care, and smart customer service. It is believed that, as technology develops, artificial intelligence will be applied in more fields and realize increasingly important value.
The quality inspection of industrial defects refers to quality inspection of industrial products in the production and manufacturing processes.
Traditional industrial quality inspection is generally performed by human quality inspectors. With the rise of AI technology in recent years, machine-vision-based AI quality inspection can greatly improve inspection accuracy and save labor cost. The input of this technique is a picture taken of the surface of the industrial product, and the output is the confidence of a defect. Intelligent AI industrial quality inspection therefore has broad market application prospects.
In some embodiments, a machine-vision-based industrial defect quality inspection algorithm extracts manual features from the input image, including gradient features, texture features, and the like, and then trains an SVM classifier (or a tree-based classifier, such as a random forest) to classify the defect in the current picture according to the extracted manual features. This approach has two problems. First, the extracted manual features generalize poorly: for diverse image data, harmful features that confuse the subsequent classifier are often extracted. Second, feature extraction and classifier training are carried out independently, so the training cost of the model is relatively high. CNNs, which have emerged in recent years, offer a good solution to both problems. In a CNN-based quality inspection method, the original image to be inspected is fed directly into a CNN for feature extraction, a fully connected layer then performs classification, and the model is trained end to end with a softmax loss function. No manual feature design is needed in the whole pipeline; the features best suited to the current classification task are learned automatically during training, and because the process is end to end, feature extraction and classification no longer need to be trained as two separate steps.
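As a purely illustrative sketch of this end-to-end idea (not the specific model of the application), a small PyTorch CNN classifier trained with a softmax/cross-entropy loss might look as follows; the layer sizes, input resolution and number of defect classes are assumptions.

```python
import torch
import torch.nn as nn

# Minimal end-to-end CNN defect classifier (illustrative only).
class DefectClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(                 # learned features replace manual features
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)   # fully connected classification layer

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = DefectClassifier(num_classes=5)                # hypothetical number of defect types
loss_fn = nn.CrossEntropyLoss()                        # the "softmax loss" referred to above
logits = model(torch.randn(8, 3, 224, 224))            # dummy batch of product-surface images
loss = loss_fn(logits, torch.randint(0, 5, (8,)))
loss.backward()                                        # gradients flow end to end through classifier and features
```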
In current CNN-based whole-image classification methods, training labels are determined by manual annotation during model training. However, manually annotated training labels can be inaccurate. For example, in the field of industrial defect quality inspection, quality inspection is not a simple multi-classification task: as shown in fig. 1, many defect images are so slight that they could also be classified as OK images, and different defects can be so similar to each other that they are easily mislabeled. Manual labels are therefore often subjective, so the manual defect annotations contain noise, and training a model with such noisy labels affects the performance of the model, which in turn reduces the image processing effect, for example lowering the accuracy of industrial quality inspection.
In order to solve the above technical problems, in the embodiment of the present application, when training the image processing models, two image processing models, namely, a first image processing model and a second image processing model, are initialized first. And respectively extracting the characteristics of the training image through the first image processing model and the second image processing model to obtain first characteristic information and second characteristic information of the training image. And then, C classification feature prototypes in the current training stage are determined, wherein the C classification feature prototypes are obtained by clustering feature information of each training image in the training image set. And further determining a first degree of similarity between the first feature information and each of the C classification feature prototypes, and determining a second degree of similarity between the second feature information and each of the C classification feature prototypes. The first similarity is used as a pseudo tag of the second image processing model, the second similarity is used as a pseudo tag of the first image processing model, and the first image processing model and the second image processing model are trained. That is, in the embodiment of the present application, the first similarity determined based on the first image processing model is used as the pseudo tag of the second image processing model, and the second similarity determined based on the second image processing model is used as the pseudo tag of the first image processing model, so that training is supervised by interaction information to overcome the interference of noise tags, improve the training effect of the first image processing model and the second image processing model, and enhance the robustness of the models. In the subsequent actual prediction stage, any one image processing model of the first image processing model and the second image processing model can be selected for prediction, and the image processing effect is improved.
The application scenario of the embodiment of the present application is described below.
Fig. 2 is a schematic diagram of an application scenario according to an embodiment of the present application, including a terminal device 101 and a computing device 102.
As shown in fig. 2, the computing device 102 of an embodiment of the present application includes a first image processing model and a second image processing model. Wherein the terminal device 101 obtains a training image set and transmits the training image set to the computing device 102. The computing device 102 uses the training image set to train the first image processing model and the second image processing model by adopting the image processing model training method provided by the embodiment of the application.
The computing device 102, for example, obtains a first image processing model including a first feature extraction module and a second image processing model including a second feature extraction module. Next, the computing device 102 acquires a training image set based on the instruction of the terminal device 101. Next, the computing device 102 performs feature extraction on the training image through the first image processing model and the second image processing model, so as to obtain first feature information and second feature information of the training image. The computing device 102 then determines C classification feature prototypes for the current training phase, the C classification feature prototypes being obtained by clustering feature information for each training image in the training image set. The computing device 102, in turn, determines a first degree of similarity between the first feature information and each of the C classification feature prototypes, and determines a second degree of similarity between the second feature information and each of the C classification feature prototypes. The computing device 102 thus trains the first image processing model and the second image processing model with the first similarity as a pseudo tag for the second image processing model and the second similarity as a pseudo tag for the first image processing model. That is, in the embodiment of the present application, the computing device 102 uses the first similarity between the first feature information of the training image and the C kinds of classification feature prototypes as the pseudo tag of the second image processing model, uses the second similarity between the second feature information of the training image and the C kinds of classification feature prototypes as the pseudo tag of the first image processing model, and monitors the training through the interaction information to overcome the interference of the noise tag, so as to improve the training effect of the first image processing model and the second image processing model, and enhance the robustness of the model. In the subsequent actual prediction stage, any one image processing model of the first image processing model and the second image processing model can be selected for prediction, and the image processing effect is improved.
In some embodiments, as shown in fig. 2, the application scenario further includes a database 103, where the database 103 includes historical image data, such as historical defect image data. In the embodiment of the present application, the terminal device 101 is communicatively connected to the database 103 and may write data into the database 103, and the computing device 102 is also communicatively connected to the database 103 and may read data from the database 103. In one example, during the model training process of an embodiment of the present application, the computing device 102 obtains historical image data from the database 103 as the training image set for the first image processing model and the second image processing model. The computing device 102 then trains the first image processing model and the second image processing model using the training images in the training image set.
In some embodiments, the computing device 102 stores the trained first image processing model and second image processing model in the computing device 102. In this way, in an actual application process, for example, an actual image processing process, the terminal device 101 acquires target data to be processed, and sends the target data to the computing device 102, so that the computing device 102 processes the target data using the saved first image processing model or the second image processing model.
The embodiment of the present application does not limit the specific type of the terminal device 101. In some embodiments, terminal device 101 may include, but is not limited to: cell phones, computers, intelligent voice interaction devices, smart home appliances, vehicle-mounted terminals, aircraft, wearable smart devices, medical devices, and the like. Such devices are often configured with a display apparatus, such as a display, a display screen, or a touch screen, and with an input apparatus such as a touch screen or a touch panel.
In some embodiments, computing device 102 is a terminal device having data processing functionality, such as a cell phone, a computer, a smart voice interaction device, a smart home appliance, an in-vehicle terminal, an aircraft, a wearable smart device, a medical device, and so forth.
In some embodiments, computing device 102 is a server. The server may be one or more. Where the servers are multiple, there are at least two servers for providing different services and/or there are at least two servers for providing the same service, such as in a load balancing manner, as embodiments of the application are not limited in this respect. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Network, content delivery networks), basic cloud computing services such as big data and artificial intelligence platforms, and the like. Server 102 may also become a node of the blockchain.
In the embodiment of the present application, the terminal device 101 and the computing device 102 may be directly or indirectly connected through wired communication or wireless communication, which is not limited herein.
It should be noted that, the application scenario of the embodiment of the present application includes, but is not limited to, that shown in fig. 2.
The following describes the technical scheme of the embodiments of the present application in detail through some embodiments. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
The model training process of the embodiment of the application is first described.
Fig. 3 is a flowchart of an image processing model training method according to an embodiment of the application. The execution subject of the embodiment of the application is a device with a model training function, for example, a model training device. In some embodiments, the model training apparatus may be the computing device in fig. 2, or the terminal device in fig. 2, or a system of the computing device and the terminal device in fig. 2. For ease of description, embodiments of the present application will be described with respect to a computing device as an example of an execution body.
As shown in fig. 3, the model training process of the embodiment of the present application includes:
S101, respectively extracting features of the training image through a first image processing model and a second image processing model to obtain first feature information and second feature information of the training image.
At present, in some cases the training labels of an image processing model contain noise. For example, in industrial applications label noise is common; because of labeling errors, noisy samples make the supervision information of the model ambiguous during the training stage, so the model's predictions vary widely (i.e., consistency is poor) and the robustness of the model is reduced.
In the embodiment of the application, in a model training stage, two image processing models are used for mutual supervision training, a first similarity determined based on a first image processing model is used as a pseudo tag of a second image processing model, and a second similarity determined based on the second image processing model is used as the pseudo tag of the first image processing model, so that the interference of noise tags is overcome, the training effect of the first image processing model and the second image processing model is improved, and the robustness of the model is enhanced. In the subsequent actual prediction stage, any one image processing model of the first image processing model and the second image processing model can be selected for prediction, and the image processing effect is improved.
In an embodiment of the application, the computing device first initializes two different models, a first image processing model and a second image processing model. Illustratively, the initialization stage employs a random initialization to ensure that there is an output difference between the first image processing model and the second image processing model.
The embodiment of the application does not limit the specific network structure of the first image processing model and the second image processing model.
In some embodiments, the network structure of the first image processing model and the second image processing model is the same, i.e. the first image processing model and the second image processing model are the same neural network model.
In some embodiments, the network structure of the first image processing model and the second image processing model are different, i.e. the first image processing model and the second image processing model are different neural network models.
In some embodiments, at least one of the first image processing model and the second image processing model is a CNN model or a Transformer model. For example, the first image processing model and the second image processing model are both CNN models; or the first image processing model and the second image processing model are both Transformer models; or the first image processing model is a CNN model and the second image processing model is a Transformer model; or the second image processing model is a CNN model and the first image processing model is a Transformer model.
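For illustration, one possible realization has both models share the same CNN architecture while being initialized with different random seeds so that their outputs differ from the start; the ResNet-18 backbone and the number of classes below are assumptions, not details from the application.

```python
import torch
import torchvision.models as models

# Two image processing models of identical architecture but different random initialization.
torch.manual_seed(1)
model_1 = models.resnet18(num_classes=5)   # first image processing model (CNN)
torch.manual_seed(2)
model_2 = models.resnet18(num_classes=5)   # second image processing model (CNN)
```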
In an embodiment of the application, the computing device performs multiple rounds of training on the first image processing model and the second image processing model, and each round of training is basically the same. For example, during the first round of training, the computing device selects at least one training image from the training image set and trains the initial first image processing model and the initial second image processing model using the at least one training image to update the parameters in the two models, obtaining the first image processing model and the second image processing model after the first round of updating. Next, the computing device reselects at least one training image from the training image set and trains the first image processing model and the second image processing model obtained after the first round using the reselected at least one training image to update their parameters, obtaining the first image processing model and the second image processing model after the second round of updating. And so on, until the model training end condition is reached. The model training end condition includes, for example, that the number of training rounds reaches a preset number, or that the model loss reaches a preset loss.
In each round of training of the model, the computing device selects at least one training image from the training data set to form a batch of training data for that round of training. In the embodiment of the present application, the training process of each round of the model is basically the same; for convenience of description, the current round of training is described here as an example.
The training image of the embodiment of the application can be understood as any training image in the training batch of the current round.
The embodiment of the application does not limit the specific types of the first image processing model and the second image processing model.
In some embodiments, the first image processing model and the second image processing model are industrial quality inspection models for detecting defect categories of industrial products, and each training image in the corresponding training image set is a defect image of an industrial product.
In some embodiments, the first image processing model and the second image processing model are classification models for detecting the class of an object in an image, and each training image in the corresponding training image set is an image containing a different object.
In some embodiments, the first image processing model and the second image processing model may also be used for other scenarios, as embodiments of the application are not limited in this respect.
In the embodiment of the application, after the computing device obtains the training image set and the first image processing model and the second image processing model, for any training image in the training image set, the training image is input into the first image processing model for feature extraction, and first feature information of the training image is obtained. And simultaneously, inputting the training image into a second image processing model for feature extraction to obtain second feature information of the training image.
In some embodiments, the first image processing model and the second image processing model each include a feature extraction module, and for ease of description, the feature extraction module in the first image processing model is denoted as a first feature extraction module and the feature extraction module in the second image processing model is denoted as a second feature extraction module, as shown in fig. 4. At this time, the computing device inputs the training image into a first feature extraction module of the first image processing model to perform feature extraction, so as to obtain first feature information of the training image. And simultaneously, inputting the training image into a second feature extraction module of a second image processing model to perform feature extraction, so as to obtain second feature information of the training image.
The embodiment of the application does not limit the specific network structure of the first feature extraction module and the second feature extraction module.
In some embodiments, the network structure of the first feature extraction module and the second feature extraction module are the same.
In some embodiments, the network structure of the first feature extraction module and the second feature extraction module are not identical.
In some embodiments, the first feature extraction module and the second feature extraction module include at least one convolution layer for extracting feature information of the input image data.
In some embodiments, the first characteristic information and the second characteristic information have the same scale, i.e. the first characteristic information and the second characteristic information have the same size and the same number of channels. For example, the first feature information and the second feature information are each of size n1×n1 with c channels.
In some embodiments, the dimensions of the first and second characteristic information may not be identical, e.g., the first and second characteristic information may not be uniform in size, and/or the number of channels may not be identical. That is, the sizes and the channel numbers of the first feature information and the second feature information are different, or the sizes of the first feature information and the second feature information are the same but the channel numbers are different, or the sizes of the first feature information and the second feature information are different but the channel numbers are the same.
In some embodiments, the first feature information and the second feature information may be represented in the form of a feature map or in the form of a matrix.
In some embodiments, the computing device first performs image enhancement prior to feature extraction of the training image by the first image processing model and the second image processing model, where S101 includes the following steps S101-A1 to S101-A3:
S101-A1, performing enhancement processing on a training image to obtain a first enhancement image and a second enhancement image;
S101-A2, extracting features of a first enhanced image through a first image processing model to obtain first feature information;
S101-A3, extracting features of the second enhanced image through the second image processing model to obtain second feature information.
In this embodiment, as shown in fig. 5, the computing device first performs image enhancement on the training image to generate two different enhanced images, so that when feature extraction is performed on the two different enhanced images, the extracted first feature information and second feature information are guaranteed to differ. Different first and second similarities are then determined based on the different first and second feature information, and effective training of the first image processing model and the second image processing model is finally achieved based on these different similarities.
The mode of the enhancement processing of the training image is not limited in the embodiment of the application.
In some embodiments, the computing device enhances the training image using a first enhancement mode to obtain a first enhanced image, and enhances the training image using a second enhancement mode to obtain a second enhanced image. Wherein the first enhancement mode is different from the second enhancement mode.
Illustratively, the first enhancement mode includes at least one of rotation, translation, scaling, illumination transformation, Gaussian noise, and the like.
Illustratively, the second enhancement mode includes at least one of rotation, translation, scaling, illumination transformation, gaussian noise, and the like.
That is, in the embodiment of the present application, the computing device randomly selects enhancement operations from a preset sequence of enhancement modes to enhance the training image x, obtaining two different images: a first enhanced image x1 and a second enhanced image x2.
Illustratively, the enhancement process of the training image can be represented by the following formula (1):
x1 = aug(x)
x2 = aug(x)    (1)
where aug(·) represents an enhancement operation, x represents the training image, x1 represents the first enhanced image, and x2 represents the second enhanced image.
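As an illustration only, the two enhanced views could be generated with torchvision as in the following sketch; the particular operations, their parameters, and the image path are assumptions rather than details from the application.

```python
from PIL import Image
import torchvision.transforms as T

# Random augmentation pipeline (rotation, translation/scaling, illumination change, blur).
augment = T.Compose([
    T.RandomRotation(15),
    T.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    T.ColorJitter(brightness=0.3, contrast=0.3),
    T.GaussianBlur(kernel_size=3),
    T.ToTensor(),
])

x = Image.open("product_surface.png").convert("RGB")   # hypothetical training image
x1 = augment(x)   # first enhanced image,  x1 = aug(x)
x2 = augment(x)   # second enhanced image, x2 = aug(x); the random operations differ per call
```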
After enhancing the training image to obtain the first enhanced image x1 and the second enhanced image x2, the computing device feeds the first enhanced image x1 into the first image processing model for processing to obtain the first feature information of the training image, and feeds the second enhanced image x2 into the second image processing model for processing to obtain the second feature information of the training image. Illustratively, the first image processing model includes a first feature extraction module and the second image processing model includes a second feature extraction module; the computing device then performs feature extraction on the first enhanced image x1 through the first feature extraction module to obtain the first feature information of the training image, and performs feature extraction on the second enhanced image x2 through the second feature extraction module to obtain the second feature information of the training image.
Illustratively, the process of performing feature extraction on the first enhanced image x1 and the second enhanced image x2 through the first image processing model and the second image processing model to obtain the first feature information and the second feature information can be represented by the following formula (2):
feat1 = f1(x1; θ1)
feat2 = f2(x2; θ2)    (2)
where θ1 represents the trainable parameters of the first image processing model, θ2 represents the trainable parameters of the second image processing model, feat1 is the first feature information, and feat2 is the second feature information. Illustratively, feat1 and feat2 are vectors of dimension [B, D], where B represents the size of the current training batch and D represents the dimension of a single feature.
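For illustration, formula (2) could be realized as in the following sketch; the tiny CNN encoder, the feature dimension D, and the batch size are assumptions (per the description above, any CNN or Transformer backbone would fit).

```python
import torch
import torch.nn as nn

# Two independently initialized encoders f1 and f2 mapping images to [B, D] features.
class Encoder(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        return self.net(x)

f1, f2 = Encoder(), Encoder()           # random initialization gives different theta_1, theta_2
x1_batch = torch.randn(8, 3, 224, 224)  # batch of first enhanced images (B = 8)
x2_batch = torch.randn(8, 3, 224, 224)  # batch of second enhanced images
feat1 = f1(x1_batch)                    # first feature information, shape [B, D]
feat2 = f2(x2_batch)                    # second feature information, shape [B, D]
```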
As can be seen from the foregoing, in the embodiment of the present application, the computing device performs feature extraction on the training image through the first image processing model and the second image processing model, and the manner of obtaining the first feature information and the second feature information at least includes the following two ways:
the first is that different training images are enhanced, and the training images are directly input into a first image processing model and a second image processing model to perform feature extraction, so that first feature information and second feature information are obtained.
In the second manner, the training image is first enhanced to obtain a first enhanced image and a second enhanced image; feature extraction is then performed on the first enhanced image through the first image processing model to obtain the first feature information, and on the second enhanced image through the second image processing model to obtain the second feature information.
Based on the above steps, the computing device performs the following step S102 after obtaining the first feature information and the second feature information of the training image.
S102, determining C classification feature prototypes in the current training stage.
The C classification feature prototypes are obtained by clustering feature information of each training image in a training image set, and C is a positive integer.
In the embodiment of the application, in order to reduce the problem of poor model training performance caused by manually labeled training labels, the C classification feature prototypes of the current training stage are determined. A first similarity between the C classification feature prototypes and the first feature information obtained through the first image processing model is determined, and a second similarity between the C classification feature prototypes and the second feature information obtained through the second image processing model is determined. The first similarity is used as a pseudo label of the second image processing model, the second similarity is used as a pseudo label of the first image processing model, and the first image processing model and the second image processing model are trained under mutual supervision to improve the training effect of the models.
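A minimal sketch of this mutual-supervision idea is given below, assuming cosine similarity against the prototypes, a temperature-scaled soft cross-entropy loss, and random stand-in tensors; all of these are illustrative assumptions rather than details from the application.

```python
import torch
import torch.nn.functional as F

B, D, C = 8, 128, 20
feat1 = torch.randn(B, D, requires_grad=True)   # first feature information (from model 1)
feat2 = torch.randn(B, D, requires_grad=True)   # second feature information (from model 2)
prototypes = torch.randn(C, D)                  # C classification feature prototypes

def similarity(feat, protos, tau=0.1):
    # cosine similarity between features and prototypes, scaled by a temperature tau
    return F.normalize(feat, dim=1) @ F.normalize(protos, dim=1).t() / tau   # [B, C]

sim1 = similarity(feat1, prototypes)            # first similarity
sim2 = similarity(feat2, prototypes)            # second similarity

def soft_ce(logits, target_probs):
    # cross-entropy against a soft (probability) target
    return -(target_probs * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# Cross pseudo-labelling: the first similarity supervises model 2, and vice versa.
loss_for_model_2 = soft_ce(sim2, F.softmax(sim1.detach(), dim=1))
loss_for_model_1 = soft_ce(sim1, F.softmax(sim2.detach(), dim=1))
(loss_for_model_1 + loss_for_model_2).backward()
```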
As can be seen from the above, when the computing device trains the first image processing model and the second image processing model, multiple rounds of training are required, and each round is basically the same. The current round of training is described here as an example. In the current training stage (i.e. the current round of training), C classification feature prototypes need to be determined; these C classification feature prototypes are obtained by clustering the feature information of each training image in the training image set, and C is a hyperparameter. For example, the feature information of each training image in the training image set is clustered, and the feature center of each cluster is determined as the feature prototype of that cluster.
The following describes a specific procedure for determining the prototype of the C classification features in the current training stage.
The above-mentioned C is a controllable hyperparameter, and C may be set larger than the number of target classifications (for example, the number of defect types). This allows the feature distribution of the same classification (for example, the same defect) to be dispersed more evenly in the feature space, giving the model greater freedom in its predictions.
The embodiment of the application does not limit the specific mode of determining C classification feature prototypes in the current training stage.
In some embodiments, the computing device first performs feature extraction on each training image in the training image set to obtain the feature information of each training image, then clusters the feature information of all training images, for example into C clusters, and determines the center of the feature information under each of the C clusters as the feature prototype of that classification, thereby obtaining initial feature prototypes of the C classifications. In this embodiment, the initial feature prototypes of the C classifications may be determined as the C classification feature prototypes of the current training stage. That is, in this embodiment, the C classification feature prototypes corresponding to different training stages are the same, namely the initial feature prototypes of the C classifications.
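As an illustration, this prototype initialization could be sketched as follows; k-means and the dimensions are assumptions, and any clustering method that yields C centers matches the description above.

```python
import numpy as np
from sklearn.cluster import KMeans

N, D, C = 10000, 128, 20                                 # N training images, D-dim features
all_feats = np.random.randn(N, D).astype(np.float32)     # stand-in for the extracted features

kmeans = KMeans(n_clusters=C, n_init=10).fit(all_feats)
prototypes = kmeans.cluster_centers_                     # [C, D]: one prototype per classification
```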
In some embodiments, the C classification feature prototypes are updated as the model is trained. At this time, the step of determining the C classification feature prototypes of the current training stage in the above step S102 includes the following steps S102-A1 and S102-A2:
S102-A1, C classification feature prototypes in the previous training stage are obtained;
S102-A2, updating the C classification feature prototypes in the previous training stage based on the first feature information and the second feature information to obtain the C classification feature prototypes in the current training stage.
In this embodiment, the computing device updates the C classification feature prototypes during each round of training, specifically, updates the C classification feature prototypes in the previous training stage using the first feature information and the second feature information obtained in the current training stage.
In some embodiments, if the current training stage is the initial training stage, the C classification feature prototypes of the previous training stage may be the initial feature prototypes of the C classifications in the above embodiments. In the initial training stage, the computing equipment performs feature extraction on each training image in the training image set to obtain feature information of each training image, and clusters the feature information of each training image to obtain initial feature prototypes of C classifications. Then, the computing device inputs the initial training image into the first image processing model and the second image processing model through the step S101, obtains the first feature information and the second feature information of the training image, updates the initial feature prototypes of the C classes by using the first feature information and the second feature information, and uses the updated C class feature prototypes as C class feature prototypes of the first training stage. Then, the C classification feature prototypes of the first training stage are used for executing subsequent steps, so that the first round of training on the first image processing model and the second image processing model is realized. In the second training stage, updating the C classification feature prototypes in the first training stage by using the first feature information and the second feature information obtained in the second training stage to obtain the C classification feature prototypes in the second training stage, performing second-round training on the first image processing model and the second image processing model by using the C classification feature prototypes in the second training stage, and so on.
In some embodiments, if the current training stage is not the initial training stage, the computing device acquires C classification feature prototypes of a previous training stage of the current training stage, and updates the C classification feature prototypes of the previous training stage by using the first feature information and the second feature information obtained in the current training stage, to obtain C classification feature prototypes of the current training stage.
The embodiment of the application does not limit the specific manner in which, in S102-A2, the C classification feature prototypes of the previous training stage are updated based on the first feature information and the second feature information to obtain the C classification feature prototypes of the current training stage.
In some embodiments, among the C classification feature prototypes of the previous training stage, the classification feature prototype that is most similar (or closest) to the first feature information is determined, and the first feature information replaces that prototype as the updated feature prototype of that classification. Similarly, among the C classification feature prototypes of the previous training stage, the classification feature prototype that is most similar (or closest) to the second feature information is determined, and the second feature information replaces that prototype as the updated feature prototype of that classification. In this way, the C classification feature prototypes of the previous training stage are updated to obtain the C classification feature prototypes of the current training stage, as sketched below.
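The following is a minimal sketch of this replacement-style update; cosine similarity is assumed only for illustration, and any similarity measure could be used.

```python
import torch
import torch.nn.functional as F

def replace_nearest_prototype(prototypes, feature):
    """Replace the prototype of the previous stage that is closest to the new feature."""
    # prototypes: (C, D) prototypes of the previous training stage; feature: (D,)
    sims = F.cosine_similarity(prototypes, feature.unsqueeze(0), dim=1)   # (C,)
    nearest = sims.argmax()
    prototypes = prototypes.clone()
    prototypes[nearest] = feature            # the feature becomes the updated prototype
    return prototypes

# prototypes = replace_nearest_prototype(prototypes, feat1)   # with the first feature information
# prototypes = replace_nearest_prototype(prototypes, feat2)   # with the second feature information
```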
In some embodiments, the computing device clusters the first feature information, the second feature information, and the C classification feature prototypes of the previous training stage to obtain the C classification feature prototypes of the current training stage. That is, in this embodiment, the computing device reclusters the first feature information and the second feature information obtained in the current training stage with the C classification feature prototypes in the previous training stage, gathers the C classification feature prototypes into C classes, determines each cluster center in the C classification as the feature prototype after the classification update, and further obtains the C classification feature prototypes in the current training stage.
From the above, the feature prototypes are obtained without the participation of the manual label y, so this is a self-supervised process.
Based on the above steps, the computing device obtains the C classification feature prototypes in the current training stage, and then executes the following step S103.
S103, determining a first similarity between the first feature information and each of the C classification feature prototypes in the current training stage, and a second similarity between the second feature information and each of the C classification feature prototypes in the current training stage.
The embodiment of the application does not limit the specific mode of determining the first similarity between the first characteristic information and each of the C classification characteristic prototypes in the current training stage and the second similarity between the second characteristic information and each of the C classification characteristic prototypes in the current training stage.
In some embodiments, the computing device directly calculates a first similarity between the first feature information and each of the C classification feature prototypes of the current training stage and calculates a second similarity between the second feature information and each of the C classification feature prototypes of the current training stage by a preset similarity calculation method.
In some embodiments, if the first feature information is obtained based on the first enhanced image, the second feature information is obtained based on the second enhanced image. At this time, the step of determining the first similarity between the first feature information and each of the C classification feature prototypes and the second similarity between the second feature information and each of the C classification feature prototypes in S103 described above, includes the steps of S103-B1 to S103-B3 as follows:
S103-B1, respectively performing de-enhancement processing on the first characteristic information and the second characteristic information to obtain third characteristic information and fourth characteristic information;
S103-B2, determining the similarity between the third characteristic information and each of the C classification characteristic prototypes as a first similarity;
S103-B3, determining the similarity between the fourth characteristic information and each of the C classification characteristic prototypes as a second similarity.
In this embodiment, if the computing device performs enhancement processing on the training image when determining the first feature information and the second feature information, and obtains the first feature information based on the first enhancement image, and obtains the second feature information based on the second enhancement image, in this step, the computing device performs de-enhancement processing on the first feature information and the second feature information to remove the information of the data enhancement operation itself, so as to obtain third feature information and fourth feature information that only include semantic information of the training image itself.
The embodiment of the application does not limit the specific manner in which the computing device performs de-enhancement processing on the first feature information and the second feature information to obtain the third feature information and the fourth feature information.
In one possible implementation manner, the first feature information and the second feature information are subjected to a de-enhancement process based on a manner opposite to the enhancement manner, so as to obtain third feature information and fourth feature information after the de-enhancement.
In one possible implementation, as shown in fig. 6, the computing device performs de-enhancement processing on the first feature information through a first mapper to obtain the third feature information, and performs de-enhancement processing on the second feature information through a second mapper to obtain the fourth feature information. In this implementation, the purpose of the first mapper and the second mapper is to perform nonlinear mapping on the first feature information and the second feature information so as to remove the effect of the enhancement, so that the third feature information and the fourth feature information contain the semantic information of the training image itself.
The embodiment of the application does not limit the specific network structure of the first mapper and the second mapper.
In one example, at least one of the first and second mappers described above has been trained in advance and does not participate in the training process of the first and second image processing models.
In one example, the first and second mappers described above are untrained and are trained end to end together with the first and second image processing models throughout the training process.
Illustratively, the computing device may determine the third characteristic information and the fourth characteristic information by the following equation (3):
feat'_1 = g(feat_1; θ_g1)
feat'_2 = g(feat_2; θ_g2)    (3)
where g() represents the mapper, θ_g1 are the parameters of the first mapper, θ_g2 are the parameters of the second mapper, feat'_1 is the third feature information, and feat'_2 is the fourth feature information.
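The patent does not fix the network structure of the mappers, so the sketch below assumes a small two-layer MLP projection head g(·) purely for illustration.

```python
import torch.nn as nn

class Mapper(nn.Module):
    """Nonlinear mapper g(.) of equation (3), assumed here to be a two-layer MLP."""
    def __init__(self, feat_dim, hidden_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, feat):
        # Removes the information of the data enhancement itself, keeping the
        # semantic information of the training image.
        return self.net(feat)

# feat3 = mapper_1(feat1)   # third feature information, feat'_1 = g(feat_1; θ_g1)
# feat4 = mapper_2(feat2)   # fourth feature information, feat'_2 = g(feat_2; θ_g2)
```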
The computing equipment respectively carries out de-enhancement processing on the first characteristic information and the second characteristic information to obtain third characteristic information and fourth characteristic information, and then determines the similarity between the third characteristic information and each classification characteristic prototype in the C classification characteristic prototypes as a first similarity. And similarly, determining the similarity between the fourth characteristic information and each of the C classification characteristic prototypes as a second similarity.
Illustratively, the computing device may determine the first similarity and the second similarity by the following equation (4):
q_1 = sim(feat'_1, proto)
q_2 = sim(feat'_2, proto)    (4)
where sim() represents the similarity calculation, q_1 is the first similarity, q_2 is the second similarity, and proto represents the C classification feature prototypes of the current training stage.
The embodiment of the application does not limit a specific similarity calculation mode, and can be realized by adopting cosine similarity and other modes.
In some embodiments, the first similarity and the second similarity are vectors with a length of C.
In some embodiments, the sum of the first similarities is 1, which represents the probability of matching the first feature information with the C classification feature prototypes. The sum of the second similarity is 1, which indicates the matching probability of the second feature information and the C kinds of classification feature prototypes.
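A minimal sketch of one way to obtain such length-C similarity vectors is shown below: cosine similarity to each prototype followed by a softmax so that each row sums to 1. The softmax normalization and the temperature parameter are assumptions for illustration; other similarity measures are equally possible.

```python
import torch
import torch.nn.functional as F

def prototype_similarity(feat, prototypes, temperature=0.1):
    """Length-C matching probabilities between features and the C classification prototypes."""
    # feat: (B, D) de-enhanced feature information; prototypes: (C, D)
    feat = F.normalize(feat, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    sims = feat @ prototypes.t() / temperature    # (B, C) cosine similarities
    return sims.softmax(dim=1)                    # each row sums to 1

# q1 = prototype_similarity(feat3, prototypes)   # first similarity, pseudo label for model 2
# q2 = prototype_similarity(feat4, prototypes)   # second similarity, pseudo label for model 1
```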
After the computing device determines the C classification feature prototypes of the current training stage based on the above steps, and determines the first similarity between the first feature information and each of the C classification feature prototypes and the second similarity between the second feature information and each of the C classification feature prototypes, it performs the following step S104.
S104, training the first image processing model and the second image processing model by taking the first similarity as a pseudo tag of the second image processing model and taking the second similarity as a pseudo tag of the first image processing model.
From the above, the first similarity represents the matching probability or prediction probability of the first feature information and the C kinds of classification feature prototypes, and the second similarity represents the matching probability or prediction probability of the second feature information and the C kinds of classification feature prototypes. Therefore, the computing equipment trains the first image processing model and the second image processing model by taking the first similarity and the second similarity as pseudo labels of the first image processing model and the second image processing model so as to reduce the influence of noise of manual labeling on model training, further improve the training effect of the model and improve the robustness of the model.
In the embodiment of the application, in the actual training process, the first similarity and the second similarity are exchanged, that is, the first similarity is used as a pseudo tag of the second image processing model to guide the training of the second image processing model, and the second similarity is used as a pseudo tag of the first image processing model to guide the training of the first image processing model, thereby realizing mutual supervision training of the models and further improving the training effect of the models.
The embodiment of the application does not limit the specific mode of training the first image processing model and the second image processing model by taking the first similarity as the pseudo tag of the second image processing model and taking the second similarity as the pseudo tag of the first image processing model.
In some embodiments, the computing device trains the second image processing model alone using the first similarity as a pseudo tag for the second image processing model, e.g., determines a loss based on the first similarity, and adjusts parameters in the second image processing model based on the loss. Similarly, the computing device trains the first image processing model alone using the second similarity as a pseudo tag for the first image processing model, e.g., determines a loss based on the second similarity, and adjusts parameters in the first image processing model based on the loss.
In some embodiments, the computing device co-trains the first image processing model and the second image processing model based on the first similarity and the second similarity. For example, a loss is determined based on the first similarity and the second similarity, and parameters in the first image processing model and the second image processing model are adjusted based on the loss, respectively. Based on this, the above S104 includes the steps of S104-A and S104-B as follows:
S104-A, taking the first similarity as a pseudo tag of a second image processing model, taking the second similarity as a pseudo tag of the first image processing model, and determining unsupervised losses of the first image processing model and the second image processing model;
S104-B, training the first image processing model and the second image processing model based on the unsupervised loss.
The specific process of determining the unsupervised loss of the first image processing model and the second image processing model using the first similarity as the pseudo tag of the second image processing model and the second similarity as the pseudo tag of the first image processing model in S104-a will be described below.
The embodiment of the application does not limit the specific mode of determining the unsupervised loss by the computing equipment.
In some embodiments, the computing device inputs the training image into the first image processing model, and obtains a prediction result output by the first image processing model, that is, outputs a prediction probability that the training image belongs to each of the C classes, and marks the prediction probability as a prediction probability 1. And similarly, inputting the training image into the second image processing model to obtain a prediction result output by the second image processing model, namely outputting the prediction probability of each class of C classes of the training image, and marking the prediction probability as the prediction probability 2. Next, the computing device determines an unsupervised loss based on the first similarity, the second similarity, the predictive probability 1, and the predictive probability 2. For example, a difference 1 between the first similarity and the prediction probability 2 and a difference 2 between the second similarity and the prediction probability 1 are determined, and based on the difference 1 and the difference 2, an unsupervised loss is obtained.
In some embodiments, the step S104-A includes the steps of S104-A1 to S104-A4 as follows:
S104-A1, obtaining a first prediction probability value of the training image belonging to C categories based on the first characteristic information, and obtaining a second prediction probability value of the training image belonging to C categories based on the second characteristic information;
S104-A2, taking the second similarity as a pseudo tag of the first image processing model, and determining a first loss between the first prediction probability value and the second similarity;
S104-A3, using the first similarity as a pseudo tag of a second image processing model, and determining a second loss between a second prediction probability value and the first similarity;
S104-A4, determining an unsupervised loss based on the first loss and the second loss.
In this embodiment, the computing device obtains, based on the above steps, first feature information and second feature information, determines, based on the first feature information, a first predicted probability value that the training image belongs to the C categories, and obtains, based on the second feature information, a second predicted probability value that the training image belongs to the C categories, in addition to determining the first similarity and the second similarity using the first feature information and the second feature information.
In some embodiments, as shown in fig. 7, the first image processing model includes a first feature extraction module and a third prediction module, and the second image processing model includes a second feature extraction module and a fourth prediction module, where the second feature extraction module and the first feature extraction module are used for feature extraction, and the third prediction module and the fourth prediction module are used for probability prediction. Based on the first feature extraction module, the computing device can extract first feature information of the training image, input the first feature information into the third prediction module to perform probability prediction, and obtain a first prediction probability value of the training image belonging to C categories. Similarly, the computing device may extract second feature information of the training image through the second feature extraction module, and input the second feature information into the fourth prediction module to perform probability prediction, so as to obtain a second prediction probability value of the training image belonging to the class C.
The embodiment of the application does not limit the specific network structure of the third prediction module and the fourth prediction module. Illustratively, the third prediction module and the fourth prediction module each include a fully connected layer FC for probabilistic prediction.
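Since only a fully connected layer FC for probability prediction is stated, the sketch below assumes each prediction module is a single linear layer mapping features to C class scores.

```python
import torch.nn as nn

class PredictionModule(nn.Module):
    """Prediction module assumed to be a single fully connected layer FC."""
    def __init__(self, feat_dim, num_classes_c):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes_c)

    def forward(self, feat):
        # Class scores; applying softmax yields the predicted probability values.
        return self.fc(feat)
```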
In some embodiments, if the first feature information and the second feature information are obtained based on the enhanced image of the training image, the step S104-A1 includes the following steps S104-a11 and S104-a 12:
S104-A11, respectively performing de-enhancement processing on the first characteristic information and the second characteristic information to obtain third characteristic information and fourth characteristic information;
S104-A12, obtaining a first prediction probability value based on the third characteristic information, and obtaining a second prediction probability value based on the fourth characteristic information.
In this embodiment, if the first feature information and the second feature information are obtained based on the enhanced image of the training image, the computing device first performs the de-enhancement processing on the first feature information and the second feature information, respectively, before determining the first prediction probability value and the second prediction probability value by using the first feature information and the second feature information, to obtain the third feature information and the fourth feature information. At this time, the third feature information and the fourth feature information only include features of the training image, and further, based on the third feature information, a first predicted probability value is obtained, and based on the fourth feature information, a second predicted probability value is obtained.
The specific process of performing the de-enhancement processing on the first feature information and the second feature information to obtain the third feature information and the fourth feature information in S104-a11 is described with reference to the related description of S103-B1, which is not repeated herein.
The embodiment of the application does not limit the specific mode of obtaining the first predicted probability value based on the third characteristic information and obtaining the second predicted probability value based on the fourth characteristic information in the step S104-A12. For example, the third feature information and the fourth feature information are subjected to data analysis to obtain a first predicted probability value and a second predicted probability value.
In some embodiments, as shown in fig. 8, the third feature information is processed by a first prediction module to obtain a first predicted probability value, and the fourth feature information is processed by a second prediction module to obtain a second predicted probability value.
The embodiment of the application does not limit the specific network structure of the first prediction module and the second prediction module. Illustratively, the first prediction module and the second prediction module each include a fully connected layer FC for probabilistic prediction.
In one example, at least one of the first prediction module and the second prediction module has been trained in advance and does not participate in the training process of the first image processing model and the second image processing model.
In one example, the first and second prediction modules are untrained and are trained end to end together with the first and second image processing models throughout the training process.
The computing device performs the steps of S104-A2 and S104-A3 described above after determining the first predicted probability value and the second predicted probability value based on the steps described above.
The embodiment of the application does not limit the specific way that the computing device takes the second similarity as the pseudo tag of the first image processing model to determine the first loss between the first prediction probability value and the second similarity.
From the above, the second similarity represents the similarity between the second feature information and each of the C classification feature prototypes, and is a vector of length C, denoted as [q_21, q_22, …, q_2C]. The first predicted probability value represents the probability that the training image belongs to each of the C classes, and is also a vector of length C, denoted as [p_11, p_12, …, p_1C].
Based on this, in some embodiments, the computing device may determine a difference between the first predicted probability value and the second similarity as the first loss.
In some embodiments, the computing device determines the first loss between the first predicted probability value and the second similarity based on the first loss function using the second similarity as a pseudo tag for the first image processing model.
The embodiment of the application does not limit the specific type of the first loss function.
Illustratively, the first loss function is a cross entropy loss function. In one possible implementation of this example, the computing device obtains the first loss by equation (5) as follows:
L1 = CE(softmax(FC(feat'_1)), q_2)    (5)
where L1 represents the first loss, softmax(FC(feat'_1)) is the first predicted probability value, i.e. the third feature information is processed by the fully connected layer FC and then by the softmax activation function to obtain the first predicted probability value, q_2 is the second similarity, and CE() represents the cross entropy loss function.
The embodiment of the application does not limit the specific way of determining the second loss between the second prediction probability value and the first similarity by taking the first similarity as the pseudo tag of the second image processing model by the computing equipment.
From the above, the first similarity represents the similarity between the first feature information and each of the C classification feature prototypes, and is a vector of length C, denoted as [q_11, q_12, …, q_1C]. The second predicted probability value represents the probability that the training image belongs to each of the C classes, and is also a vector of length C, denoted as [p_21, p_22, …, p_2C].
Based on this, in some embodiments, the computing device may determine a difference between the second predicted probability value and the first similarity as the second loss.
In some embodiments, the computing device determines a second loss between the second predicted probability value and the first similarity based on a second loss function using the first similarity as a pseudo tag for the second image processing model.
The embodiment of the application does not limit the specific type of the second loss function.
Illustratively, the second loss function is a cross entropy loss function. In one possible implementation of this example, the computing device obtains the second loss by equation (6) as follows:
L2 = CE(softmax(FC(feat'_2)), q_1)    (6)
where L2 represents the second loss, softmax(FC(feat'_2)) is the second predicted probability value, i.e. the fourth feature information is processed by the fully connected layer FC and then by the softmax activation function, q_1 is the first similarity, and CE() represents the cross entropy loss function.
After determining the first loss and the second loss based on the method, the computing device performs the step S104-A4 described above, and determines the unsupervised loss based on the first loss and the second loss.
For example, the sum of the first loss and the second loss is determined as an unsupervised loss.
For another example, an average of the first loss and the second loss is determined as an unsupervised loss. Illustratively, the computing device determines the unsupervised loss based on equation (7) as follows:
L_unsup = 1/2 [CE(softmax(FC(feat'_1)), q_2) + CE(softmax(FC(feat'_2)), q_1)]    (7)
where L_unsup is the unsupervised loss, CE(softmax(FC(feat'_1)), q_2) is the first loss, and CE(softmax(FC(feat'_2)), q_1) is the second loss.
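A minimal sketch of this mutual-supervision loss is given below. Because the pseudo labels q_1 and q_2 are soft length-C vectors, a soft-target cross entropy is used; detaching the pseudo labels from the computation graph is an additional assumption, not stated in the patent.

```python
import torch

def soft_cross_entropy(logits, soft_target):
    """Cross entropy between predicted class scores and a soft (length-C) pseudo label."""
    log_prob = torch.log_softmax(logits, dim=1)          # softmax(FC(feat')) in log space
    return -(soft_target * log_prob).sum(dim=1).mean()

def unsupervised_loss(logits_1, logits_2, q1, q2):
    """Equation (7): q2 supervises model 1, q1 supervises model 2, losses are averaged."""
    loss_1 = soft_cross_entropy(logits_1, q2.detach())   # first loss, equation (5)
    loss_2 = soft_cross_entropy(logits_2, q1.detach())   # second loss, equation (6)
    return 0.5 * (loss_1 + loss_2)
```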
After determining the unsupervised loss, the computing device performs the step S104-B described above, and trains the first image processing model and the second image processing model based on the unsupervised loss.
As can be seen from the above, the manual label y does not participate in the determination of this loss, so the loss is an unsupervised loss. Training the first image processing model and the second image processing model based on the unsupervised loss can reduce the influence of noise in manual labels on model training, improve the training effect of the models, and improve the robustness of the models.
In the embodiment of the application, based on the unsupervised loss, training the first image processing model and the second image processing model at least comprises the following modes:
mode 1, the computing device trains the first image processing model and the second image processing model based solely on the unsupervised loss.
Mode 2, the computing device trains the first image processing model and the second image processing model based on the unsupervised loss and the supervised loss. In this case, before executing S104-B described above, the computing device determines a supervised loss based on the first feature information, the second feature information, and a preset training label (i.e. a manual label). The first image processing model and the second image processing model are then trained based on the unsupervised loss and the supervised loss.
That is, in this mode 2, the computing device determines the unsupervised loss through the above steps, and at the same time determines the supervised loss based on the first feature information, the second feature information, and the training tag, and further trains the first image processing model and the second image processing model based on the unsupervised loss and the supervised loss.
The following describes a specific process for determining a supervised loss by a computing device based on first feature information, second feature information, and a preset training label.
The embodiment of the application does not limit the specific mode of determining the supervised loss based on the first characteristic information, the second characteristic information and the training label.
In some embodiments, the first feature information, the second feature information and the training tag are substituted into a preset loss function, and the supervised loss is calculated.
In some embodiments, the first feature information and the second feature information are processed to obtain a predictive probability, and the predictive probability is compared with the training label to obtain the supervised loss.
In some embodiments, as shown in fig. 9, the first image processing model includes a third prediction module, the second image processing model includes a fourth prediction module, and the computing device processes the first feature information through the third prediction module to obtain a third prediction probability value of the training image belonging to the C categories; processes the second feature information through the fourth prediction module to obtain a fourth prediction probability value of the training image belonging to the C categories; and obtains the supervised loss based on the third predicted probability value, the fourth predicted probability value and the training label.
In this embodiment, as shown in fig. 9, the computing device may extract first feature information of the training image through the first feature extraction module, and input the first feature information into the third prediction module to perform probability prediction, so as to obtain a third prediction probability value of the training image belonging to the class C. Similarly, the computing device may extract second feature information of the training image through the second feature extraction module, and input the second feature information into the fourth prediction module to perform probability prediction, so as to obtain a fourth prediction probability value of the training image belonging to the class C.
And then, obtaining the supervised loss based on the third predicted probability value, the fourth predicted probability value and the training label.
In one example, the computing device inputs the third predicted probability value, the fourth predicted probability value, and the training tag into a predetermined loss function resulting in a supervised loss.
In one example, as shown in fig. 9, the computing device determines a third loss between a third predicted probability value and the training tag; determining a fourth loss between the fourth predicted probability value and the training label; based on the third loss and the fourth loss, a supervised loss is determined.
The embodiment of the application does not limit the specific way of determining the third loss between the third predicted probability value and the training label.
From the above, the third predicted probability value and the training label represent the probability that the training image belongs to each of the C classes, and are each a vector with a length of C.
Based on this, in some embodiments, the computing device may determine a difference between the third predicted probability value and the training label as the third loss.
In some embodiments, the computing device determines a third loss between the third predicted probability value and the training label based on a third loss function.
The embodiment of the application does not limit the specific type of the third loss function.
Illustratively, the third loss function is a cross entropy loss function. In one possible implementation of this example, the computing device obtains the third loss by equation (8) as follows:
L3 = CE(pred_1, y)    (8)
where L3 represents the third loss, pred_1 is the third predicted probability value, y is the training label, and CE() represents the cross entropy loss function.
The embodiment of the application does not limit the specific way of determining the fourth loss between the fourth predicted probability value and the training label.
From the above, the fourth predicted probability value and the training label represent the probability that the training image belongs to each of the C classes, and are each a vector with a length of C.
Based on this, in some embodiments, the computing device may determine a difference between the fourth predicted probability value and the training label as the fourth loss.
In some embodiments, the computing device determines a fourth loss between the fourth predicted probability value and the training label based on a fourth loss function.
The embodiment of the application does not limit the specific type of the fourth loss function.
Illustratively, the fourth loss function is a cross entropy loss function. In one possible implementation of this example, the computing device obtains the fourth loss by equation (9) as follows:
L4 = CE(pred_2, y)    (9)
where L4 represents the fourth loss, pred_2 is the fourth predicted probability value, y is the training label, and CE() represents the cross entropy loss function.
The computing device determines a third loss and a fourth loss based on the steps, and then determines a supervised loss based on the third loss and the fourth loss.
For example, the sum of the third loss and the fourth loss is determined as the supervised loss.
For another example, the average of the third loss and the fourth loss is determined as the supervised loss.
Illustratively, the supervised loss is determined by the following equation (10):
L_sup = 1/2 [CE(pred_1, y) + CE(pred_2, y)]    (10)
where L_sup represents the supervised loss, CE(pred_1, y) represents the third loss, and CE(pred_2, y) represents the fourth loss.
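A minimal sketch of this supervised loss follows. It assumes the training label y is given as a class index; recent versions of F.cross_entropy also accept a length-C probability label directly, matching the vector form described above.

```python
import torch.nn.functional as F

def supervised_loss(pred_logits_1, pred_logits_2, y):
    """Equations (8)-(10): cross entropy of both models' predictions against the manual label."""
    # pred_logits_1/2: (B, C) outputs of the third/fourth prediction modules; y: (B,) class indices
    loss_3 = F.cross_entropy(pred_logits_1, y)   # third loss, equation (8)
    loss_4 = F.cross_entropy(pred_logits_2, y)   # fourth loss, equation (9)
    return 0.5 * (loss_3 + loss_4)               # average, equation (10)
```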
The computing device determines an unsupervised loss and a supervised loss based on the steps, and then trains the first image processing model and the second image processing model based on the unsupervised loss and the supervised loss.
Illustratively, based on the unsupervised loss and the supervised loss, a total loss is determined, based on which the first image processing model and the second image processing model are trained end-to-end.
For example, the unsupervised and supervised losses are added to obtain the total loss.
For another example, a weighted sum of the unsupervised and supervised losses is determined as the total loss. In one possible implementation of this example, the total loss is determined by the following equation (11):
L = L_sup + β * L_unsup    (11)
where L is the total loss and β is a hyperparameter used to control the contribution of the two losses.
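A minimal sketch of one training step combining the two losses as in equation (11) is shown below; the optimizer, its settings, and the value of β are assumptions for illustration only.

```python
import torch

# params: parameters of both image processing models, the mappers and the prediction modules
# optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9)

def training_step(optimizer, l_sup, l_unsup, beta=1.0):
    """One end-to-end update using the total loss L = L_sup + beta * L_unsup."""
    total = l_sup + beta * l_unsup   # equation (11)
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```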
In the embodiment of the application, the unsupervised loss is used to assist the supervised loss in training the model. Introducing the unsupervised loss can overcome the problem of label noise to a certain extent and avoids the situation where training only with the supervised loss causes the model to overfit to noisy data and lose robustness. That is, when the embodiment of the application trains the model with both the unsupervised loss and the supervised loss, the model can overcome the influence of noisy data to a certain extent, the robustness of the model is enhanced, the prediction results of the model actually deployed in the online stage are improved, and reliable technical support is provided for fields such as industrial AI defect quality inspection.
According to the image processing model training method provided by the embodiment of the application, the first image processing model and the second image processing model are used for respectively extracting the characteristics of the training image to obtain the first characteristic information and the second characteristic information of the training image. And then, C classification feature prototypes in the current training stage are determined, wherein the C classification feature prototypes are obtained by clustering feature information of each training image in the training image set. And further determining a first degree of similarity between the first feature information and each of the C classification feature prototypes, and determining a second degree of similarity between the second feature information and each of the C classification feature prototypes. The first similarity is used as a pseudo tag of the second image processing model, the second similarity is used as a pseudo tag of the first image processing model, and the first image processing model and the second image processing model are trained. That is, in the embodiment of the present application, the first similarity determined based on the first image processing model is used as the pseudo tag of the second image processing model, and the second similarity determined based on the second image processing model is used as the pseudo tag of the first image processing model, so that training is supervised by interaction information to overcome the interference of noise tags, improve the training effect of the first image processing model and the second image processing model, and enhance the robustness of the models. In the subsequent actual prediction stage, any one image processing model of the first image processing model and the second image processing model can be selected for prediction, and the image processing effect is improved.
The whole model training process of the embodiment of the present application is described above, and a specific example is described below to further describe the model training method provided by the embodiment of the present application.
Fig. 10 is a flowchart of an image processing model training method according to an embodiment of the present application, as shown in fig. 10, including:
s201, enhancement processing is carried out on the training image, and a first enhancement image and a second enhancement image are obtained.
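As a simple illustration of S201, two independent random augmentations of the same training image can be taken as the first and second enhanced images; the specific transforms below are assumptions and not prescribed by the patent.

```python
import torchvision.transforms as T

augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.ToTensor(),
])

# image: a PIL image of the training sample (e.g. an industrial product)
# enhanced_1 = augment(image)   # first enhanced image, fed to the first image processing model
# enhanced_2 = augment(image)   # second enhanced image, fed to the second image processing model
```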
S202, performing feature extraction on the first enhanced image through the first image processing model to obtain first feature information, and performing feature extraction on the second enhanced image through the second image processing model to obtain second feature information.
As shown in fig. 9, the first image processing model includes a first feature extraction module, and the computing device inputs the first enhanced image into the first feature extraction module to perform feature extraction, and outputs first feature information. The second image processing model comprises a second feature extraction module, and the computing device inputs the second enhanced image into the second feature extraction module to perform feature extraction and outputs second feature information.
S203, processing the first characteristic information through a third prediction module to obtain a third prediction probability value of the training image belonging to the C categories, and processing the second characteristic information through a fourth prediction module to obtain a fourth prediction probability value of the training image belonging to the C categories.
As shown in fig. 9, the first image processing model further includes a third prediction module, and the computing device inputs the first feature information into the third prediction module for processing, so as to obtain a third prediction probability value of the training image belonging to the class C. The second image processing model further comprises a fourth prediction module, and the computing equipment inputs the second characteristic information into the fourth prediction module for processing to obtain a fourth prediction probability value of the training image belonging to C categories.
S204, obtaining the supervised loss based on the third predicted probability value, the fourth predicted probability value and the training label.
The specific implementation process of S204 above is consistent with the process of determining the supervised loss in S104 above, and reference may be made to the description of the relevant process in S104 above.
Illustratively, determining a third loss between the third predicted probability value and the training label; determining a fourth loss between the fourth predicted probability value and the training label; based on the third loss and the fourth loss, a supervised loss is determined.
For example, a third loss between the third predicted probability value and the training label is determined based on the third loss function, and a fourth loss between the fourth predicted probability value and the training label is determined based on the fourth loss function.
For example, the average of the third loss and the fourth loss is determined as the supervised loss.
S205, performing de-enhancement processing on the first characteristic information through a first mapper to obtain third characteristic information, and performing de-enhancement processing on the second characteristic information through a second mapper to obtain fourth characteristic information.
The embodiment of the application does not limit the specific execution sequence of S205 and S203; for example, S205 may be executed before S203, after S203, or synchronously with S203.
The specific implementation process of S205 may refer to the description related to S103, which is not described herein.
S206, determining C classification feature prototypes in the current training stage.
The specific execution sequence of S206 and S205 is not limited, for example, S206 may be executed before S205, or executed after S205, or executed synchronously with S205, which is not limited by the present application.
The specific implementation process of S206 may refer to the description related to S102, which is not described herein.
S207, determining the similarity between the third characteristic information and each of the C classification characteristic prototypes as a first similarity, and determining the similarity between the fourth characteristic information and each of the C classification characteristic prototypes as a second similarity.
The specific implementation process of S207 may refer to the description related to S103, which is not described herein.
S208, processing the third characteristic information through the first prediction module to obtain a first prediction probability value, and processing the fourth characteristic information through the second prediction module to obtain a second prediction probability value.
The specific execution sequence of S208 and S207 is not limited, for example, S208 may be executed before S207, or after S207, or may be executed synchronously with S207, which is not limited by the present application.
The specific implementation process of S208 may refer to the description of S104-A1, which is not repeated here.
S209, taking the second similarity as a pseudo tag of the first image processing model, and determining a first loss between the first prediction probability value and the second similarity; and determining a second loss between the second prediction probability value and the first similarity by taking the first similarity as a pseudo tag of the second image processing model.
The specific implementation process of S209 may refer to the descriptions related to S104-A2 and S104-A3, which are not described herein.
S210, determining an unsupervised loss based on the first loss and the second loss.
For example, an average of the first loss and the second loss is determined as an unsupervised loss.
S211, determining total loss based on the unsupervised loss and the supervised loss.
For example, a weighted sum of the unsupervised and supervised losses is determined as the total loss.
S212, performing end-to-end training on the first image processing model, the second image processing model, the first mapper and the second mapper based on the total loss.
In some embodiments, the first image processing model, the second image processing model, the first mapper, the second mapper, the first prediction module, and the second prediction module are trained end-to-end based on the total loss.
In the embodiment of the application, the first similarity determined based on the first image processing model is used as the pseudo tag of the second image processing model, and the second similarity determined based on the second image processing model is used as the pseudo tag of the first image processing model. Therefore, the training is supervised through the interactive information so as to overcome the interference of noise labels, improve the training effect of the first image processing model and the second image processing model, enhance the robustness of the models, further improve the model prediction result of the actual model deployment in the online stage, and further provide reliable technical support for the fields of industrial AI defect quality detection and the like.
The embodiment of the model training method of the present application is described in detail above with reference to fig. 3 to 10, and the image processing method provided by the embodiment of the present application is described below.
Fig. 11 is a flowchart of an image processing method according to an embodiment of the application. The execution subject of the embodiment of the present application is a device having an image processing function, for example, an image processing device. In some embodiments, the image processing apparatus may be the computing device in fig. 2, or the terminal device in fig. 2, or a system of the computing device and the terminal device in fig. 2. For convenience of description, the embodiment of the present application will be described by taking the execution subject as an example of a computing device, where the computing device and the computing device performing model training may be the same device or different devices, and the embodiment of the present application is not limited thereto.
As shown in fig. 11, the method of the embodiment of the present application includes the following steps:
s301, acquiring a target image to be processed.
The embodiment of the application does not limit the specific type of the target image.
For example, if the first image processing model and the second image processing model are industrial quality inspection models, and are used for detecting defect types of industrial products, the target image is a defect image of the industrial product to be detected.
S302, processing the target image through the target model to obtain a processing result of the target image.
The target model is any one of the first image processing model and the second image processing model. That is, in the embodiment of the present application, two models, i.e., a first image processing model and a second image processing model, are trained simultaneously by the model training manner described above. In practical use, one image processing model is selected from the two trained models to serve as a target model, and the target image is processed.
The first image processing model and the second image processing model are obtained through unsupervised loss training. The unsupervised loss is determined by taking a first similarity as a pseudo tag of the second image processing model and a second similarity as a pseudo tag of the first image processing model, where the first similarity is the similarity between first feature information and C classification feature prototypes of the current training stage, the second similarity is the similarity between second feature information and the C classification feature prototypes, the first feature information and the second feature information are obtained by performing feature extraction on a training image through the first image processing model and the second image processing model respectively, and the C classification feature prototypes are obtained by clustering feature information of each training image in the training image set.
The specific training process of the first image processing model and the second image processing model refers to the specific description of the foregoing model training embodiment, and is not described herein.
In one example, if the target image is a defect image of an industrial product and the target model is an industrial quality inspection model for detecting a defect type of the industrial product, the computing device identifies the defect type of the target image by the target model to obtain the defect type of the target image.
Illustratively, as in FIG. 12, the computing device inputs the target image to be processed into a target model that outputs probabilities that defects in the target image belong to each of the N defect categories. And finally, determining the defect type of the target image according to the probability.
For example, if any of the defect class probabilities corresponding to the target image is greater than a preset value (e.g., 0.5), the target image is a defect image. As shown in fig. 13A, the defect class probability corresponding to the target image is 0.95, which indicates that the target image is a defect image, and the defect classification of the target image is then determined based on the specific values of the defect class probabilities.
If none of the defect class probabilities corresponding to the target image is greater than the preset value (for example, 0.5), for example, as shown in fig. 13B, the defect class probability corresponding to the target image is 0.05, the target image is determined to be a non-defective image.
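A minimal sketch of this online prediction stage is given below; whether the target model outputs logits or probabilities, and the exact output format, are assumptions made only for illustration.

```python
import torch

@torch.no_grad()
def predict_defect(target_model, target_image, threshold=0.5):
    """Classify the target image with the trained target model and a preset threshold."""
    target_model.eval()
    probs = torch.softmax(target_model(target_image.unsqueeze(0)), dim=1)[0]   # (N,) defect class probs
    max_prob, defect_class = probs.max(dim=0)
    if max_prob.item() > threshold:
        return {"defective": True, "defect_class": defect_class.item(), "probability": max_prob.item()}
    return {"defective": False, "probability": max_prob.item()}
```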
In the embodiment of the application, during model training, the first similarity determined based on the first image processing model is used as a pseudo tag of the second image processing model, and the second similarity determined based on the second image processing model is used as the pseudo tag of the first image processing model. Therefore, the training is supervised through the interactive information so as to overcome the interference of noise labels, improve the training effect of the first image processing model and the second image processing model and strengthen the robustness of the models. Thus, when the first image processing model or the second image processing model is used for processing the target image, the image processing effect can be improved.
The model training and image processing method embodiments of the present application are described above in detail with reference to fig. 3 to 13B, and the apparatus embodiments of the present application are described below in detail with reference to fig. 14 to 15.
Fig. 14 is a schematic block diagram of an image processing model training apparatus according to an embodiment of the present application. The apparatus 10 may be applied to a computing device.
As shown in fig. 14, the image processing model training apparatus 10 includes:
a feature extraction unit 11, configured to perform feature extraction on a training image through a first image processing model and a second image processing model, so as to obtain first feature information and second feature information of the training image;
a prototype determining unit 12, configured to determine C classification feature prototypes in the current training stage, where the C classification feature prototypes are obtained by clustering feature information of each training image in the training image set, and C is a positive integer;
a similarity determining unit 13 for determining a first similarity between the first feature information and each of the C classification feature prototypes, and a second similarity between the second feature information and each of the C classification feature prototypes;
and a training unit 14, configured to train the first image processing model and the second image processing model with the first similarity as a pseudo tag of the second image processing model and the second similarity as a pseudo tag of the first image processing model.
In some embodiments, the feature extraction unit 11 is specifically configured to perform enhancement processing on the training image to obtain a first enhanced image and a second enhanced image; extracting features of the first enhanced image through the first image processing model to obtain first feature information; and extracting the characteristics of the second enhanced image through the second image processing model to obtain the second characteristic information.
In some embodiments, the prototype determining unit 12 is specifically configured to perform de-enhancement processing on the first feature information and the second feature information, to obtain third feature information and fourth feature information; determining the similarity between the third characteristic information and each of the C classification characteristic prototypes as the first similarity; and determining the similarity between the fourth characteristic information and each of the C classification characteristic prototypes as the second similarity.
In some embodiments, the prototype determining unit 12 is specifically configured to obtain C classification feature prototypes in the previous training stage; and updating the C classification feature prototypes in the previous training stage based on the first feature information and the second feature information to obtain the C classification feature prototypes in the current training stage.
In some embodiments, the prototype determining unit 12 is specifically configured to cluster the first feature information, the second feature information, and the C classification feature prototypes in the previous training stage, to obtain the C classification feature prototypes in the current training stage.
In some embodiments, the training unit 14 is specifically configured to determine the unsupervised loss of the first image processing model and the second image processing model by using the first similarity as a pseudo tag of the second image processing model and the second similarity as a pseudo tag of the first image processing model; training the first image processing model and the second image processing model based on the unsupervised loss.
In some embodiments, the training unit 14 is specifically configured to obtain, based on the first feature information, a first predicted probability value of the training image belonging to the class C, and obtain, based on the second feature information, a second predicted probability value of the training image belonging to the class C; determining a first loss between the first predicted probability value and the second similarity using the second similarity as a pseudo tag for the first image processing model; determining a second loss between the second predicted probability value and the first similarity by taking the first similarity as a pseudo tag of the second image processing model; the unsupervised loss is determined based on the first loss and the second loss.
In some embodiments, if the first feature information is obtained based on the first enhanced image enhanced by the training image, and the second feature information is obtained based on the second enhanced image enhanced by the training image, the training unit 14 is specifically configured to perform de-enhancement processing on the first feature information and the second feature information, so as to obtain third feature information and fourth feature information; and obtaining the first predicted probability value based on the third characteristic information, and obtaining the second predicted probability value based on the fourth characteristic information.
In some embodiments, the training unit 14 is specifically configured to process the third feature information by using a first prediction module to obtain the first predicted probability value, and process the fourth feature information by using the second prediction module to obtain the second predicted probability value.
In some embodiments, the training unit 14 is specifically configured to determine, using the second similarity as a pseudo tag of the first image processing model, a first loss between the first prediction probability value and the second similarity based on a first loss function.
In some embodiments, the training unit 14 is specifically configured to determine, using the first similarity as a pseudo tag of the second image processing model, a second loss between the second prediction probability value and the first similarity based on a second loss function.
In some embodiments, the training unit 14 is specifically configured to perform a de-enhancement process on the first feature information by using a first mapper, so as to obtain the third feature information; and performing de-enhancement processing on the second characteristic information through a second mapper to obtain the fourth characteristic information.
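The internal structure of the first mapper and the second mapper is not fixed here; one possibility, assumed purely for illustration, is a small multi-layer perceptron with matching input and output dimensionality that maps the augmentation-dependent feature back to an augmentation-invariant ("de-enhanced") representation.

```python
import torch.nn as nn

def make_mapper(dim=128, hidden=256):
    # A hypothetical mapper: the dimensions are assumptions, not values taken from this application.
    return nn.Sequential(
        nn.Linear(dim, hidden),
        nn.ReLU(inplace=True),
        nn.Linear(hidden, dim),
    )
```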
In some embodiments, the training unit 14 is further configured to determine, before training the first image processing model and the second image processing model based on the unsupervised loss, a supervised loss based on the first feature information, the second feature information, and a preset training label; training the first image processing model and the second image processing model based on the unsupervised loss and the supervised loss.
In some embodiments, the first image processing model includes a third prediction module, the second image processing model includes a fourth prediction module, and the training unit 14 is specifically configured to process the first feature information through the third prediction module to obtain a third predicted probability value of the training image belonging to the C categories; process the second feature information through the fourth prediction module to obtain a fourth predicted probability value of the training image belonging to the C categories; and obtain the supervised loss based on the third predicted probability value, the fourth predicted probability value and the training label.
In some embodiments, the training unit 14 is specifically configured to determine a third loss between the third predicted probability value and the training label; determining a fourth loss between the fourth predicted probability value and the training label; the supervised loss is determined based on the third loss and the fourth loss.
In some embodiments, the training unit 14 is specifically configured to determine a third loss between the third predicted probability value and the training label based on a third loss function.
In some embodiments, the training unit 14 is specifically configured to determine a fourth loss between the fourth predicted probability value and the training label based on a fourth loss function.
In some embodiments, training unit 14 is specifically configured to determine a total loss based on the unsupervised loss and the supervised loss; training the first image processing model and the second image processing model based on the total loss.
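Combining the supervised loss and the unsupervised loss into a total loss can be sketched as below; the balancing weight and the use of a hard-label cross-entropy for the supervised term are assumptions made for illustration.

```python
import torch.nn.functional as F

def total_loss(sup_logits1, sup_logits2, labels, unsup_loss, weight=1.0):
    # sup_logits1, sup_logits2: (B, C) logits behind the third / fourth predicted probability values
    # labels: (B,) preset training labels; weight is an assumed balancing hyper-parameter
    supervised = F.cross_entropy(sup_logits1, labels) + F.cross_entropy(sup_logits2, labels)
    return supervised + weight * unsup_loss
```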
In some embodiments, the first image processing model and the second image processing model are industrial quality inspection models for detecting defect categories of industrial products, and the training image is a defect image of the industrial products.
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the apparatus shown in fig. 14 may perform the foregoing embodiments of the model training method, and the foregoing and other operations and/or functions of each module in the apparatus are respectively for implementing the corresponding method embodiments, which are not described herein for brevity.
Fig. 15 is a schematic block diagram of an image processing apparatus provided in an embodiment of the present application. The apparatus 20 may be applied to a computing device.
As shown in fig. 15, the image processing apparatus 20 includes:
an acquisition unit 21 for acquiring a target image to be processed;
A processing unit 22, configured to process the target image through a target model, so as to obtain a processing result of the target image;
the target model is a first image processing model or a second image processing model, the first image processing model and the second image processing model are obtained through training by taking a first similarity as a pseudo tag of the second image processing model and a second similarity as a pseudo tag of the first image processing model, the first similarity is the similarity between the first feature information and C classification feature prototypes in the current training stage, the second similarity is the similarity between the second feature information and the C classification feature prototypes, the first feature information and the second feature information are obtained by respectively extracting features of training images based on the first image processing model and the second image processing model, the C classification feature prototypes are obtained by clustering feature information of each training image in a training image set, and C is a positive integer.
In some embodiments, the target image is a defect image of an industrial product, the target model is an industrial quality inspection model, and the processing unit 22 is specifically configured to identify the defect type of the target image by using the target model, so as to obtain the defect type of the target image.
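As an illustrative sketch only (the preprocessing pipeline, the model interface, and the class names are assumptions), applying the trained target model to a defect image at inference time amounts to a single forward pass followed by an arg-max over the C categories:

```python
import torch

@torch.no_grad()
def classify_defect(target_model, image_tensor, class_names):
    # image_tensor: (1, 3, H, W) preprocessed defect image of an industrial product
    target_model.eval()
    probs = target_model(image_tensor).softmax(dim=1)
    return class_names[probs.argmax(dim=1).item()]  # predicted defect category
```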
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the apparatus shown in fig. 15 may perform the above-described embodiments of the image processing method, and the foregoing and other operations and/or functions of each module in the apparatus are respectively for implementing the corresponding method embodiments, which are not described herein for brevity.
The apparatus of the embodiments of the present application is described above in terms of functional modules with reference to the accompanying drawings. It should be understood that the functional modules may be implemented in the form of hardware, by instructions in the form of software, or by a combination of hardware and software modules. Specifically, each step of the method embodiments in the embodiments of the present application may be completed by an integrated logic circuit of hardware in a processor and/or instructions in the form of software, and the steps of the methods disclosed in connection with the embodiments of the present application may be directly embodied as being performed and completed by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps in the above method embodiments in combination with its hardware.
Fig. 16 is a schematic block diagram of a computing device provided by an embodiment of the present application. The computing device of fig. 16 may be used to perform the model training method or the image processing method described above.
As shown in fig. 16, the computing device 30 may include:
a memory 31 and a processor 32, the memory 31 being arranged to store a computer program 33 and to transmit the computer program 33 to the processor 32. In other words, the processor 32 may call and run the computer program 33 from the memory 31 to implement the method in an embodiment of the application.
For example, the processor 32 may be configured to perform the steps of the method 200 described above in accordance with instructions in the computer program 33.
In some embodiments of the present application, the processor 32 may include, but is not limited to:
a general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
In some embodiments of the present application, the memory 31 includes, but is not limited to:
volatile memory and/or nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
In some embodiments of the present application, the computer program 33 may be divided into one or more modules that are stored in the memory 31 and executed by the processor 32 to perform the methods provided by the present application. The one or more modules may be a series of computer program instruction segments capable of performing specified functions, and the instruction segments are used to describe the execution of the computer program 33 in the computing device 30.
As shown in fig. 16, the computing device 30 may further include:
a transceiver 34, the transceiver 34 being connectable to the processor 32 or the memory 31.
The processor 32 may control the transceiver 34 to communicate with other devices, and in particular, may send information or data to other devices or receive information or data sent by other devices. The transceiver 34 may include a transmitter and a receiver. The transceiver 34 may further include antennas, the number of which may be one or more.
It should be appreciated that the various components in the computing device 30 are connected by a bus system that includes a power bus, a control bus, and a status signal bus in addition to a data bus.
According to an aspect of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. Alternatively, embodiments of the present application also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of the method embodiments described above.
According to another aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the computer device to perform the method of the above-described method embodiments.
In other words, when implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, functional modules in various embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. An image processing model training method, comprising:
respectively extracting features of a training image through a first image processing model and a second image processing model to obtain first feature information and second feature information of the training image;
C classification feature prototypes in the current training stage are determined, wherein the C classification feature prototypes are obtained by clustering feature information of each training image in a training image set, and C is a positive integer;
determining a first similarity between the first feature information and each of the C classification feature prototypes and a second similarity between the second feature information and each of the C classification feature prototypes;
and training the first image processing model and the second image processing model by taking the first similarity as a pseudo tag of the second image processing model and the second similarity as a pseudo tag of the first image processing model.
2. The method according to claim 1, wherein the feature extraction of the training image by the first image processing model and the second image processing model to obtain the first feature information and the second feature information of the training image includes:
performing enhancement processing on the training image to obtain a first enhancement image and a second enhancement image;
extracting features of the first enhanced image through the first image processing model to obtain first feature information;
And extracting the characteristics of the second enhanced image through the second image processing model to obtain the second characteristic information.
3. The method of claim 2, wherein said determining a first similarity between the first feature information and each of the C classification feature prototypes and a second similarity between the second feature information and each of the C classification feature prototypes comprises:
respectively carrying out de-enhancement processing on the first characteristic information and the second characteristic information to obtain third characteristic information and fourth characteristic information;
determining the similarity between the third characteristic information and each of the C classification characteristic prototypes as the first similarity;
and determining the similarity between the fourth characteristic information and each of the C classification characteristic prototypes as the second similarity.
4. The method of claim 1, wherein said determining C class feature prototypes for the current training phase comprises:
c classification feature prototypes of the previous training stage are obtained;
and updating the C classification feature prototypes in the previous training stage based on the first feature information and the second feature information to obtain the C classification feature prototypes in the current training stage.
5. The method of claim 4, wherein updating the C classification feature prototypes of the previous training stage based on the first feature information and the second feature information to obtain the C classification feature prototypes of the current training stage comprises:
and clustering the first characteristic information, the second characteristic information and the C classification characteristic prototypes in the previous training stage to obtain the C classification characteristic prototypes in the current training stage.
6. The method of any of claims 1-5, wherein training the first image processing model and the second image processing model with the first similarity as a pseudo tag for the second image processing model and the second similarity as a pseudo tag for the first image processing model comprises:
taking the first similarity as a pseudo tag of the second image processing model, taking the second similarity as a pseudo tag of the first image processing model, and determining unsupervised losses of the first image processing model and the second image processing model;
training the first image processing model and the second image processing model based on the unsupervised loss.
7. The method of claim 6, wherein the determining the unsupervised loss of the first image processing model and the second image processing model by taking the first similarity as a pseudo tag of the second image processing model and the second similarity as a pseudo tag of the first image processing model comprises:
based on the first characteristic information, a first prediction probability value of the training image belonging to C categories is obtained, and based on the second characteristic information, a second prediction probability value of the training image belonging to the C categories is obtained;
determining a first loss between the first predicted probability value and the second similarity using the second similarity as a pseudo tag for the first image processing model;
determining a second loss between the second predicted probability value and the first similarity by taking the first similarity as a pseudo tag of the second image processing model;
the unsupervised loss is determined based on the first loss and the second loss.
8. The method according to claim 7, wherein if the first feature information is obtained based on the first enhanced image obtained by enhancing the training image, and the second feature information is obtained based on the second enhanced image obtained by enhancing the training image, the obtaining, based on the first feature information, a first predicted probability value that the training image belongs to class C, and obtaining, based on the second feature information, a second predicted probability value that the training image belongs to class C, includes:
Respectively carrying out de-enhancement processing on the first characteristic information and the second characteristic information to obtain third characteristic information and fourth characteristic information;
and obtaining the first predicted probability value based on the third characteristic information, and obtaining the second predicted probability value based on the fourth characteristic information.
9. The method of claim 8, wherein the deriving the first predicted probability value based on the third characteristic information and the second predicted probability value based on the fourth characteristic information comprises:
and processing the third characteristic information through a first prediction module to obtain the first prediction probability value, and processing the fourth characteristic information through a second prediction module to obtain the second prediction probability value.
10. The method according to claim 3 or 8, wherein the performing de-enhancement processing on the first feature information and the second feature information to obtain third feature information and fourth feature information includes:
performing de-enhancement processing on the first characteristic information through a first mapper to obtain the third characteristic information;
and performing de-enhancement processing on the second characteristic information through a second mapper to obtain the fourth characteristic information.
11. The method of claim 7, wherein prior to training the first image processing model and the second image processing model based on the unsupervised loss, the method further comprises:
determining a supervised loss based on the first characteristic information, the second characteristic information and a preset training label;
the training the first image processing model and the second image processing model based on the unsupervised loss comprises:
training the first image processing model and the second image processing model based on the unsupervised loss and the supervised loss.
12. The method of claim 11, wherein the first image processing model includes a third prediction module and the second image processing model includes a fourth prediction module, the determining the supervised loss based on the first characteristic information, the second characteristic information, and a preset training label, comprising:
processing the first characteristic information through the third prediction module to obtain a third prediction probability value of the training image belonging to C categories;
processing the second characteristic information through the fourth prediction module to obtain a fourth prediction probability value of the training image belonging to C categories;
and obtaining the supervised loss based on the third predicted probability value, the fourth predicted probability value and the training label.
13. The method of claim 12, wherein the deriving a supervised penalty based on the third predicted probability value, the fourth predicted probability value, and the training label comprises:
determining a third loss between the third predicted probability value and the training label;
determining a fourth loss between the fourth predicted probability value and the training label;
the supervised loss is determined based on the third loss and the fourth loss.
14. The method of any one of claims 1-5, wherein the first image processing model and the second image processing model are industrial quality inspection models for detecting defect categories of industrial products, and the training image is a defect image of the industrial products.
15. An image processing method, comprising:
acquiring a target image to be processed;
processing the target image through a target model to obtain a processing result of the target image;
the target model is a first image processing model or a second image processing model, the first image processing model and the second image processing model are obtained through training by taking a first similarity as a pseudo tag of the second image processing model and a second similarity as a pseudo tag of the first image processing model, the first similarity is the similarity between first feature information and C classification feature prototypes in the current training stage, the second similarity is the similarity between second feature information and the C classification feature prototypes, the first feature information and the second feature information are obtained by respectively extracting features of training images based on the first image processing model and the second image processing model, the C classification feature prototypes are obtained by clustering feature information of each training image in a training image set, and C is a positive integer.
16. The method according to claim 15, wherein the target image is a defect image of an industrial product, the target model is an industrial quality inspection model for detecting a defect type of the industrial product, the processing the target image by the target model to obtain a processing result of the target image includes:
and identifying the defect type of the target image through the target model to obtain the defect type of the target image.
17. An image processing model training apparatus, comprising:
the feature extraction unit is used for extracting features of the training image through the first image processing model and the second image processing model respectively to obtain first feature information and second feature information of the training image;
the prototype determining unit is used for determining C classification feature prototypes in the current training stage, wherein the C classification feature prototypes are obtained by clustering feature information of each training image in a training image set, and C is a positive integer;
a similarity determining unit configured to determine a first similarity between the first feature information and each of the C classification feature prototypes, and a second similarity between the second feature information and each of the C classification feature prototypes;
The training unit is used for taking the first similarity as a pseudo tag of the second image processing model, taking the second similarity as a pseudo tag of the first image processing model, and training the first image processing model and the second image processing model.
18. An image processing apparatus, comprising:
an acquisition unit configured to acquire a target image to be processed;
the processing unit is used for processing the target image through a target model to obtain a processing result of the target image;
the target model is a first image processing model or a second image processing model, the first image processing model and the second image processing model are obtained through training by taking a first similarity as a pseudo tag of the second image processing model and a second similarity as a pseudo tag of the first image processing model, the first similarity is the similarity between first feature information and C classification feature prototypes in the current training stage, the second similarity is the similarity between second feature information and the C classification feature prototypes, the first feature information and the second feature information are obtained by respectively extracting features of training images based on the first image processing model and the second image processing model, the C classification feature prototypes are obtained by clustering feature information of each training image in a training image set, and C is a positive integer.
19. A computer device comprising a processor and a memory;
the memory is used for storing a computer program;
the processor is used for executing the computer program to implement the method of any one of claims 1 to 14 or any one of claims 15 to 16.
20. A computer-readable storage medium storing a computer program;
the computer program causes a computer to perform the method of any one of claims 1 to 14 or any one of claims 15 to 16.
CN202310545122.8A 2023-05-15 2023-05-15 Image processing and model training method, device, equipment and storage medium Pending CN116977775A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310545122.8A CN116977775A (en) 2023-05-15 2023-05-15 Image processing and model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310545122.8A CN116977775A (en) 2023-05-15 2023-05-15 Image processing and model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116977775A true CN116977775A (en) 2023-10-31

Family

ID=88477449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310545122.8A Pending CN116977775A (en) 2023-05-15 2023-05-15 Image processing and model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116977775A (en)

Similar Documents

Publication Publication Date Title
CN110852447B (en) Meta learning method and apparatus, initializing method, computing device, and storage medium
CN107330731B (en) Method and device for identifying click abnormity of advertisement space
CN114241282A (en) Knowledge distillation-based edge equipment scene identification method and device
CN113159283B (en) Model training method based on federal transfer learning and computing node
WO2019232772A1 (en) Systems and methods for content identification
CN113095370A (en) Image recognition method and device, electronic equipment and storage medium
CN110414581B (en) Picture detection method and device, storage medium and electronic device
WO2022111387A1 (en) Data processing method and related apparatus
CN113628059A (en) Associated user identification method and device based on multilayer graph attention network
CN114419363A (en) Target classification model training method and device based on label-free sample data
CN114492601A (en) Resource classification model training method and device, electronic equipment and storage medium
CN114580794B (en) Data processing method, apparatus, program product, computer device and medium
CN113987236B (en) Unsupervised training method and unsupervised training device for visual retrieval model based on graph convolution network
CN117726884B (en) Training method of object class identification model, object class identification method and device
CN113688814B (en) Image recognition method and device
CN115438755B (en) Incremental training method and device for classification model and computer equipment
CN117036843A (en) Target detection model training method, target detection method and device
CN116958729A (en) Training of object classification model, object classification method, device and storage medium
CN115131600A (en) Detection model training method, detection method, device, equipment and storage medium
CN116977775A (en) Image processing and model training method, device, equipment and storage medium
CN114663751A (en) Power transmission line defect identification method and system based on incremental learning technology
CN111091198B (en) Data processing method and device
CN113313079B (en) Training method and system of vehicle attribute recognition model and related equipment
CN117216363A (en) Media resource recommendation method, device, equipment and storage medium
CN117011579A (en) Training method and device of image recognition network, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination