WO2023273668A1 - Image classification method, apparatus, device, storage medium, and program product - Google Patents

Image classification method, apparatus, device, storage medium, and program product

Info

Publication number
WO2023273668A1
WO2023273668A1 (PCT/CN2022/093376, CN2022093376W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature set
feature
initial
features
Prior art date
Application number
PCT/CN2022/093376
Other languages
English (en)
French (fr)
Inventor
李悦翔
何楠君
马锴
郑冶枫
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP22831494.4A priority Critical patent/EP4235488A4/en
Priority to US18/072,337 priority patent/US20230092619A1/en
Publication of WO2023273668A1 publication Critical patent/WO2023273668A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Definitions

  • The embodiments of the present application relate to the field of artificial intelligence, and in particular to an image classification method, apparatus, device, storage medium, and program product.
  • Image classification refers to the process of distinguishing different categories of images according to their semantic information.
  • In the related art, the Vision Transformer (ViT) model is used to classify input images.
  • During training, a large number of labeled sample images are input, and the model is trained based on the difference between the classification result predicted by the model and the label, so that the ViT model can classify images accurately.
  • Embodiments of the present application provide an image classification method, apparatus, device, storage medium, and program product, which can reduce the need for labeled sample images when training the image classification model and help improve the accuracy of the model's predictions.
  • The technical solution includes the following content.
  • In one aspect, an embodiment of the present application provides an image classification method, executed by a computer device, the method including the following:
  • performing image segmentation on a first sample image and performing feature extraction on each segmented image block to obtain an initial image feature set, where the first sample image is an unlabeled sample image;
  • rearranging and combining the initial image features in the initial image feature set to obtain a first image feature set and a second image feature set, where the first image features in the first image feature set and the second image features in the second image feature set correspond to different rearrangement and combination methods;
  • pre-training an image classification model based on the first image feature set and the second image feature set, where the image classification model is used to classify content in the image; and
  • fine-tuning the pre-trained image classification model based on a second sample image, where the second sample image is an annotated sample image.
  • In another aspect, an embodiment of the present application provides an image classification apparatus, which includes the following modules:
  • an image segmentation module, configured to perform image segmentation on a first sample image and perform feature extraction on each segmented image block to obtain an initial image feature set, the initial image feature set including the initial image features corresponding to each image block, where the first sample image is an unlabeled sample image;
  • a rearrangement and combination module, configured to rearrange and combine the initial image features in the initial image feature set to obtain a first image feature set and a second image feature set, where the first image features in the first image feature set and the second image features in the second image feature set correspond to different rearrangement and combination methods;
  • a pre-training module, configured to pre-train an image classification model based on the first image feature set and the second image feature set, where the image classification model is used to classify content in the image;
  • a fine-tuning module, configured to fine-tune the pre-trained image classification model based on a second sample image, where the second sample image is an annotated sample image.
  • In another aspect, an embodiment of the present application provides a computer device, the computer device including a processor and a memory; the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the image classification method described in the above aspect.
  • In another aspect, a computer-readable storage medium is provided, in which at least one instruction, at least one program, a code set, or an instruction set is stored; the at least one instruction, program, code set, or instruction set is loaded and executed by a processor to implement the image classification method described above.
  • an embodiment of the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image classification method provided by the above aspects.
  • a computer program product or computer program comprising computer instructions stored in a computer readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the image classification method provided in various optional implementation manners of the above aspects.
  • In the embodiments of the present application, the order of the initial image features is disrupted in different ways and the features are rearranged to obtain the first image feature set and the second image feature set; the image classification model can then be pre-trained in a self-supervised manner on the feature sets produced by the different rearrangement and combination methods.
  • Because pre-training does not require labeled sample images, the demand for labeled samples and the amount of manual labeling work are reduced. After pre-training, the image classification model is fine-tuned with labeled sample images to ensure the classification performance of the final model, which helps improve the accuracy of image classification.
  • FIG. 1 shows a schematic diagram of the principle of image classification model training provided by an embodiment of the present application;
  • FIG. 2 shows a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application;
  • FIG. 3 shows a flowchart of an image classification method provided by an exemplary embodiment of the present application;
  • FIG. 4 shows a flowchart of an image classification method provided by another exemplary embodiment of the present application;
  • FIG. 5 shows a schematic structural diagram of a ViT model provided by an exemplary embodiment of the present application;
  • FIG. 6 is a schematic diagram of a rearrangement and combination process shown in an exemplary embodiment;
  • FIG. 7 shows a schematic diagram of the implementation of image classification model pre-training provided by an exemplary embodiment of the present application;
  • FIG. 8 shows a flowchart of an image classification method provided by another exemplary embodiment of the present application;
  • FIG. 9 shows a flowchart of an image classification method provided by another exemplary embodiment of the present application;
  • FIG. 10 shows a schematic diagram of the implementation of image classification model pre-training provided by another exemplary embodiment of the present application;
  • FIG. 11 is a structural block diagram of an image classification apparatus provided by an exemplary embodiment of the present application;
  • FIG. 12 shows a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
  • Artificial intelligence (AI) comprises the theory, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technique of computer science that attempts to understand the nature of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive subject that involves a wide range of fields, including both hardware-level technology and software-level technology.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes several major directions such as computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Computer Vision (CV) technology is a science that studies how to make machines "see". More specifically, it refers to using cameras and computers instead of human eyes to identify, track, and measure targets, and to further process the resulting graphics into images that are more suitable for human observation or for transmission to instruments for detection.
  • Computer vision technology usually includes image processing, image recognition, image segmentation, image semantic understanding, image retrieval, video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric identification technologies such as face recognition and fingerprint recognition.
  • The image classification method involved in the embodiments of the present application can reduce the need for labeled sample images during training of the image classification model and helps improve the accuracy of the trained model's predictions.
  • As shown in FIG. 1, the model pre-training system includes a first rearrangement and combination module 102 and a second rearrangement and combination module 103, and the first sample image 101, which carries no sample label, is input to the first rearrangement and combination module 102 and the second rearrangement and combination module 103, respectively.
  • The first image feature set 104 and the second image feature set 105 are obtained, and the image classification model 106 is then pre-trained based on the first image feature set 104 and the second image feature set 105.
  • After pre-training, the second sample image 107, which carries a sample label, is input to the pre-trained image classification model 106, and its parameters are fine-tuned to obtain the final image classification model used for image classification.
  • Fig. 2 shows a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application.
  • The implementation environment includes a computer device 210 and a server 220.
  • data communication is performed between the computer device 210 and the server 220 through a communication network.
  • Optionally, the communication network may be a wired network or a wireless network, and may be at least one of a local area network, a metropolitan area network, and a wide area network.
  • the computer device 210 is an electronic device that requires image classification, and the electronic device may be a smart phone, a tablet computer, or a personal computer, etc., which is not limited in this embodiment.
  • computer device 210 runs an application program with image classification functionality.
  • The application program may be a social application program, an image retrieval application program, or an image storage application program.
  • The computer device 210 can input a target image collection or a target image into the application program; the collection or images are then uploaded to the server 220, and the server 220 identifies the image categories and feeds back the classification results.
  • The server 220 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
  • the server 220 is used to provide image classification services for application programs installed in the computer device 210 .
  • Optionally, an image classification model is set in the server 220; the image classification model is pre-trained with the unlabeled first sample image and fine-tuned with the labeled second sample image, and is used to classify the images sent by the computer device 210.
  • the image classification model can also be deployed on the computer device 210 side, and the computer device 210 implements image classification locally without the help of the server 220.
  • the image classification model is trained on the computer device 210 side.
  • the image classification model is trained on the server 220 side, and the computer device 210 deploys the trained image classification model.
  • This embodiment does not limit it.
  • the following embodiments are described by taking the image classification method executed by a computer device as an example.
  • The server deployed with the neural network model may be a node in a distributed system, where the distributed system may be a blockchain system formed by connecting multiple nodes through network communication.
  • nodes can form a peer-to-peer (Peer To Peer, P2P) network, and any form of computing equipment, such as servers, terminals and other electronic devices, can become a node in the blockchain system by joining the peer-to-peer network.
  • the node includes a hardware layer, an intermediate layer, an operating system layer and an application layer.
  • the training samples of the image classification model can be saved on the blockchain.
  • FIG. 3 shows a flowchart of an image classification method provided by an exemplary embodiment of the present application. This embodiment is described by taking the method executed by a computer device as an example, and the method includes the following steps.
  • Step 301: perform image segmentation on the first sample image, and perform feature extraction on each segmented image block to obtain an initial image feature set, the initial image feature set including the initial image features corresponding to each image block; the first sample image is an unlabeled sample image.
  • Since the image classification model can be applied to any scene in which the category of image content is identified, the first sample image may be an image of any category, such as a medical image, an animal image, or a landscape image, and the first sample images form a collection of unlabeled sample images.
  • After acquiring the first sample image, the computer device first performs image segmentation on it.
  • the first sample image may be divided into image blocks of the same size, and different image blocks carry different image information.
  • After the segmentation is completed, the computer device performs feature extraction on each segmented image block. Optionally, the computer device performs linear mapping on each image block to obtain the initial image features corresponding to each image block, and combines them into the initial image feature set.
  • Step 302: rearrange and combine the initial image features in the initial image feature set to obtain the first image feature set and the second image feature set, where the first image features in the first image feature set and the second image features in the second image feature set correspond to different rearrangement and combination modes.
  • the computer device rearranges and combines each initial image feature in the initial image feature set in different ways to obtain the first image feature set and the second image feature set .
  • the image information indicated by each first image feature included in the first image feature set is different from that indicated by each second image feature included in the second image feature set. That is, the computer device obtains different combinations of image features in the first sample image through different rearrangement and combination methods.
  • the number of first image features obtained is the same as the number of initial image features in the initial image feature set, and correspondingly, the number of second image features is the same as the number of initial image features.
  • Step 303: pre-train an image classification model based on the first image feature set and the second image feature set, where the image classification model is used to classify content in the image.
  • Pre-training refers to a process of training the image classification model by using a large data set so that the image classification model can learn the common features in the data set.
  • the purpose of pre-training is to provide high-quality model parameters for the subsequent image classification model training on a specific data set.
  • The first image features in the first image feature set and the second image features in the second image feature set are image features at different positions of the first sample image; although the features differ, they belong to the same image, so their corresponding image classification results should be consistent. Therefore, after the computer device inputs the first image features and the second image features into the image classification model, self-supervised training can be realized from the obtained classification results based on the principle that the model's predictions should be consistent, without using annotated (that is, labeled) sample images.
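The consistency principle above can be illustrated with a toy loss. The cross-entropy form below is one common choice for such self-supervised objectives and is an assumption for illustration only; the patent does not specify this exact loss:

```python
import math

def softmax(logits):
    """Convert raw classification logits into a probability distribution."""
    mx = max(logits)
    exps = [math.exp(x - mx) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def consistency_loss(logits_a, logits_b):
    """Cross-entropy between the two branches' predicted distributions:
    low when the two rearranged views yield consistent classifications."""
    p = softmax(logits_a)
    q = softmax(logits_b)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Agreeing predictions incur a smaller loss than disagreeing ones.
same = consistency_loss([2.0, 0.1, 0.1], [2.0, 0.1, 0.1])
diff = consistency_loss([2.0, 0.1, 0.1], [0.1, 2.0, 0.1])
```

Minimizing such a loss pushes the classifications of the two differently rearranged feature sets toward agreement, which is what makes training possible without labels.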
  • the image classification model is used to classify the content in the image.
  • the image classification model can identify the category of a single image, and can also distinguish the categories of each image in the image collection to complete the classification of the image collection.
  • Step 304: fine-tune the pre-trained image classification model based on the second sample image, where the second sample image is an annotated sample image.
  • Fine-tuning is the process of adjusting the model parameters within a small range using a small dataset, and the fine-tuning stage adopts supervised learning. Therefore, the pre-trained image classification model is fine-tuned using the labeled second sample image.
  • Since the pre-trained model already has high-quality model parameters, only a small number of labeled samples are needed in the fine-tuning stage for the model to achieve high performance on the target task, and the dataset used for fine-tuning can be smaller than the dataset used for pre-training. Therefore, the number of second sample images is smaller than the number of first sample images, which reduces the demand for labeled sample images.
  • To sum up, in the embodiments of the present application, the order of the initial image features is disrupted in different ways and the features are rearranged and combined to obtain the first image feature set and the second image feature set, which enables self-supervised pre-training of the image classification model based on the feature sets produced by different rearrangement and combination methods. No labeled sample images are needed for pre-training, which reduces the demand for labeled samples and the amount of manual labeling work; after pre-training, the image classification model is fine-tuned with labeled sample images to ensure the classification performance of the final model, which helps improve the accuracy of image classification.
  • In some embodiments, an online learning branch and a target learning branch are used to classify the image feature sets obtained by the different rearrangement combinations, and pre-training of the image classification model is then realized based on the classification results of the two branches; this pre-training process is described below with an exemplary embodiment.
  • FIG. 4 shows a flowchart of an image classification method provided by another exemplary embodiment of the present application. This embodiment is described by taking the method executed by a computer device as an example, and the method includes the following steps.
  • Step 401: perform image segmentation on the first sample image, and perform feature extraction on each segmented image block to obtain an initial image feature set, which includes the initial image features corresponding to each image block; the first sample image is an unlabeled sample image.
  • the image classification model in the embodiment of the present application may be a ViT model
  • the ViT model is an image classification model obtained by combining CV and natural language processing (Natural Language Processing, NLP) fields.
  • When using the ViT model to classify the first sample image, the computer device first divides the first sample image into image blocks of a fixed size, then transforms each image block into an initial image feature through linear transformation, that is, encodes each image block as a token with sequence information.
  • the first sample image is first divided into image blocks 501, and then each image block is linearly transformed to obtain a token 502 corresponding to each image block, and an initial image feature set is obtained.
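The tokenization step above can be sketched in plain Python. This is an illustrative toy: the 6 × 6 "image", 2 × 2 patch size, and projection weights are hypothetical values chosen for the example, and a real ViT uses a learned projection over much larger patches:

```python
def patchify(image, patch):
    """Split a square image (a list of rows) into patch x patch blocks,
    returning each block flattened into a vector."""
    n = len(image)
    tokens = []
    for r in range(0, n, patch):
        for c in range(0, n, patch):
            block = [image[r + i][c + j]
                     for i in range(patch) for j in range(patch)]
            tokens.append(block)
    return tokens

def linear_project(vec, weights):
    """Toy linear mapping: one output per weight row (no bias term)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

image = [[r * 6 + c for c in range(6)] for r in range(6)]  # 6x6 toy image
patches = patchify(image, 2)             # 9 patches of length 4
W = [[1, 0, 0, 0], [0, 0, 0, 1]]         # maps a 4-dim patch to a 2-dim token
tokens = [linear_project(p, W) for p in patches]
```

Each of the 9 patches becomes one token, mirroring how the image blocks 501 map to the tokens 502.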
  • Step 402: adjust the feature order of the initial image features in the initial image feature set to obtain a first initial image feature set and a second initial image feature set, where the order of the initial image features differs between the first and second initial image feature sets.
  • To rearrange and combine the initial image features, the feature order is first adjusted, that is, the position information of each initial image feature is disrupted.
  • the computer device randomly shuffles the sequence of the initial image features to obtain the first initial image feature set and the second initial image feature set.
  • the computer device may also change the arrangement order of the initial image features in two fixed order adjustment manners to obtain the first initial image feature set and the second initial image feature set. This embodiment of the present application does not limit it.
  • the order of each initial image feature can be adjusted to be different from the initial order, or some initial image features can be selected, and only the feature order of some initial image features can be adjusted.
  • In one illustrative example, the initial image features are randomly shuffled to obtain the first initial image feature set and the second initial image feature set. Because the feature orders differ, the resulting first image feature set and second image feature set also differ, while the subsequent feature recombination methods may be the same or different.
  • For example, the initial image feature set T = {t1, ..., t9} contains 9 initial image features, that is, 9 tokens with corresponding order information. After shuffling, a second initial image feature set such as T_p2 = {t2, t7, t3, t1, t4, t9, t8, t5, t6} is obtained.
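As a minimal sketch of the random shuffling, the snippet below produces two differently ordered copies of a 9-token set. The seeds and string tokens are illustrative only; the patent does not prescribe a particular shuffling procedure:

```python
import random

def shuffled_copy(tokens, seed):
    """Return a copy of the token list in a shuffled order. Seeding makes
    the two branches' permutations different but reproducible."""
    order = list(range(len(tokens)))
    rng = random.Random(seed)
    rng.shuffle(order)
    return [tokens[i] for i in order]

tokens = [f"t{i}" for i in range(1, 10)]   # t1 .. t9
T_p1 = shuffled_copy(tokens, seed=1)       # first initial image feature set
T_p2 = shuffled_copy(tokens, seed=2)       # second initial image feature set
```

Both shuffled sets contain exactly the same tokens as the original; only the order (the position information) is disrupted.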
  • Step 403: rearrange the features based on the first initial image feature set to obtain a first feature matrix, and rearrange the features based on the second initial image feature set to obtain a second feature matrix.
  • Optionally, feature matrices may first be constructed from the scrambled initial image feature sets, including a first feature matrix built from the first initial image feature set and a second feature matrix built from the second initial image feature set.
  • constructing the first feature matrix and the second feature matrix may include the following steps.
  • Step 403a: based on the image segmentation method of the first sample image, determine the matrix size.
  • Optionally, the size of the constructed matrix may be determined according to the image segmentation method of the first sample image, so as to avoid a mismatch between the matrix size and the number of image blocks obtained by segmentation. If the first sample image is segmented into 9 image blocks, a 3 × 3 matrix can be constructed; if it is segmented into 16 image blocks, a 4 × 4 matrix or a 2 × 8 matrix can be constructed.
  • In the example above, with 9 image blocks, the matrix size can be determined to be 3 × 3, matching the number of image blocks obtained by segmentation.
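The sizing rule amounts to choosing a factor pair of the block count. The helper below is a hypothetical illustration (its name and interface are not from the patent):

```python
def candidate_matrix_sizes(num_blocks):
    """Enumerate (rows, cols) pairs whose product equals the number of
    image blocks, so the constructed matrix never mismatches the split."""
    return [(r, num_blocks // r) for r in range(1, num_blocks + 1)
            if num_blocks % r == 0]
```

For 9 blocks this yields (3, 3) among the candidates; for 16 blocks it yields both (4, 4) and (2, 8), matching the examples in the text.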
  • Step 403b: based on the size of the matrix, rearrange the initial image features in the first initial image feature set to obtain a first feature matrix.
  • the first feature matrix is constructed according to the size of the matrix, that is, the initial image features in the first initial image feature set are rearranged.
  • Optionally, the initial image features can be selected sequentially and arranged by rows or by columns to complete the construction of the first feature matrix.
  • Step 403c: based on the size of the matrix, rearrange the initial image features in the second initial image feature set to obtain a second feature matrix.
  • the computer device also constructs the second feature matrix according to the size of the matrix, that is, rearranges the initial image features in the second initial image feature set.
  • The manner of constructing the second feature matrix may be the same as that of the first feature matrix or different, which is not limited in this embodiment.
  • For example, the first feature matrix is arranged by rows and the second feature matrix by columns; or the first feature matrix is arranged by columns and the second by rows; or both feature matrices adopt a row-by-row arrangement, and so on.
  • In one illustrative example, the second initial image feature set T_p2 = {t2, t7, t3, t1, t4, t9, t8, t5, t6} is used to construct a 3 × 3 matrix column by column, and the second feature matrix is obtained with rows (t2, t1, t8), (t7, t4, t5), and (t3, t9, t6).
  • Note that this step and step 403b above (the step of constructing the first feature matrix) can be executed synchronously or asynchronously; this embodiment only describes how the first and second feature matrices are constructed and does not limit the execution timing.
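The row-wise and column-wise rearrangements of steps 403b and 403c can be sketched as follows, reproducing the T_p2 example from the text (the helper names are illustrative, not from the patent):

```python
def to_matrix_by_rows(tokens, rows, cols):
    """Arrange a flat token list into a rows x cols matrix, row by row."""
    return [tokens[r * cols:(r + 1) * cols] for r in range(rows)]

def to_matrix_by_cols(tokens, rows, cols):
    """Arrange a flat token list into a rows x cols matrix, filling it
    column by column instead."""
    m = [[None] * cols for _ in range(rows)]
    for i, t in enumerate(tokens):
        m[i % rows][i // rows] = t
    return m

T_p2 = ["t2", "t7", "t3", "t1", "t4", "t9", "t8", "t5", "t6"]
M2 = to_matrix_by_cols(T_p2, 3, 3)   # column-wise: first column is t2, t7, t3
M1 = to_matrix_by_rows(T_p2, 3, 3)   # row-wise alternative for comparison
```

Because the two fills traverse the same list in different orders, the same shuffled set yields two distinct spatial layouts, which is what gives the two branches different feature neighborhoods.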
  • Step 404: perform feature combination on the initial image features in the first feature matrix, and generate a first image feature set based on the result of the feature combination.
  • After the rearrangement is completed, the computer device performs feature combination on the initial image features in the first feature matrix, and generates the first image feature set according to the combination result.
  • the image information in the image block corresponding to each first image feature in the first image feature set changes, that is, it is different from the image information in the image block corresponding to each initial image feature.
  • the process of combining features and generating the first image feature set based on the combination result may include the following steps.
  • Step 404a: select n adjacent initial image features in the first feature matrix through a sliding window.
  • the computer device selects n initial image features for feature combination by means of sliding window sampling, wherein the size of the sliding window needs to be smaller than the size of the matrix.
  • A 2×2 sliding window can be used for a 3×3 matrix, and a 2×2 sliding window or a 3×3 sliding window can be used for a 4×4 matrix.
  • For example, a 2×2 sliding window may be used to sample a 3×3 first feature matrix; as shown in FIG. 6, four initial image features may be selected through the sliding window 601.
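The sliding-window selection can be sketched as follows; `sliding_windows` is a hypothetical helper for illustration, not a function from the patent:

```python
import numpy as np

def sliding_windows(mat, k, stride=1):
    """Return every k x k window of `mat`, moving by `stride` along rows and columns."""
    h, w = mat.shape
    return [mat[i:i + k, j:j + k]
            for i in range(0, h - k + 1, stride)
            for j in range(0, w - k + 1, stride)]

# A 2x2 window over a 3x3 matrix with stride 1 selects 4 groups of 4 adjacent features.
matrix = np.arange(1, 10).reshape(3, 3)
windows = sliding_windows(matrix, k=2)
```

Each window picks n = 4 adjacent entries of the matrix, matching the four features selected by window 601 in the figure.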
  • Step 404b performing feature combination on the n initial image features to obtain a first combined image feature.
  • The feature combination method may include feature splicing, feature fusion, and the like; that is, feature splicing is performed on the n initial image features to obtain the first combined image feature, or feature fusion (element-wise addition of the features) is performed on the n initial image features to obtain the first combined image feature.
  • the computer device performs feature splicing on the four initial image features t 3 , t 5 , t 1 , and t 6 to obtain a first combined image feature 602 .
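The two combination options (splicing and fusion) differ mainly in output dimensionality; a minimal sketch, with random vectors standing in for the four selected initial image features:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
features = [rng.standard_normal(d) for _ in range(4)]  # four selected initial features

spliced = np.concatenate(features)  # feature splicing: one 4*d-dimensional vector
fused = np.sum(features, axis=0)    # feature fusion: element-wise addition, still d-dimensional
```

Splicing preserves all n·d values, while fusion keeps the original dimension at the cost of merging the features.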
  • Step 404c perform linear mapping on the m groups of first combined image features to obtain a first image feature set, and the m groups of first combined image features are obtained by moving the sliding window.
  • the computer device traverses the first feature matrix through a sliding window to obtain m groups of first combined image features, where m is a positive integer.
  • The sliding step length and the sliding direction of the sliding window may be set randomly or fixed in advance. For example, the sliding step can be set to 1, sliding along the row direction.
  • After obtaining the m groups of first combined image features, the computer device performs linear mapping on them to obtain the first image feature set.
  • the m sets of first combined image features may be output to a multilayer perceptron (Multilayer Perceptron, MLP) for linear mapping to obtain the first set of image features.
  • the number of first image features in the mapped first image feature set is the same as the number of initial image features.
  • each group contains 4 initial image features.
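The linear mapping step can be sketched as a single fully connected layer that projects each group's spliced feature back to the original feature dimension; the shapes and the random weight matrix are illustrative assumptions, not the patented MLP:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 4, 4, 8  # m groups, each splicing n features of dimension d
combined = rng.standard_normal((m, n * d))

# One linear layer: project each (n*d)-dim combined feature to a d-dim image feature.
W = rng.standard_normal((n * d, d)) * 0.01
b = np.zeros(d)
image_features = combined @ W + b
```

After the mapping, each group yields one image feature of the same dimension d as the initial image features.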
  • Step 405 perform feature combination on the initial image features in the second feature matrix, and generate a second image feature set based on the feature combination result.
  • the computer device performs feature combination on the initial image features in the second feature matrix, and generates a second image feature set according to the combination result.
  • The image information in the image block corresponding to each second image feature in the second image feature set changes; that is, it differs from the image information in the image block corresponding to each initial image feature, and also from the image information of the image block corresponding to the first image feature.
  • combining features and generating a second image feature set based on the combination result may include the following steps.
  • Step 405a select n adjacent initial image features in the second feature matrix through a sliding window.
  • For the second feature matrix, the computer device also selects n initial image features for feature combination by means of sliding window sampling, where the size of the sliding window needs to be smaller than the size of the matrix.
  • the size of the sliding window for sampling the second feature matrix may be the same as or different from the size of the sliding window for sampling the first feature matrix.
  • For example, a 2×2 sliding window can be used to sample the first feature matrix to obtain 4 groups of first combined image features that do not overlap with each other, while a 3×3 sliding window is used to sample the second feature matrix to obtain 4 groups of second combined image features that overlap with each other.
  • Step 405b performing feature combination on the n initial image features to obtain a second combined image feature.
  • The feature combination method may include feature splicing, feature fusion, and the like; that is, feature splicing is performed on the n initial image features to obtain the second combined image feature, or feature fusion (element-wise addition of the features) is performed on the n initial image features to obtain the second combined image feature.
  • Step 405c perform linear mapping on m groups of second combined image features to obtain a second image feature set, and m groups of second combined image features are obtained by moving the sliding window.
  • the computer device traverses the second feature matrix through the sliding window to obtain m groups of second combined image features.
  • The sliding step size and the sliding direction of the sliding window can be set randomly or fixed in advance. For example, the sliding step can be set to 1, sliding along the column direction.
  • linear mapping is performed on the m sets of second combined image features to obtain a second set of image features.
  • m groups of second combined image features may be output to an MLP for linear mapping to obtain a second set of image features.
  • the number of second image features in the second image feature set obtained through mapping is the same as the number of initial image features.
  • Step 406 Input the first image feature set into the online learning branch of the image classification model to obtain the first classification result.
  • the computer device can use the first image feature set and the second image feature set to pre-train the image classification model.
  • The image classification model includes an online learning branch and a target learning branch; the structure of the image classification model in the online learning branch is the same as that in the target learning branch, both corresponding to the ViT model, but the update methods of their model parameters are different.
  • The computer device inputs the first image feature set into the online learning branch of the image classification model; the online learning branch identifies the image category of the first sample image according to the image features indicated by the first image feature set, obtaining the first classification result.
  • The structure of the ViT model is shown in Figure 5.
  • As shown in Figure 5, the first image feature set is input into the Transformer encoder, image feature extraction is performed on the first image feature set, and the extraction result is input into the classifier MLP Head for image classification to obtain the first classification result.
  • the first sample image 701 is input into the first rearrangement and combination module 702 to obtain the first image feature set, and the first image feature set is input into the ViT model to obtain the first classification result Z, this branch is the online learning branch.
  • Step 407 inputting the second image feature set into the target learning branch of the image classification model to obtain a second classification result.
  • the second image feature set is input into the target learning branch, and the target learning branch is used to identify the image category of the second sample image according to the image features indicated by the second image feature set, that is, to obtain a second classification result.
  • Similarly, the second image feature set is input into the encoder, image features are extracted from the second image feature set, and the extraction result is input into the classifier MLP Head for image classification to obtain the second classification result.
  • The first sample image 701 is input into the second rearrangement and combination module 703 to obtain the second image feature set, and the second image feature set is input into the ViT model to obtain the second classification result Z'; this branch is the target learning branch.
  • The rearrangement and combination module 703 and the rearrangement and combination module 702 correspond to different rearrangement and combination methods.
  • Step 408 train the online learning branch based on the first classification result and the second classification result.
  • the computer device first trains the online learning branch based on the first classification result and the second classification result, and then updates the model parameters of the target learning branch based on the updated online learning branch.
  • the training process of the online learning branch may include the following steps.
  • Step 408a Determine the similarity loss between the first classification result and the second classification result.
  • The computer device determines the similarity loss between the first classification result and the second classification result, and then trains the ViT model based on the similarity loss, so that the model can produce the same classification result from image features under different combinations and arrangements, thereby improving the accuracy of the ViT model for image classification.
  • In this way, the model parameters of the ViT model can be updated without using labeled sample images, realizing self-supervised learning of the ViT model.
  • The similarity loss indicates the degree of consistency between the first classification result and the second classification result.
  • the L1 loss function or the L2 loss function can be used to determine the similarity between the first classification result and the second classification result loss.
  • For example, using the L2 loss function, the similarity loss can be: L = ||Z − Z'||², where L represents the similarity loss, Z represents the first classification result, and Z' represents the second classification result.
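The L2-style similarity loss can be sketched as the mean squared difference between the two classification results; the concrete vectors are illustrative:

```python
import numpy as np

def similarity_loss(z, z_prime):
    """L2-style similarity loss between the two classification results."""
    return float(np.mean((np.asarray(z) - np.asarray(z_prime)) ** 2))

# Identical classification results yield zero loss.
loss = similarity_loss([0.2, 0.5, 0.3], [0.2, 0.5, 0.3])
```

The loss is zero only when the two branches agree exactly, so minimizing it pushes the branches toward consistent predictions.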
  • Step 408b based on the similarity loss, update the model parameters of the online learning branch through backpropagation.
  • the update method of the model parameters in the online learning branch is different from the update method of the model parameters in the target learning branch.
  • The online learning branch is updated by backpropagation, and the model parameters in the target learning branch are updated according to the model parameters in the online learning branch. Through iterative training, the model parameters of the image classification model in the online learning branch and the target learning branch, that is, the model parameters of the ViT model, are optimized.
  • the model parameters of the online learning branch may be updated based on the similarity loss backpropagation until the model parameters meet the training conditions, that is, the similarity loss reaches the convergence condition.
  • Step 409 based on the model parameters of the online learning branch after training, update the model parameters of the target learning branch.
  • Each time the model parameters of the online learning branch are updated, the computer device updates the model parameters of the target learning branch accordingly; when the online learning branch is updated again, the model parameters of the target learning branch are updated again, until both the online learning branch and the target learning branch stop updating the model parameters.
  • The model parameters of the target learning branch can be updated by Exponential Moving Average (EMA), and the update method is as follows: θ_target ← τ·θ_target + (1 − τ)·θ_online, where θ_target is the model parameter of the image classification model in the target learning branch, θ_online is the model parameter of the image classification model in the online learning branch, and τ is the weight parameter used to balance the two model parameters.
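The EMA update can be sketched per parameter as follows; `tau` is the balancing weight, and the flat parameter lists are an illustrative simplification:

```python
def ema_update(target_params, online_params, tau=0.99):
    """theta_target <- tau * theta_target + (1 - tau) * theta_online, per parameter."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]

updated = ema_update([0.0], [1.0], tau=0.9)  # 0.9 * 0.0 + 0.1 * 1.0
```

With tau close to 1, the target branch changes slowly, smoothing out the noisier updates of the online branch.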
  • Step 410 fine-tuning the model parameters of the target learning branch in the image classification model based on the second sample image.
  • model parameters of the target learning branch may be fine-tuned, and the fine-tuning process may include the following steps.
  • Step 410a inputting the second sample image into the target learning branch of the image classification model to obtain a sample classification result.
  • Each marked second sample image is input into the ViT model of the target learning branch, and a sample classification result corresponding to each second sample image is obtained.
  • Step 410b based on the sample classification result and the sample classification label corresponding to the second sample image, fine-tune the model parameters of the target learning branch through backpropagation.
  • the model parameters can be fine-tuned through back propagation to obtain the final image classification model.
  • the loss can be determined based on the sample classification results and the labeled sample classification labels, and the model parameters can be reversely fine-tuned based on the loss to obtain optimized model parameters.
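The fine-tuning loss can be sketched as a softmax cross-entropy between the sample classification result and the labeled class; cross-entropy is an assumption here, since the text only says a loss is determined from the result and the label:

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for one labelled sample."""
    shifted = logits - np.max(logits)          # numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-log_probs[label])

# With uniform logits over 4 classes, the loss is ln(4).
loss = cross_entropy(np.zeros(4), label=2)
```

This loss is then backpropagated to fine-tune the model parameters of the target learning branch.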
  • Image classification is performed on the first image feature set and the second image feature set obtained with different rearrangement and combination methods, and the image classification model is pre-trained based on the resulting first classification result and second classification result, which improves the accuracy of the output classification results when the image classification model classifies and predicts different combinations of image features of the same sample.
  • the computer device may also obtain the first image feature set and the second image feature set by setting different feature rearrangement manners or feature combination manners.
  • FIG. 8 shows a flowchart of an image classification method provided by another embodiment of the present application. This embodiment is described by taking the method executed by a computer device as an example, and the method includes the following steps.
  • Step 801 perform image segmentation on the first sample image, and perform feature extraction on each of the segmented image blocks to obtain an initial image feature set, which includes initial image features corresponding to each image block; the first sample image is an unlabeled sample image.
  • For the specific implementation manner of step 801, reference may be made to the foregoing step 401, which will not be repeated in this embodiment of the present application.
  • Step 802 adjusting the feature order of the initial image features in the initial image feature set to obtain a first initial image feature set and a second initial image feature set.
  • The order of the initial image features is first adjusted; that is, the position information of each initial image feature is shuffled.
  • The computer device can adjust the feature order of the initial image features in the initial image feature set in the same way to obtain the first initial image feature set and the second initial image feature set, or can adjust the feature order according to different shuffle modes to obtain the first initial image feature set and the second initial image feature set; this embodiment of the present application does not limit it.
  • Step 803 rearranging based on the first initial image feature set to obtain a first feature matrix, and rearranging based on the second initial image feature set to obtain a second feature matrix.
  • The computer device may rearrange the first initial image feature set and the second initial image feature set in the same rearrangement manner, or may rearrange them in different rearrangement manners. If the same rearrangement manner is used, the subsequent feature combination process needs to ensure that the feature combinations corresponding to the first image feature set and the second image feature set are different.
  • Step 804 perform feature combination on the initial image features in the first feature matrix, and generate a first image feature set based on the feature combination result.
  • Step 805 perform feature combination on the initial image features in the second feature matrix, and generate a second image feature set based on the feature combination result; the first feature matrix and the second feature matrix have different rearrangement methods of the initial image features, and/or , the feature combinations of the initial image features in the first image feature set and the second image feature set are different.
  • After rearranging, the computer device performs feature combination on the initial image features in the first feature matrix and generates the first image feature set according to the combination result, and likewise performs feature combination on the initial image features in the second feature matrix to generate the second image feature set.
  • If the computer device rearranges the first initial image feature set and the second initial image feature set in the same rearrangement manner, the feature combinations of the initial image features in the first image feature set and the second image feature set need to be different, so as to ensure that the image features corresponding to the first image feature set and the second image feature set are different.
  • Step 806 pre-training an image classification model based on the first image feature set and the second image feature set, and the image classification model is used to classify content in the image.
  • Step 807 fine-tuning the pre-trained image classification model based on the second sample image, where the second sample image is an annotated sample image.
  • The embodiment of the present application describes another way of generating the first image feature set and the second image feature set; for the specific model pre-training and fine-tuning process, refer to the corresponding embodiment in Figure 4, which is not repeated here.
  • In a possible implementation, the complexity of the image features in the image feature set can be increased through multiple rounds of rearrangement and combination, and the image classification model is then pre-trained with the more complex image feature set, which is described in an exemplary embodiment below.
  • FIG. 9 shows a flowchart of an image classification method provided by another exemplary embodiment of the present application. This embodiment is described by taking the method applied to computer equipment as an example, and the method includes the following steps.
  • Step 901 perform image segmentation on the first sample image, and perform feature extraction on each segmented image block to obtain an initial image feature set.
  • Step 902 rearrange and combine the initial image features in the initial image feature set to obtain a first image feature set and a second image feature set.
  • For the implementation manners of steps 901 to 902, reference may be made to the above steps 401 to 405, which will not be repeated in this embodiment.
  • Step 903 Based on the first image feature set, iteratively perform at least one rearrangement and combination to obtain a third image feature set.
  • After obtaining the first image feature set, the computer device continues to rearrange and combine the first image features in the first image feature set to obtain a new image feature set, and continues to rearrange and combine the image features in the new image feature set; that is, at least one rearrangement and combination is performed iteratively, and the third image feature set is obtained after the iterative rearrangement and combination.
  • the number of iterations can be set according to the classification performance requirements of the image classification model, and the number of iterations is positively correlated with the classification performance of the image classification model.
  • The method of iterative rearrangement and combination can refer to the above method of rearranging and combining the initial image features in the initial image feature set, that is, the process of shuffling, rearranging, combining, and finally performing linear mapping on the first image features.
  • the same rearrangement and combination method may be used, or different rearrangement and combination methods may be used, which is not limited in this embodiment.
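One iteration of rearrangement and combination can be sketched as shuffle → reshape → windowed combination; summation is used as the combination step and the helper is illustrative, not the patented procedure:

```python
import numpy as np

def rearrange_combine(features, rng):
    """One round: shuffle feature order, arrange into a square matrix,
    then combine every 2x2 window of adjacent features by summation."""
    k, d = features.shape
    s = int(np.sqrt(k))
    shuffled = features[rng.permutation(k)]
    grid = shuffled.reshape(s, s, d)
    return np.stack([grid[i:i + 2, j:j + 2].sum(axis=(0, 1))
                     for i in range(s - 1) for j in range(s - 1)])

rng = np.random.default_rng(0)
initial = rng.standard_normal((9, 8))      # 9 initial features of dimension 8
round1 = rearrange_combine(initial, rng)   # first round: 9 -> 4 features
round2 = rearrange_combine(round1, rng)    # second iteration: 4 -> 1 feature
```

Each additional round further scrambles and mixes the features, which is how iteration increases the complexity of the resulting feature set.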
  • Step 904 Based on the second image feature set, iteratively perform at least one rearrangement and combination to obtain a fourth image feature set.
  • At least one iterative rearrangement and combination may also be performed on the second image feature set to obtain a fourth image feature set.
  • the rearrangement and combination method includes the process of shuffling, rearranging, combining and finally performing linear mapping on the features of the second image.
  • the same rearrangement combination or different rearrangement combinations can also be used.
  • the number of iterations for iteratively rearranging and combining the second image feature set is the same as, or different from, the number of iterations for iteratively rearranging and combining the first image feature set.
  • at least one rearrangement and combination may be iteratively performed only based on the first image feature set or at least one rearrangement and combination may be iteratively performed only based on the second image feature set.
  • Step 905 pre-training the image classification model based on the third image feature set and the fourth image feature set.
  • For the steps of pre-training the image classification model based on the third image feature set and the fourth image feature set, reference may be made to the steps of pre-training the image classification model based on the first image feature set and the second image feature set in the above embodiment, which will not be described in detail here.
  • Step 906 fine-tuning the pre-trained image classification model based on the second sample image, where the second sample image is an annotated sample image.
  • the robustness of the image classification model is improved by iteratively rearranging and combining the first image feature set and the second image feature set.
  • the learning branch of the image classification model may be continuously added, so as to pre-train the image classification model based on classification results of multiple branches.
  • the model parameters of the online learning branch may be updated based on the similarity loss between pairs of classification results through backpropagation.
  • The first sample image 1001 is input into the first rearrangement and combination module 1002, the second rearrangement and combination module 1003, and the third rearrangement and combination module 1004 respectively, to obtain image feature sets under different rearrangement and combination modes; the image feature sets are respectively input into the ViT model for image classification to obtain the first classification result Z, the second classification result Z' and the third classification result Z''. Then, the first similarity loss L1 is determined based on the first classification result Z and the second classification result Z', the second similarity loss L2 is determined based on the first classification result Z and the third classification result Z'', and the third similarity loss L3 is determined based on the second classification result Z' and the third classification result Z''.
  • The model parameters of ViT model 1007 are updated based on the model parameters of ViT model 1005.
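With multiple branches, the pairwise similarity losses can be combined into one training objective; summing the pairwise losses is an assumption about how they are aggregated:

```python
import itertools
import numpy as np

def multi_branch_loss(results):
    """Sum of pairwise L2 similarity losses over all branch classification results."""
    return sum(float(np.mean((a - b) ** 2))
               for a, b in itertools.combinations(results, 2))

z = np.array([0.1, 0.9])
total = multi_branch_loss([z, z.copy(), z.copy()])  # identical results -> zero loss
```

Adding branches increases the number of pairwise consistency constraints, which is what pre-training on multiple rearrangement modes exploits.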
  • Fig. 11 is a structural block diagram of an image classification device provided by an exemplary embodiment of the present application. As shown in the figure, the device includes the following modules:
  • the image segmentation module 1101 is configured to perform image segmentation on the first sample image, and perform feature extraction on each of the segmented image blocks to obtain an initial image feature set, which includes an initial image corresponding to each image block feature, the first sample image is an unlabeled sample image;
  • the rearrangement and combination module 1102 is configured to rearrange and combine the initial image features in the initial image feature set to obtain a first image feature set and a second image feature set, and the first image feature set in the first image feature set An image feature and the second image feature in the second image feature set correspond to different rearrangement and combination methods;
  • a pre-training module 1103, configured to pre-train an image classification model based on the first image feature set and the second image feature set, and the image classification model is used to classify content in an image;
  • the fine-tuning module 1104 is configured to fine-tune the pre-trained image classification model based on a second sample image, where the second sample image is an annotated sample image.
  • The rearrangement and combination module 1102 is also used for:
  • the rearrangement and combination module 1102 is also used for:
  • the rearrangement and combination module 1102 is also used for:
  • the rearrangement and combination module 1102 is also used for:
  • the rearrangement and combination module 1102 is also used for:
  • the rearrangement and combination module 1102 is also used for:
  • the initial image features in the second initial image feature set are rearranged to obtain the second feature matrix.
  • the rearrangement and combination module 1102 is also used for:
  • the pre-training module 1103 is further configured to pre-train an image classification model based on the third image feature set and the fourth image feature set.
  • the rearrangement and combination module 1102 is also used for:
  • The rearrangement manners of the initial image features in the first feature matrix and the second feature matrix are different, and/or the feature combination manners of the initial image features in the first image feature set and the second image feature set are different.
  • the pre-training module 1103 is also used for:
  • the model parameters of the target learning branch are updated.
  • the pre-training module 1103 is also used for:
  • model parameters of the online learning branch are updated through backpropagation.
  • the pre-training module 1103 is also used for:
  • the model parameters of the target learning branch are updated by EMA.
  • the fine-tuning module 1104 is also used for:
  • fine-tune the model parameters of the target learning branch in the image classification model based on the second sample image.
  • the fine-tuning module 1104 is also used for:
  • the model parameters of the target learning branch are fine-tuned through back propagation.
  • the image classification model is a ViT model.
  • The initial image feature set is obtained by performing image segmentation and feature extraction on the sample image, and the initial image features in the initial image feature set are then rearranged and combined in different ways to obtain the first image feature set and the second image feature set. The image classification model can then be pre-trained based on the image feature sets under different rearrangement and combination methods, without requiring labeled sample images for pre-training, which reduces the need for labeled sample images. After pre-training, the pre-trained image classification model is fine-tuned with labeled sample images to ensure the classification performance of the final image classification model, which helps to improve the accuracy of image classification.
  • The computer device 1200 includes a central processing unit (CPU) 1201, a system memory 1204 including a random access memory 1202 and a read-only memory 1203, and a system bus 1205 connecting the system memory 1204 and the central processing unit 1201.
  • The computer device 1200 also includes a basic input/output system (I/O system) 1206 that helps transfer information between the various devices in the computer, and a mass storage device 1207 for storing an operating system 1213, an application program 1214 and other program modules 1215.
  • The basic input/output system 1206 includes a display 1208 for displaying information and input devices 1209, such as a mouse and a keyboard, for users to input information. Both the display 1208 and the input devices 1209 are connected to the central processing unit 1201 through an input/output controller 1210 connected to the system bus 1205. The basic input/output system 1206 may also include the input/output controller 1210 for receiving and processing input from a keyboard, mouse, electronic stylus or other devices. Similarly, the input/output controller 1210 also provides output to a display screen, a printer, or other types of output devices.
  • the mass storage device 1207 is connected to the central processing unit 1201 through a mass storage controller (not shown) connected to the system bus 1205 .
  • the mass storage device 1207 and its associated computer-readable media provide non-volatile storage for the computer device 1200 . That is, the mass storage device 1207 may include a computer-readable medium (not shown) such as a hard disk or drive.
  • Computer-readable media may comprise computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include random access memory (RAM), read-only memory (ROM), flash memory or other solid-state storage technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices.
  • the computer storage medium is not limited to the above-mentioned ones.
  • the above-mentioned system memory 1204 and mass storage device 1207 may be collectively referred to as memory.
  • The memory stores one or more programs configured to be executed by one or more central processing units 1201; the one or more programs include instructions for implementing the above methods, and the central processing unit 1201 executes the one or more programs to implement the methods provided by the above method embodiments.
  • The computer device 1200 can also run on a remote computer connected through a network such as the Internet. That is, the computer device 1200 can be connected to the network 1212 through the network interface unit 1211 connected to the system bus 1205; in other words, the network interface unit 1211 can also be used to connect to other types of networks or remote computer systems (not shown).
  • the memory also includes one or more programs, the one or more programs are stored in the memory, and the one or more programs include the steps executed by the computer device in the method provided by the embodiment of the present application .
  • The embodiment of the present application also provides a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or an instruction set is stored; the at least one instruction, at least one program, code set, or instruction set is loaded and executed by the processor to implement the image classification method described in any one of the above embodiments.
  • An embodiment of the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image classification method provided by the above aspect.
  • the information involved in this application (including but not limited to user equipment information, user personal information, etc.), the data involved (including but not limited to data used for analysis, stored data, displayed data, etc.) and the signals involved are all authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data comply with the relevant laws, regulations and standards of the relevant countries and regions.
  • information such as the first sample image and the second sample image mentioned in this application is obtained with sufficient authorization.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

本申请公开了一种图像分类方法、装置、设备、存储介质及程序产品。包括:对第一样本图像进行图像分割,以及对分割得到的各个图像块进行特征提取,得到初始图像特征集合(301);对初始图像特征集合中的初始图像特征进行重排组合,得到第一图像特征集合和第二图像特征集合(302);基于第一图像特征集合和第二图像特征集合预训练图像分类模型(303);基于第二样本图像对预训练后的图像分类模型进行微调(304)。上述方法、装置、设备、存储介质及程序产品有助于减少模型训练过程对标注样本图像的需求,并提高模型预测结果的准确性。

Description

图像分类方法、装置、设备、存储介质及程序产品
本申请要求于2021年06月29日提交,申请号为202110723873.5、发明名称为“图像分类方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请实施例中。
技术领域
本申请实施例涉及人工智能领域,特别涉及一种图像分类方法、装置、设备、存储介质及程序产品。
背景技术
图像分类是指根据图像的语义信息对不同类别图像进行区分的过程。
相关技术中,采用Vision Transformer(ViT)模型对输入图像进行分类,在对该模型训练过程中,输入大量经过标注的样本图像,进而基于模型预测的分类结果与标签间差异训练该模型,实现ViT模型对图像的精确分类。
然而,在训练过程中,若经过标注的样本图像较少,则ViT模型训练效果较差,影响图像分类准确性。
发明内容
本申请实施例提供了一种图像分类方法、装置、设备、存储介质及程序产品,可以减少图像分类模型训练过程中对经过标注的样本图像的需求,且有助于提高图像分类模型预测结果的准确性。所述技术方案包括如下内容。
一方面,本申请实施例提供了一种图像分类方法,所述方法由计算机设备执行,所述方法包括如下内容:
对第一样本图像进行图像分割,以及对分割得到的各个图像块进行特征提取,得到初始图像特征集合,所述初始图像特征集合中包含各个图像块对应的初始图像特征,所述第一样本图像是未经过标注的样本图像;
对所述初始图像特征集合中的所述初始图像特征进行重排组合,得到第一图像特征集合和第二图像特征集合,所述第一图像特征集合中的第一图像特征与所述第二图像特征集合中的第二图像特征对应不同重排组合方式;
基于所述第一图像特征集合和所述第二图像特征集合预训练图像分类模型,所述图像分类模型用于对图像中的内容进行分类;
基于第二样本图像对预训练后的所述图像分类模型进行微调,所述第二样本图像是经过标注的样本图像。
另一方面,本申请实施例提供了一种图像分类装置,所述装置包括如下模块:
图像分割模块,用于对第一样本图像进行图像分割,以及对分割得到的各个图像块进行特征提取,得到初始图像特征集合,所述初始图像特征集合中包含各个图像块对应的初始图像特征,所述第一样本图像是未经过标注的样本图像;
重排组合模块,用于对所述初始图像特征集合中的所述初始图像特征进行重排组合,得到第一图像特征集合和第二图像特征集合,所述第一图像特征集合中的第一图像特征与所述第二图像特征集合中的第二图像特征对应不同重排组合方式;
预训练模块,用于基于所述第一图像特征集合和所述第二图像特征集合预训练图像分类模型,所述图像分类模型用于对图像中的内容进行分类;
微调模块,用于基于第二样本图像对预训练后的所述图像分类模型进行微调,所述第二样本图像是经过标注的样本图像。
另一方面,本申请实施例提供了一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、所述至少一段程序、所述代码集或指令集由所述处理器加载并执行以实现如上述方面所述的图像分类方法。
另一方面,提供了一种计算机可读存储介质,所述可读存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、所述至少一段程序、所述代码集或指令集由处理器加载并执行以实现如上述方面所述的图像分类方法。
另一方面,本申请实施例提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述方面提供的图像分类方法。
根据本申请的一个方面,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述方面的各种可选实现方式中提供的图像分类方法。
本申请实施例中,以不同的方式打乱初始图像特征的顺序并重新排列组合,得到第一图像特征集合与第二图像特征集合,进而可基于不同重排组合方式下的图像特征集合对图像分类模型进行自监督预训练,无需借助已标注的样本图像进行预训练,减少对已标注样本图像的需求量,降低人工标注任务量,且在预训练后通过已标注的样本图像对预训练后的图像分类模型进行微调,确保最终得到的图像分类模型的分类性能,有助于提高图像分类的准确性。
附图说明
图1示出了本申请实施例提供的图像分类模型训练的原理示意图;
图2示出了本申请一个示例性实施例提供的实施环境的示意图;
图3示出了本申请一个示例性实施例提供的图像分类方法的流程图;
图4示出了本申请另一个示例性实施例提供的图像分类方法的流程图;
图5示出了本申请一个示例性实施例提供的ViT模型的结构示意图;
图6是一个示例性实施例示出的重排组合过程的实施示意图;
图7示出了本申请一个示例性实施例提供的图像分类模型预训练的实施示意图;
图8示出了本申请另一个示例性实施例提供的图像分类方法的流程图;
图9示出了本申请另一个示例性实施例提供的图像分类方法的流程图;
图10示出了本申请另一个示例性实施例提供的图像分类模型预训练的实施示意图;
图11是本申请一个示例性实施例提供的图像分类装置的结构框图;
图12示出了本申请一个示例性实施例提供的计算机设备的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。
计算机视觉技术(Computer Vision,CV)是一门研究如何使机器“看”的科学,更进一步的说,就是指用摄影机和电脑代替人眼对目标进行识别、跟踪和测量等机器视觉,并进一步做图形处理,使电脑处理成为更适合人眼观察或传送给仪器检测的图像。作为一个科学学科,计算机视觉研究相关的理论和技术,试图建立能够从图像或者多维数据中获取信息的人工智能系统。计算机视觉技术通常包括图像处理、图像识别、图像分割、图像语义理解、图像检索、视频处理、视频语义理解、视频内容/行为识别、三维物体重建、3D技术、虚拟现实、增强现实、同步定位与地图构建等技术,还包括常见的人脸识别、指纹识别等生物特征识别技术。本申请实施例涉及的图像分类方法,即计算机视觉技术在图像识别领域的应用,可减少图像分类模型训练过程中对已标注的样本图像的需求,且有助于提升训练后图像分类模型预测结果的准确性。
如图1所示,其示出了本申请实施例中对图像分类模型训练的原理示意图。其中,模型预训练系统中包含有第一重排组合模块102与第二重排组合模块103,将不携带样本标签的第一样本图像101分别输入至第一重排组合模块102以及第二重排组合模块103中,得到第一图像特征集合104以及第二图像特征集合105,进而基于第一图像特征集合104以及第二图像特征集合105预训练图像分类模型106。预训练完成后,将携带样本标签的第二样本图像107输入至预训练后的图像分类模型106,进行参数微调,得到最终图像分类模型进行图像分类。
图2示出了本申请一个示例性实施例提供的实施环境的示意图。该实施环境中包括计算机设备210和服务器220。其中,计算机设备210与服务器220之间通过通信网络进行数据通信,可选地,通信网络可以是有线网络也可以是无线网络,且该通信网络可以是局域网、城域网以及广域网中的至少一种。
计算机设备210是具有图像分类需求的电子设备,该电子设备可以是智能手机、平板电脑或个人计算机等等,本实施例对此不作限定。在一些实施例中,计算机设备210中运行有具有图像分类功能的应用程序。该应用程序可为社交类应用程序、图像检索类应用程序以及图片存储类应用程序。当需要对目标图像集合(如医学图像、动物图像、人物图像等)进行分类时,或识别单个目标图像的类别时,计算机设备210可将目标图像集合或目标图像输入应用程序,从而将目标图像集合或目标图像上传至服务器220,由服务器220进行图像类别的识别,并反馈分类结果。
服务器220可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(Content Delivery Network,CDN)、以及大数据和人工智能平台等基础云计算服务的云服务器。
在一些实施例中,服务器220用于为计算机设备210中安装的应用程序提供图像分类服务。可选的,服务器220中设置有图像分类模型,该图像分类模型是通过未经标注的第一样本图像预训练且经已标注的第二样本图像微调后的图像分类模型,用于对计算机设备210发送的图像进行分类。
当然,在其他可能的实施方式中,图像分类模型也可以部署在计算机设备210侧,由计算机设备210在本地实现图像分类,无需借助服务器220,相应的,图像分类模型在计算机设备210侧完成训练。或者,图像分类模型在服务器220侧完成训练,计算机设备210部署训练完成的图像分类模型。本实施例对此不作限定。为了方便表述,下述各个实施例以图像分类方法由计算机设备执行为例进行说明。
可选的,部署有神经网络模型(图像分类模型)的服务器可以是一个分布式系统中的一个节点,其中,该分布式系统可以为区块链系统,该区块链系统可以是由该多个节点通过网络通信的形式连接形成的分布式系统。其中,节点之间可以组成点对点(Peer To Peer,P2P)网络,任意形式的计算设备,比如服务器、终端等电子设备都可以通过加入该点对点网络而成为该区块链系统中的一个节点。其中,节点包括硬件层、中间层、操作系统层和应用层。在模型训练过程中,可以将图像分类模型的训练样本保存在区块链上。
请参考图3,其示出了本申请一个示例性实施例提供的图像分类方法的流程图。本实施例以该方法由计算机设备执行为例进行说明,该方法包括如下步骤。
步骤301,对第一样本图像进行图像分割,以及对分割得到的各个图像块进行特征提取,得到初始图像特征集合,初始图像特征集合中包含各个图像块对应的初始图像特征,第一样本图像是未经过标注的样本图像。
本申请实施例中,图像分类模型可应用于任何识别图像内容所属类别的场景,因此,第一样本图像可为任意类别的图像,如医学图像、动物图像、风景图像等。且第一样本图像是未经过标注的样本图像的图像集合。
在一种可能的实施方式中,在获取第一样本图像后,计算机设备首先对该图像进行图像分割。可选的,可将第一样本图像分割为相同大小的图像块,且不同图像块中携带不同的图像信息。
分割完成后,计算机设备对分割得到的各个图像块进行特征提取。可选的,计算机设备对各个图像块进行线性映射,得到各个图像块对应的初始图像特征,组合初始图像特征集合。
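上述图像分割与线性映射过程可用如下Python示意代码理解(仅为示意性实现:其中的图像尺寸、patch_size、映射矩阵proj及其维度均为假设,并非本申请的正式实现):

```python
import numpy as np

def extract_initial_features(image, patch_size, proj):
    """将图像分割为相同大小的图像块,并对每个图像块做线性映射,
    得到初始图像特征集合(每个图像块对应一个初始图像特征)。"""
    h, w, c = image.shape
    tokens = []
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            patch = image[i:i + patch_size, j:j + patch_size, :]
            tokens.append(patch.reshape(-1) @ proj)  # 线性映射为初始图像特征
    return np.stack(tokens)  # 形状: (图像块数, 特征维度)

# 用法示例:96x96 图像按 32x32 分割为 9 个图像块 -> 9 个初始图像特征
rng = np.random.default_rng(0)
image = rng.normal(size=(96, 96, 3))
proj = rng.normal(size=(32 * 32 * 3, 64))  # 线性映射矩阵(示例维度)
features = extract_initial_features(image, 32, proj)
print(features.shape)  # (9, 64)
```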
步骤302,对初始图像特征集合中的初始图像特征进行重排组合,得到第一图像特征集合和第二图像特征集合,第一图像特征集合中的第一图像特征与第二图像特征集合中的第二图像特征对应不同重排组合方式。
在一种可能的实施方式中,得到初始图像特征集合后,计算机设备以不同的方式对初始图像特征集合中的各个初始图像特征进行重排组合,得到第一图像特征集合以及第二图像特征集合。
可选的,第一图像特征集合中所包含的各个第一图像特征与第二图像特征集合中所包含的各个第二图像特征所指示的图像信息不同。即计算机设备通过不同重排组合方式,得到关于第一样本图像中图像特征的不同组合方式。
且重排组合后,得到的第一图像特征的数目与初始图像特征集合中的初始图像特征的数目相同,相应的,第二图像特征的数目与初始图像特征的数目相同。
步骤303,基于第一图像特征集合和第二图像特征集合预训练图像分类模型,图像分类模型用于对图像中的内容进行分类。
预训练是指一种通过使用大型数据集对图像分类模型进行训练,使图像分类模型学习到数据集中的通用特征的过程。预训练的目的是为后续图像分类模型在特定数据集上训练提供优质的模型参数。
由于第一图像特征集合中的第一图像特征与第二图像特征集合中的第二图像特征是第一样本图像不同位置处的图像特征,虽然特征不同,但同属于同一图像的图像特征,二者对应的图像分类结果应当一致。因此,计算机设备将各个第一图像特征以及各个第二图像特征分别输入图像分类模型后,基于模型预测结果一致的原则,根据得到的分类结果可实现对图像分类模型的自监督训练,进而无需使用经过标注的样本图像,即带标签的样本图像。
图像分类模型用于对图像中内容进行分类,可选的,图像分类模型可识别单个图像的类别,也可对图像集合中各个图像进行类别区别,完成图像集合的分类。
步骤304,基于第二样本图像对预训练后的图像分类模型进行微调,第二样本图像是经过标注的样本图像。
且由于预训练得到的模型已经具备优质的模型参数,微调阶段只需少量的标注样本即可使模型在目标任务上具备较高的性能,用于微调的数据集的数据量可以小于用于预训练的数据集的数据量。因此,第二样本图像的数量少于第一样本图像的数量,可减少对经过标注的样本图像的需求。
综上所述,本申请实施例中,以不同的方式打乱初始图像特征的顺序并重新排列组合,得到第一图像特征集合与第二图像特征集合,进而可基于不同重排组合方式下的图像特征集合对图像分类模型进行自监督预训练,无需借助已标注的样本图像进行预训练,减少对已标注样本图像的需求量,降低人工标注任务量,且在预训练后通过已标注的样本图像对预训练后的图像分类模型进行微调,确保最终得到的图像分类模型的分类性能,即有助于提高图像分类的准确性。
为实现自监督的预训练过程,本申请实施例中,采用在线学习分支以及目标学习分支分别对不同重排组合得到的图像特征集合进行图像分类,进而基于两个分支的分类结果实现对图像分类模型的预训练过程,下面,将以示例性实施例进行说明。
请参考图4,其示出了本申请另一个示例性实施例提供的图像分类方法的流程图。本实施例以该方法由计算机设备执行为例进行说明,该方法包括如下步骤。
步骤401,对第一样本图像进行图像分割,以及对分割得到的各个图像块进行特征提取,得到初始图像特征集合,初始图像特征集合中包含各个图像块对应的初始图像特征,第一样本图像是未经过标注的样本图像。
可选的,本申请实施例中的图像分类模型可为ViT模型,ViT模型是一种将CV与自然语言处理(Natural Language Processing,NLP)领域结合起来得到的图像分类模型。
采用ViT模型对第一样本图像进行分类时,计算机设备首先将第一样本图像分割为固定大小的图像块,再通过线性变换将各个图像块变换为初始图像特征,即将每个图像块编码为一个token,且token带有顺序信息。
如图5所示,首先将第一样本图像分割为图像块501,再对各个图像块进行线性变换得到各个图像块对应的token 502,得到初始图像特征集合。
步骤402,调整初始图像特征集合中初始图像特征的特征顺序,得到第一初始图像特征集合和第二初始图像特征集合,第一初始图像特征集合和第二初始图像特征集合中初始图像特征的顺序不同。
在对初始图像特征进行重排组合时,首先调整初始图像特征的顺序,即打乱各个初始图像特征的位置信息。
可选的,计算机设备随机打乱初始图像特征的排列顺序,得到第一初始图像特征集合和第二初始图像特征集合。或者,计算机设备也可以按照两种固定的顺序调整方式改变初始图像特征的排列顺序,得到第一初始图像特征集合和第二初始图像特征集合。本申请实施例对此不作限定。
可选的,随机打乱时可调整每个初始图像特征的顺序,将其调整至与初始顺序不同,也可选取部分初始图像特征,仅调整部分初始图像特征的特征顺序。
在一种可能的实施方式中,由于第一图像特征集合与第二图像特征集合是通过对初始图像特征进行不同重排组合方式得到的图像特征集合,因此,在对初始图像特征进行随机打乱时,采用不同打乱方式,得到第一初始图像特征集合与第二初始图像特征集合,使其中的初始图像特征的顺序不同,即可使第一图像特征集合与第二图像特征集合不同。后续进行特征重组的方式可以相同,也可以不同。
示意性的,如图6所示,初始图像特征集合T∈{t1,…,t9}中包含9个初始图像特征,即9个token,其带有对应的顺序信息,首先将各个token进行随机打乱,得到第一初始图像特征集合T_p1={t3,t5,t8,t1,t6,t2,t9,t4,t7}。
且对各个token采用另一种随机打乱方式,得到第二初始图像特征集合,如T_p2={t2,t7,t3,t1,t4,t9,t8,t5,t6}。
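上述两种随机打乱方式可用如下Python示意代码理解(示意实现;随机数生成器及重采样策略均为假设,仅用于说明"打乱顺序但不改变元素"这一性质):

```python
import numpy as np

tokens = [f"t{i}" for i in range(1, 10)]  # 初始图像特征集合中的 9 个 token

rng = np.random.default_rng(0)
perm1 = rng.permutation(9)  # 第一种随机打乱方式
perm2 = rng.permutation(9)  # 第二种随机打乱方式
while np.array_equal(perm1, perm2):  # 确保两种打乱方式对应的顺序不同
    perm2 = rng.permutation(9)

set1 = [tokens[i] for i in perm1]  # 第一初始图像特征集合
set2 = [tokens[i] for i in perm2]  # 第二初始图像特征集合
print(sorted(set1) == sorted(tokens))  # True:仅顺序改变,元素不变
```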
步骤403,基于第一初始图像特征集合重排得到第一特征矩阵,以及基于第二初始图像特征集合重排得到第二特征矩阵。
在对初始图像特征进行打乱后,可基于打乱后的初始图像特征集合进行重排。在一种可能的实施方式中,可首先构建关于打乱后初始图像特征集合的特征矩阵,包括构建第一初始图像特征集合的第一特征矩阵以及第二初始图像特征集合的第二特征矩阵。其中,构建第一特征矩阵与第二特征矩阵可包括如下步骤。
步骤403a、基于第一样本图像的图像分割方式,确定矩阵尺寸。
可选的,构建矩阵时,可根据第一样本图像的图像分割方式确定构建矩阵的尺寸大小,避免构建矩阵的尺寸与分割得到的图像块数量不匹配。如若对第一样本图像分割得到9个图像块,则可构建3×3矩阵;若对第一样本图像分割得到16个图像块,则可构建4×4矩阵或2×8矩阵。
示意性的,如图5所示,对第一样本图像分割后,得到9个图像块,因此,在进行矩阵构建时,可确定矩阵尺寸为3×3大小,与分割得到的图像块数量相匹配。
步骤403b、基于矩阵尺寸,对第一初始图像特征集合中的初始图像特征进行重排,得到第一特征矩阵。
计算机设备确定矩阵尺寸后,根据矩阵尺寸大小构建第一特征矩阵,即将第一初始图像特征集合中的初始图像特征进行重排。
可选的,可根据第一初始图像特征中初始图像特征的顺序依次选择,按行排列,或按列排列,完成第一特征矩阵的构建。
结合上述示例,矩阵尺寸为3×3大小,则将第一初始图像特征集合T_p1={t3,t5,t8,t1,t6,t2,t9,t4,t7}依次选择按行排列,构建为3×3矩阵,即从t3开始依次选择3个token作为矩阵第一行,顺序选择完成矩阵构建,如图6所示,第一特征矩阵为:

    | t3  t5  t8 |
    | t1  t6  t2 |
    | t9  t4  t7 |
步骤403c、基于矩阵尺寸,对第二初始图像特征集合中的初始图像特征进行重排,得到第二特征矩阵。
相应的,计算机设备同样根据矩阵尺寸构建第二特征矩阵,即将第二初始图像特征集合中的初始图像特征进行重排。可选的,构建第二特征矩阵的方式可与第一特征矩阵方式相同,也可不同,本实施例对此不做限定。如第一特征矩阵采用按行排列方式,第二特征矩阵采用按列排列方式;或,第一特征矩阵采用按列排列方式,第二特征矩阵采用按行排列方式;或,第一特征矩阵与第二特征矩阵均采用按行排列方式等。
结合上述示例,将第二初始图像特征集合T_p2={t2,t7,t3,t1,t4,t9,t8,t5,t6}按列构建3×3矩阵,得到第二特征矩阵如下:

    | t2  t1  t8 |
    | t7  t4  t5 |
    | t3  t9  t6 |
需要说明的是,本步骤与上述步骤403b,即构建第一特征矩阵的步骤可同步执行也可异步执行,本实施例仅对第一特征矩阵以及第二特征矩阵的构建方式进行说明,但对执行时序不做限定。
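按行与按列构建特征矩阵的过程,可借助NumPy的reshape示意如下(以上文示例中的token编号代替实际特征;order参数用于区分按行/按列排列,属于示意性写法):

```python
import numpy as np

# 第一初始图像特征集合与第二初始图像特征集合(以 token 编号表示)
tp1 = np.array([3, 5, 8, 1, 6, 2, 9, 4, 7])
tp2 = np.array([2, 7, 3, 1, 4, 9, 8, 5, 6])

m1 = tp1.reshape(3, 3)             # 按行排列,得到第一特征矩阵
m2 = tp2.reshape(3, 3, order="F")  # 按列排列,得到第二特征矩阵
print(m1[0].tolist())  # [3, 5, 8]:第一特征矩阵的第一行
print(m2[0].tolist())  # [2, 1, 8]:第二特征矩阵的第一行
```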
步骤404,对第一特征矩阵中的初始图像特征进行特征组合,基于特征组合结果生成第 一图像特征集合。
重排结束后,计算机设备对第一特征矩阵中的初始图像特征进行特征组合,并根据组合结果生成第一图像特征集合。其中,第一图像特征集合中各个第一图像特征所对应的图像块中图像信息发生改变,即与各个初始图像特征对应图像块中的图像信息不同。
在一种可能的实施方式中,特征组合并基于组合结果生成第一图像特征集合的过程可包括如下步骤。
步骤404a、通过滑窗选取第一特征矩阵中相邻的n个初始图像特征。
可选的,计算机设备通过滑窗采样的方式,每次选取n个初始图像特征进行特征组合,其中,滑窗大小需小于矩阵尺寸。如,对于3×3矩阵可采用2×2滑窗,对于4×4矩阵可采用2×2滑窗或3×3滑窗。
示意性的,可采用2×2的滑窗对3×3的第一特征矩阵进行采样,如图6所示,通过滑窗601可选取4个初始图像特征。
步骤404b、对n个初始图像特征进行特征组合,得到第一组合图像特征。
可选的,对滑窗内的n个初始图像特征进行特征组合,得到组合后的第一组合图像特征。
在一种可能的实施方式中,特征组合方式可包括特征拼接、特征融合等,即对n个初始图像特征进行特征拼接,得到第一组合图像特征,或,对n个初始图像特征进行特征融合即特征相加,得到第一组合图像特征。
示意性的,如图6所示,计算机设备对4个初始图像特征t3,t5,t1,t6进行特征拼接,得到第一组合图像特征602。
步骤404c、对m组第一组合图像特征进行线性映射,得到第一图像特征集合,m组第一组合图像特征通过移动滑窗得到。
可选的,计算机设备通过滑动滑窗遍历第一特征矩阵,即可得到m组第一组合图像特征,m为正整数。其中,滑窗的滑动步长以及滑动方向可随机设置,也可固定设置。如,可设置滑动步长为1,并根据行的方向滑动。
得到m组第一组合图像特征后,计算机设备对m组第一组合图像特征进行线性映射,得到第一图像特征集合。可选的,可将m组第一组合图像特征输出至一个多层感知机(Multilayer Perceptron,MLP)中,进行线性映射,得到第一图像特征集合。映射得到的第一图像特征集合中第一图像特征数目与初始图像特征数目相同。
如图6所示,设置滑动步长为1,并首先向行方向滑动,一行图像特征组合结束后向列方向移动滑窗,对下一行图像特征进行组合,即可得到4组初始图像特征的组合,每组中包含4个初始图像特征。计算机设备分别对每组中包含的4个初始图像特征进行特征组合,得到4组第一组合图像特征即T_L={t1',t2',t3',t4'},其中t1'即为t3,t5,t1,t6拼接得到的第一组合图像特征,t2'即为t5,t8,t6,t2拼接得到的第一组合图像特征,t3'即为t1,t6,t9,t4拼接得到的第一组合图像特征,t4'即为t6,t2,t4,t7拼接得到的第一组合图像特征。将T_L={t1',t2',t3',t4'}进行线性映射,得到第一图像特征集合T_1=MLP(T_L)。
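滑窗选取、特征拼接与线性映射的完整流程可用如下Python示意代码理解(示意实现:特征维度64、以单层线性映射代替MLP等均为假设):

```python
import numpy as np

def window_combine(matrix, win, proj):
    """以滑窗遍历特征矩阵,对窗口内相邻的初始图像特征做特征拼接,
    再经线性映射得到重排组合后的图像特征集合。"""
    rows, cols, dim = matrix.shape
    combined = []
    for i in range(rows - win + 1):       # 滑动步长为 1,先按行方向滑动
        for j in range(cols - win + 1):
            block = matrix[i:i + win, j:j + win, :]
            combined.append(block.reshape(-1))  # 窗口内特征拼接
    combined = np.stack(combined)
    return combined @ proj  # 线性映射(以单层线性层示意 MLP)

rng = np.random.default_rng(0)
matrix = rng.normal(size=(3, 3, 64))  # 3x3 特征矩阵,每个初始图像特征 64 维
proj = rng.normal(size=(4 * 64, 64))  # 拼接后的 4 个特征 -> 映射回 64 维
out = window_combine(matrix, 2, proj)
print(out.shape)  # (4, 64):2x2 滑窗在 3x3 矩阵上共得到 4 组组合特征
```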
步骤405,对第二特征矩阵中的初始图像特征进行特征组合,基于特征组合结果生成第二图像特征集合。
相应的,重排结束后,计算机设备对第二特征矩阵中的初始图像特征进行特征组合,并根据组合结果生成第二图像特征集合。其中,第二图像特征集合中各个第二图像特征所对应的图像块中图像信息发生改变,即与各个初始图像特征对应图像块中的图像信息不同。且与第一图像特征所对应的图像块的图像信息不同。
在一种可能的实施方式中,特征组合并基于组合结果生成第二图像特征集合可包括如下步骤。
步骤405a、通过滑窗选取第二特征矩阵中相邻的n个初始图像特征。
可选的,构建第二特征矩阵时,计算机设备同样通过滑窗采样的方式,选取n个初始图像特征进行特征组合,其中,滑窗大小需小于矩阵尺寸。
且对第二特征矩阵采样的滑窗大小可与对第一特征矩阵采样的滑窗大小相同,也可不同。例如,对于一个4×4的第一特征矩阵,可以采用2×2的滑窗进行采样,得到4组互不存在交集的第一组合图像特征;对于一个4×4的第二特征矩阵,可以采用3×3的滑窗进行采样,得到4组互相存在交集的第二组合图像特征。
步骤405b、对n个初始图像特征进行特征组合,得到第二组合图像特征。
可选的,对滑窗内的n个初始图像特征进行特征组合,得到组合后的第二组合图像特征。
在一种可能的实施方式中,特征组合方式可包括特征拼接、特征融合等,即对n个初始图像特征进行特征拼接,得到第二组合图像特征,或,对n个初始图像特征进行特征融合即特征相加,得到第二组合图像特征。
步骤405c、对m组第二组合图像特征进行线性映射,得到第二图像特征集合,m组第二组合图像特征通过移动滑窗得到。
同样的,计算机设备通过滑动滑窗遍历第二特征矩阵,即可得到m组第二组合图像特征。其中,滑窗的滑动步长以及滑动方向可随机设置。如,可设置滑动步长为1,并根据列的方向滑动。
得到m组第二组合图像特征后,即对m组第二组合图像特征进行线性映射,得到第二图像特征集合。可选的,可将m组第二组合图像特征输出至一个MLP中,进行线性映射,得到第二图像特征集合。映射得到的第二图像特征集合中第二图像特征数目与初始图像特征数目相同。
步骤406,将第一图像特征集合输入图像分类模型的在线学习分支,得到第一分类结果。
可选的,得到第一图像特征集合以及第二图像特征集合后,计算机设备即可利用第一图像特征集合以及第二图像特征集合对图像分类模型进行预训练。
可选的,图像分类模型包括在线学习分支以及目标学习分支,其中,在线学习分支与目标学习分支中的图像分类模型的结构相同,其均为ViT模型对应的结构,但其对应的模型参数的更新方式不同。
在一种可能的实施方式中,计算机设备将第一图像特征集合输入至图像分类模型的在线学习分支中,在线学习分支用于根据第一图像特征集合所指示的图像特征识别第一样本图像的图像类别,得到第一分类结果。其ViT模型如图5所示,将第一图像特征集合输入至Transformer编码器中,对第一图像特征集合进行图像特征提取,并将提取结果输入至分类器MLP Head中进行图像分类,得到第一分类结果。
示意性的,如图7所示,将第一样本图像701输入第一重排组合模块702中,得到第一图像特征集合,并将第一图像特征集合输入ViT模型,得到第一分类结果Z,该分支即为在线学习分支。
步骤407,将第二图像特征集合输入图像分类模型的目标学习分支,得到第二分类结果。
可选的,将第二图像特征集合输入至目标学习分支中,目标学习分支用于根据第二图像特征集合所指示的图像特征识别第一样本图像的图像类别,即得到第二分类结果。与得到第一分类结果的方式相同,将第二图像特征集合输入至编码器中,对第二图像特征集合进行图像特征的提取,将提取结果同样输入至分类器MLP Head中进行图像分类,得到第二分类结果。
示意性的,如图7所示,将第一样本图像701输入至第二重排组合模块703中,得到第二图像特征集合,并将第二图像特征集合输入ViT模型,得到第二分类结果Z',该分支即为目标学习分支。其中,重排组合模块703与重排组合模块702分别对应不同重排组合方式。
步骤408,基于第一分类结果与第二分类结果训练在线学习分支。
由于第一图像特征集合中的第一图像特征与第二图像特征集合中的第二图像特征各不相同,因此,为使图像分类模型可对同一第一样本图像特征在不同组合方式下进行准确的图像分类,本实施例中,计算机设备基于第一分类结果与第二分类结果首先训练在线学习分支,再基于更新后的在线学习分支更新目标学习分支的模型参数。在线学习分支的训练过程可包括如下步骤。
步骤408a、确定第一分类结果与第二分类结果的相似度损失。
为使图像分类模型能对不同组合排列方式下的图像特征识别结果一致,计算机设备确定第一分类结果与第二分类结果间的相似度损失,进而基于该相似度损失训练ViT模型,使其能根据不同组合排列方式下的图像特征得到相同分类结果,进而提高ViT模型进行图像分类的准确性。无需使用已标注的样本图像也可实现对ViT模型的模型参数的更新,实现ViT模型自监督学习。
可选的,相似度损失即表示第一分类结果与第二分类结果间的一致程度,可采用L1损失函数,也可采用L2损失函数等,确定第一分类结果与第二分类结果间的相似度损失。如,相似度损失可为:
L = ||Z - Z'||₂²
其中,L表示相似度损失,Z表示第一分类结果,Z'表示第二分类结果。
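相似度损失的计算可用如下Python示意代码理解(此处以L2形式为例,与上文所述一致;具体损失形式可按需替换为L1等其他形式):

```python
import numpy as np

def similarity_loss(z, z_prime):
    """第一分类结果与第二分类结果之间的相似度损失
    (以 L2 平方范数形式示意)。"""
    return float(np.sum((z - z_prime) ** 2))

z = np.array([0.7, 0.2, 0.1])        # 第一分类结果 Z
z_same = np.array([0.7, 0.2, 0.1])   # 与 Z 完全一致的第二分类结果
z_diff = np.array([0.1, 0.2, 0.7])   # 与 Z 不一致的第二分类结果
print(similarity_loss(z, z_same))    # 0.0:两个分类结果一致时损失为 0
print(similarity_loss(z, z_diff) > 0)  # True:结果不一致时损失为正
```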
步骤408b、基于相似度损失,通过反向传播更新在线学习分支的模型参数。
本实施例中,在线学习分支中模型参数的更新方式与目标学习分支中模型参数的更新方式不同。其中,在线学习分支采用反向传播方式更新,而目标学习分支中模型参数根据在线学习分支中模型参数更新。进而通过迭代训练,优化在线学习分支以及目标学习分支中的图像分类模型的模型参数,即优化ViT模型的模型参数。
在一种可能的实施方式中,确定相似度损失后,可基于相似度损失反向传播更新在线学习分支的模型参数,直至模型参数满足训练条件为止,即相似度损失达到收敛条件为止。
步骤409,基于训练后在线学习分支的模型参数,更新目标学习分支的模型参数。
可选的,每次更新在线学习分支的模型参数后,计算机设备将随之更新目标学习分支的模型参数。最终,当在线学习分支中的模型参数满足训练条件后,将再次更新目标学习分支的模型参数,此时,在线学习分支以及目标学习分支均停止模型参数的更新。
可选的,可基于训练后在线学习分支的模型参数,对目标学习分支的模型参数进行指数滑动平均(Exponential Moving Average,EMA)更新,更新方式如下所示:
ξ=τξ+(1-τ)θ
其中,ξ为目标学习分支中图像分类模型的模型参数,θ为在线学习分支中图像分类模型的模型参数,τ为平衡两个模型参数的权重参数。
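上式所示的EMA更新可用如下Python示意代码理解(示意实现;模型参数以NumPy数组列表表示,τ的取值为假设):

```python
import numpy as np

def ema_update(target_params, online_params, tau=0.99):
    """目标学习分支模型参数的指数滑动平均(EMA)更新:
    ξ = τ·ξ + (1-τ)·θ"""
    return [tau * xi + (1 - tau) * theta
            for xi, theta in zip(target_params, online_params)]

theta = [np.ones((2, 2))]   # 在线学习分支的模型参数 θ(示例)
xi = [np.zeros((2, 2))]     # 目标学习分支的模型参数 ξ(示例)
xi = ema_update(xi, theta, tau=0.9)
print(xi[0][0, 0])  # 0.1 = 0.9*0 + 0.1*1
```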
步骤410,基于第二样本图像对图像分类模型中目标学习分支的模型参数进行微调。
为进一步提高图像分类模型识别图像类别的准确性,在基于未标注的样本图像对图像分类模型预训练后,将采用少量经过标注的第二样本图像对图像分类模型的模型参数进行微调。
可选的,可对目标学习分支的模型参数进行微调,微调过程可包括如下步骤。
步骤410a、将第二样本图像输入图像分类模型的目标学习分支,得到样本分类结果。
将各个已经标注的第二样本图像输入目标学习分支的ViT模型中,得到各个第二样本图像对应的样本分类结果。
步骤410b、基于样本分类结果以及第二样本图像对应的样本分类标注,通过反向传播微调目标学习分支的模型参数。
确定样本分类结果后,可根据样本分类结果与预先已标注的样本分类标注,通过反向传播方式微调模型参数,得到最终图像分类模型。如,可基于样本分类结果与已标注的样本分类标注确定损失,基于损失反向微调模型参数,得到优化后的模型参数。
最后,基于使用优化后模型参数的ViT模型进行图像分类。
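微调阶段基于样本分类标注反向传播更新参数的过程,可用如下单层线性分类器示意(仅为示意:softmax交叉熵损失、学习率、特征维度均为假设,并非完整的ViT微调实现):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def finetune_step(w, feature, label, lr=0.1):
    """基于样本分类结果与样本分类标注的交叉熵损失,
    通过反向传播对模型参数做小幅调整(单层线性分类器示意)。"""
    logits = feature @ w              # (8,) @ (8,3) -> (3,)
    probs = softmax(logits)           # 样本分类结果
    loss = -np.log(probs[label])      # 与样本分类标注的交叉熵损失
    onehot = np.eye(len(probs))[label]
    grad = np.outer(feature, probs - onehot)  # 反向传播得到的梯度
    return w - lr * grad, loss

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3)) * 0.01   # 预训练得到的模型参数(示例)
feature = rng.normal(size=8)         # 第二样本图像的特征(示例)
label = 2                            # 样本分类标注
w, loss1 = finetune_step(w, feature, label)
loss2 = loss1
for _ in range(20):                  # 迭代微调
    w, loss2 = finetune_step(w, feature, label)
print(loss2 < loss1)  # True:损失随微调迭代下降
```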
本实施例中,通过对不同重排组合方式下的第一图像特征集合以及第二图像特征集合进行图像分类,基于得到的第一分类结果与第二分类结果对图像分类模型进行预训练,可提高图像分类模型对同一样本图像特征的不同组合方式进行分类预测时,输出分类结果的准确性。
上述实施例通过调整得到不同的特征顺序,使第一图像特征集合和第二图像特征集合中图像特征的重排组合方式不同。在另一种可能的实施方式中,计算机设备还可以通过设置不同的特征重排方式或特征组合方式,得到第一图像特征集合和第二图像特征集合。
请参考图8,其示出了本申请另一个实施例提供的图像分类方法的流程图。本实施例以该方法由计算机设备执行为例进行说明,该方法包括如下步骤。
步骤801,对第一样本图像进行图像分割,以及对分割得到的各个图像块进行特征提取,得到初始图像特征集合,初始图像特征集合中包含各个图像块对应的初始图像特征,第一样本图像是未经过标注的样本图像。
步骤801的具体实施方式可以参考上述步骤401,本申请实施例在此不再赘述。
步骤802,调整初始图像特征集合中初始图像特征的特征顺序,得到第一初始图像特征集合和第二初始图像特征集合。
在对初始图像特征进行重排组合时,首先调整初始图像特征的顺序,即打乱各个初始图像特征的位置信息。
可选的,本申请实施例中,计算机设备可以按照相同的方式调整初始图像特征集合中初始图像特征的特征顺序,得到第一初始图像特征集合和第二初始图像特征集合,也可以按照不同的打乱方式调整初始图像特征集合中初始图像特征的特征顺序,得到第一初始图像特征集合和第二初始图像特征集合。本申请实施例对此不作限定。
步骤803,基于第一初始图像特征集合重排得到第一特征矩阵,以及基于第二初始图像特征集合重排得到第二特征矩阵。
可选的,计算机设备可以按照相同的重排方式对第一初始图像特征集合和第二初始图像特征集合进行重排,也可以按照不同的重排方式对第一初始图像特征集合和第二初始图像特征集合进行重排。当计算机设备按照相同的重排方式对第一初始图像特征集合和第二初始图像特征集合进行重排时,则后续的特征组合过程需保证第一图像特征集合与第二图像特征集合对应的特征组合方式不同。
具体的特征集合重排过程可参考上述步骤403,本申请实施例在此不再赘述。
步骤804,对第一特征矩阵中的初始图像特征进行特征组合,基于特征组合结果生成第一图像特征集合。
步骤805,对第二特征矩阵中的初始图像特征进行特征组合,基于特征组合结果生成第二图像特征集合;第一特征矩阵与第二特征矩阵中初始图像特征的重排方式不同,和/或,第一图像特征集合与第二图像特征集合中初始图像特征的特征组合方式不同。
重排结束后,计算机设备对第一特征矩阵中的初始图像特征进行特征组合,根据组合结果生成第一图像特征集合,并对第二特征矩阵中的初始图像特征进行特征组合,根据组合结果生成第二图像特征集合。
可选的,本申请实施例中,若计算机设备按照相同的重排方式对第一初始图像特征集合和第二初始图像特征集合进行重排,则第一图像特征集合与第二图像特征集合中初始图像特征的特征组合方式需不同,从而保证第一图像特征集合和第二图像特征集合对应的图像特征不同。
步骤806,基于第一图像特征集合和第二图像特征集合预训练图像分类模型,图像分类模型用于对图像中的内容进行分类。
步骤807,基于第二样本图像对预训练后的图像分类模型进行微调,第二样本图像是经过标注的样本图像。
本申请实施例针对第一图像特征集合和第二图像特征集合的另一种生成方式进行了说明,具体的模型预训练以及微调过程可参考图4对应的实施例,本申请实施例在此不再赘述。
在一种可能的应用场景中,若需进一步提升图像分类模型的鲁棒性与准确性,可通过多次重排组合提升重排组合后得到的图像特征集合中图像特征的复杂性,进而通过复杂的图像特征集合预训练图像分类模型,下面将以示例性实施例进行说明。
请参考图9,其示出了本申请另一个示例性实施例提供的图像分类方法的流程图。本实施例以该方法用于计算机设备为例进行说明,该方法包括如下步骤。
步骤901,对第一样本图像进行图像分割,并对分割得到的各个图像块进行特征提取,得到初始图像特征集合。
步骤902,对初始图像特征集合中的初始图像特征进行重排组合,得到第一图像特征集合和第二图像特征集合。
步骤901至步骤902的实施方式可参考上述步骤401至步骤405,本实施例不再赘述。
步骤903,基于第一图像特征集合,迭代进行至少一次重排组合,得到第三图像特征集合。
可选的,为进一步提升图像分类模型的鲁棒性与准确性,在得到第一图像特征集合后,继续对第一图像特征集合中的第一图像特征进行重排组合,得到新的图像特征集合,并继续对新的图像特征集合中的图像特征进行重排组合,即迭代进行至少一次重排组合,迭代重排组合后,得到第三图像特征集合。
可选的,迭代次数可根据对图像分类模型的分类性能需求设置,迭代次数与图像分类模型的分类性能呈正相关关系。
其中,迭代进行重排组合的方式可参考上述对初始图像特征集合中初始图像特征进行重排组合的方式,即,包括对第一图像特征的打乱、重排、组合以及最终进行线性映射的过程。且迭代进行重排组合过程中,可采用相同的重排组合方式,也可采用不同的重排组合方式,本实施例对此不做限定。
步骤904,基于第二图像特征集合,迭代进行至少一次重排组合,得到第四图像特征集合。
对第一图像特征集合迭代进行至少一次重排组合时,也可对第二图像特征集合迭代进行至少一次重排组合,得到第四图像特征集合。同样的,重排组合方式包括对第二图像特征的打乱、重排、组合以及最终进行线性映射的过程。且,同样可采用相同的重排组合方式或不同的重排组合方式。
可选的,对第二图像特征集合进行迭代重排组合的迭代次数可与对第一图像特征集合进行迭代重排组合的迭代次数相同,也可不同。且在另一种可能的实施方式中,也可仅基于第一图像特征集合迭代进行至少一次重排组合或者仅基于第二图像特征集合迭代进行至少一次重排组合。
步骤905,基于第三图像特征集合和第四图像特征集合预训练图像分类模型。
可选的,基于第三图像特征集合和第四图像特征集合预训练图像分类模型的步骤可参考上述实施例中基于第一图像特征集合和第二图像特征集合预训练图像分类模型的步骤,本实施例不再赘述。
步骤906,基于第二样本图像对预训练后的图像分类模型进行微调,第二样本图像是经过标注的样本图像。
本步骤实施方式可参考上述步骤410,本实施例不再赘述。
本实施例中,在对初始图像特征进行重排组合得到第一图像特征集合以及第二图像特征集合后,继续基于第一图像特征集合与第二图像特征集合迭代进行重排组合,提升最终得到的第三图像特征集合中第三图像特征以及第四图像特征集合中第四图像特征的复杂性,进而基于第三图像特征集合与第四图像特征集合预训练图像分类模型,提高图像分类模型的鲁棒性。
上述实施例中,通过对第一图像特征集合以及第二图像特征集合迭代重排组合,进而提升图像分类模型的鲁棒性。在另一种可能的实施方式中,可继续添加图像分类模型的学习分支,从而基于多分支的分类结果预训练图像分类模型。可选的,可基于两两分类结果间的相似度损失反向传播更新在线学习分支的模型参数。
如图10所示,分别将第一样本图像1001输入至第一重排组合模块1002、第二重排组合模块1003以及第三重排组合模块1004中,得到不同重排组合方式下的图像特征集合,并分别将图像特征集合输入至ViT模型中,进行图像分类,得到第一分类结果Z,第二分类结果Z'以及第三分类结果Z”,进而可基于第一分类结果Z以及第二分类结果Z'确定第一相似度损失L1,基于第一分类结果Z以及第三分类结果Z”确定第二相似度损失L2,以及基于第二分类结果Z'以及第三分类结果Z”确定第三相似度损失L3,进而基于第一相似度损失L1、第二相似度损失L2以及第三相似度损失L3确定总损失,反向传播更新ViT模型1005的模型参数,而ViT模型1006以及ViT模型1007的模型参数基于ViT模型1005的模型参数更新。
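多分支情形下基于两两相似度损失求总损失的计算可用如下Python示意代码理解(示意实现,以L2形式的相似度损失为例;分支数与分类结果维度均为假设):

```python
import numpy as np

def total_similarity_loss(results):
    """对各分支分类结果两两计算相似度损失并求和,
    得到用于反向传播更新在线学习分支的总损失(L2 形式示意)。"""
    total = 0.0
    for a in range(len(results)):
        for b in range(a + 1, len(results)):
            total += float(np.sum((results[a] - results[b]) ** 2))
    return total

z1 = np.array([1.0, 0.0])  # 第一分类结果 Z
z2 = np.array([0.0, 1.0])  # 第二分类结果 Z'
z3 = np.array([1.0, 0.0])  # 第三分类结果 Z''
print(total_similarity_loss([z1, z2, z3]))  # 4.0 = L1(2.0) + L2(0.0) + L3(2.0)
```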
通过对多种重排组合方式下得到的图像特征集合分别进行图像分类,并基于多个分类结果训练图像分类模型,有助于提高图像分类模型的鲁棒性。
图11是本申请一个示例性实施例提供的图像分类装置的结构框图,如图所示,该装置包括如下模块:
图像分割模块1101,用于对第一样本图像进行图像分割,并对分割得到的各个图像块进行特征提取,得到初始图像特征集合,所述初始图像特征集合中包含各个图像块对应的初始图像特征,所述第一样本图像是未经过标注的样本图像;
重排组合模块1102,用于对所述初始图像特征集合中的所述初始图像特征进行重排组合,得到第一图像特征集合和第二图像特征集合,所述第一图像特征集合中的第一图像特征与所述第二图像特征集合中的第二图像特征对应不同重排组合方式;
预训练模块1103,用于基于所述第一图像特征集合和所述第二图像特征集合预训练图像分类模型,所述图像分类模型用于对图像中的内容进行分类;
微调模块1104,用于基于第二样本图像对预训练后的所述图像分类模型进行微调,所述第二样本图像是经过标注的样本图像。
可选的,所述第一重排组合模块1102,还用于:
调整所述初始图像特征集合中所述初始图像特征的特征顺序,得到第一初始图像特征集合和第二初始图像特征集合,所述第一初始图像特征集合和所述第二初始图像特征集合中所述初始图像特征的顺序不同;
基于所述第一初始图像特征集合重排得到第一特征矩阵,并基于所述第二初始图像特征集合重排得到第二特征矩阵;
对所述第一特征矩阵中的所述初始图像特征进行特征组合,并基于特征组合结果生成所述第一图像特征集合;
对所述第二特征矩阵中的所述初始图像特征进行特征组合,并基于特征组合结果生成所述第二图像特征集合。
可选的,所述重排组合模块1102,还用于:
通过滑窗选取所述第一特征矩阵中相邻的n个初始图像特征;
对所述n个初始图像特征进行特征组合,得到第一组合图像特征;
对m组所述第一组合图像特征进行线性映射,得到所述第一图像特征集合,m组所述第一组合图像特征通过移动所述滑窗得到;
可选的,所述重排组合模块1102,还用于:
通过滑窗选取所述第二特征矩阵中相邻的n个初始图像特征;
对所述n个初始图像特征进行特征组合,得到第二组合图像特征;
对m组所述第二组合图像特征进行线性映射,得到所述第二图像特征集合,m组所述第二组合图像特征通过移动所述滑窗得到。
可选的,所述重排组合模块1102,还用于:
对所述n个初始图像特征进行特征拼接,得到所述第一组合图像特征,或,对所述n个初始图像特征进行特征融合,得到所述第一组合图像特征;
可选的,所述重排组合模块1102,还用于:
对所述n个初始图像特征进行特征拼接,得到所述第二组合图像特征,或,对所述n个初始图像特征进行特征融合,得到所述第二组合图像特征。
可选的,所述重排组合模块1102,还用于:
基于所述第一样本图像的图像分割方式,确定矩阵尺寸;
基于所述矩阵尺寸,对所述第一初始图像特征集合中的初始图像特征进行重排,得到所述第一特征矩阵;
基于所述矩阵尺寸,对所述第二初始图像特征集合中的初始图像特征进行重排,得到所述第二特征矩阵。
可选的,所述重排组合模块1102还用于:
基于所述第一图像特征集合,迭代进行至少一次重排组合,得到第三图像特征集合;
基于所述第二图像特征集合,迭代进行至少一次重排组合,得到第四图像特征集合;
所述预训练模块1103,还用于基于所述第三图像特征集合和所述第四图像特征集合预训练图像分类模型。
可选的,所述重排组合模块1102,还用于:
调整所述初始图像特征集合中所述初始图像特征的特征顺序,得到第一初始图像特征集合和第二初始图像特征集合;
基于所述第一初始图像特征集合重排得到第一特征矩阵,以及基于所述第二初始图像特征集合重排得到第二特征矩阵;
对所述第一特征矩阵中的所述初始图像特征进行特征组合,基于特征组合结果生成所述第一图像特征集合;
对所述第二特征矩阵中的所述初始图像特征进行特征组合,基于特征组合结果生成所述第二图像特征集合;
其中,所述第一特征矩阵与所述第二特征矩阵中所述初始图像特征的重排方式不同,和/或,所述第一图像特征集合与所述第二图像特征集合中所述初始图像特征的特征组合方式不同。
可选的,所述预训练模块1103,还用于:
将所述第一图像特征集合输入所述图像分类模型的在线学习分支,得到第一分类结果;
将所述第二图像特征集合输入所述图像分类模型的目标学习分支,得到第二分类结果;
基于所述第一分类结果与所述第二分类结果训练所述在线学习分支;
基于训练后所述在线学习分支的模型参数,更新所述目标学习分支的模型参数。
可选的,所述预训练模块1103,还用于:
确定所述第一分类结果与所述第二分类结果的相似度损失;
基于所述相似度损失,通过反向传播更新所述在线学习分支的模型参数。
可选的,所述预训练模块1103,还用于:
基于训练后所述在线学习分支的模型参数,对所述目标学习分支的模型参数进行EMA更新。
可选的,所述微调模块1104,还用于:
基于所述第二样本图像对所述图像分类模型中所述目标学习分支的模型参数进行微调。
可选的,所述微调模块1104,还用于:
将所述第二样本图像输入所述图像分类模型的所述目标学习分支,得到样本分类结果;
基于所述样本分类结果以及所述第二样本图像对应的样本分类标注,通过反向传播微调所述目标学习分支的模型参数。
可选的,所述图像分类模型为ViT模型。
综上所述,本申请实施例中,通过对样本图像进行图像分割以及特征提取,得到初始图像特征集合,再对初始图像特征集合中的初始图像特征进行不同方式的重排组合,得到第一图像特征集合与第二图像特征集合,进而可基于不同重排组合方式下的图像特征集合对图像分类模型进行预训练,无需借助已标注的样本图像进行预训练,减少对已标注样本图像的需求量,且在预训练后通过已标注的样本图像对预训练后的图像分类模型进行微调,确保最终得到的图像分类模型的分类性能,即有助于提高图像分类的准确性。
请参考图12,其示出了本申请一个示例性实施例提供的计算机设备的结构示意图。具体来讲:所述计算机设备1200包括中央处理单元(Central Processing Unit,CPU)1201、包括随机存取存储器1202和只读存储器1203的系统存储器1204,以及连接系统存储器1204和中央处理单元1201的系统总线1205。所述计算机设备1200还包括帮助计算机内的各个器件之间传输信息的基本输入/输出系统(Input/Output,I/O系统)1206,和用于存储操作系统1213、应用程序1214和其他程序模块1215的大容量存储设备1207。
所述基本输入/输出系统1206包括有用于显示信息的显示器1208和用于用户输入信息的诸如鼠标、键盘之类的输入设备1209。其中所述显示器1208和输入设备1209都通过连接到系统总线1205的输入输出控制器1210连接到中央处理单元1201。所述基本输入/输出系统1206还可以包括输入输出控制器1210以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地,输入输出控制器1210还提供输出到显示屏、打印机或其他类型的输出设备。
所述大容量存储设备1207通过连接到系统总线1205的大容量存储控制器(未示出)连接到中央处理单元1201。所述大容量存储设备1207及其相关联的计算机可读介质为计算机设备1200提供非易失性存储。也就是说,所述大容量存储设备1207可以包括诸如硬盘或者驱动器之类的计算机可读介质(未示出)。
不失一般性,所述计算机可读介质可以包括计算机存储介质和通信介质。计算机存储介质包括以用于存储诸如计算机可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括随机存取记忆体(RAM,Random Access Memory)、只读存储器(ROM,Read Only Memory)、闪存或其他固态存储技术,只读光盘(Compact Disc Read-Only Memory,CD-ROM)、数字通用光盘(Digital Versatile Disc,DVD)或其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。当然,本领域技术人员可知所述计算机存储介质不局限于上述几种。上述的系统存储器1204和大容量存储设备1207可以统称为存储器。
存储器存储有一个或多个程序,一个或多个程序被配置成由一个或多个中央处理单元1201执行,一个或多个程序包含用于实现上述方法的指令,中央处理单元1201执行该一个或多个程序实现上述各个方法实施例提供的方法。
根据本申请的各种实施例,所述计算机设备1200还可以通过诸如因特网等网络连接到网络上的远程计算机运行。也即计算机设备1200可以通过连接在所述系统总线1205上的网络接口单元1211连接到网络1212,或者说,也可以使用网络接口单元1211来连接到其他类型的网络或远程计算机系统(未示出)。
所述存储器还包括一个或者一个以上的程序,所述一个或者一个以上程序存储于存储器中,所述一个或者一个以上程序包含用于进行本申请实施例提供的方法中由计算机设备所执行的步骤。
本申请实施例还提供一种计算机可读存储介质,该可读存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,至少一条指令、至少一段程序、代码集或指令集由处理器加载并执行以实现上述任一实施例所述的图像分类方法。
本申请实施例提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述方面提供的图像分类方法。
需要说明的是,本申请所涉及的信息(包括但不限于用户设备信息、用户个人信息等)、数据(包括但不限于用于分析的数据、存储的数据、展示的数据等)以及信号,均为经用户授权或者经过各方充分授权的,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。例如,本申请中涉及到的第一样本图像、第二样本图像等信息都是在充分授权的情况下获取的。
以上所述仅为本申请的可选的实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种图像分类方法,所述方法由计算机设备执行,所述方法包括:
    对第一样本图像进行图像分割,以及对分割得到的各个图像块进行特征提取,得到初始图像特征集合,所述初始图像特征集合中包含各个图像块对应的初始图像特征,所述第一样本图像是未经过标注的样本图像;
    对所述初始图像特征集合中的所述初始图像特征进行重排组合,得到第一图像特征集合和第二图像特征集合,所述第一图像特征集合中的第一图像特征与所述第二图像特征集合中的第二图像特征对应不同重排组合方式;
    基于所述第一图像特征集合和所述第二图像特征集合预训练图像分类模型,所述图像分类模型用于对图像中的内容进行分类;
    基于第二样本图像对预训练后的所述图像分类模型进行微调,所述第二样本图像是经过标注的样本图像。
  2. 根据权利要求1所述的方法,其中,所述对所述初始图像特征集合中的所述初始图像特征进行重排组合,得到第一图像特征集合和第二图像特征集合,包括:
    调整所述初始图像特征集合中所述初始图像特征的特征顺序,得到第一初始图像特征集合和第二初始图像特征集合,所述第一初始图像特征集合和所述第二初始图像特征集合中所述初始图像特征的顺序不同;
    基于所述第一初始图像特征集合重排得到第一特征矩阵,以及基于所述第二初始图像特征集合重排得到第二特征矩阵;
    对所述第一特征矩阵中的所述初始图像特征进行特征组合,基于特征组合结果生成所述第一图像特征集合;
    对所述第二特征矩阵中的所述初始图像特征进行特征组合,基于特征组合结果生成所述第二图像特征集合。
  3. 根据权利要求2所述的方法,其中,所述对所述第一特征矩阵中的所述初始图像特征进行特征组合,基于特征组合结果生成所述第一图像特征集合,包括:
    通过滑窗选取所述第一特征矩阵中相邻的n个初始图像特征;
    对所述n个初始图像特征进行特征组合,得到第一组合图像特征;
    对m组所述第一组合图像特征进行线性映射,得到所述第一图像特征集合,m组所述第一组合图像特征通过移动所述滑窗得到;
    所述对所述第二特征矩阵中的所述初始图像特征进行特征组合,基于特征组合结果生成所述第二图像特征集合,包括:
    通过滑窗选取所述第二特征矩阵中相邻的n个初始图像特征;
    对所述n个初始图像特征进行特征组合,得到第二组合图像特征;
    对m组所述第二组合图像特征进行线性映射,得到所述第二图像特征集合,m组所述第二组合图像特征通过移动所述滑窗得到。
  4. 根据权利要求3所述的方法,其中,所述对所述n个初始图像特征进行特征组合,得到第一组合图像特征,包括:
    对所述n个初始图像特征进行特征拼接,得到所述第一组合图像特征,或,对所述n个初始图像特征进行特征融合,得到所述第一组合图像特征;
    所述对所述n个初始图像特征进行特征组合,得到第二组合图像特征,包括:
    对所述n个初始图像特征进行特征拼接,得到所述第二组合图像特征,或,对所述n个初始图像特征进行特征融合,得到所述第二组合图像特征。
  5. 根据权利要求2所述的方法,其中,所述基于所述第一初始图像特征集合重排得到第一特征矩阵,以及基于所述第二初始图像特征集合重排得到第二特征矩阵,包括:
    基于所述第一样本图像的图像分割方式,确定矩阵尺寸;
    基于所述矩阵尺寸,对所述第一初始图像特征集合中的初始图像特征进行重排,得到所述第一特征矩阵;
    基于所述矩阵尺寸,对所述第二初始图像特征集合中的初始图像特征进行重排,得到所述第二特征矩阵。
  6. 根据权利要求1所述的方法,其中,所述对所述初始图像特征集合中的所述初始图像特征进行重排组合,得到第一图像特征集合和第二图像特征集合之后,所述方法包括:
    基于所述第一图像特征集合,迭代进行至少一次重排组合,得到第三图像特征集合;
    基于所述第二图像特征集合,迭代进行至少一次重排组合,得到第四图像特征集合;
    基于所述第三图像特征集合和所述第四图像特征集合预训练所述图像分类模型。
  7. 根据权利要求1所述的方法,其中,所述对所述初始图像特征集合中的所述初始图像特征进行重排组合,得到第一图像特征集合和第二图像特征集合,包括:
    调整所述初始图像特征集合中所述初始图像特征的特征顺序,得到第一初始图像特征集合和第二初始图像特征集合;
    基于所述第一初始图像特征集合重排得到第一特征矩阵,以及基于所述第二初始图像特征集合重排得到第二特征矩阵;
    对所述第一特征矩阵中的所述初始图像特征进行特征组合,基于特征组合结果生成所述第一图像特征集合;
    对所述第二特征矩阵中的所述初始图像特征进行特征组合,基于特征组合结果生成所述第二图像特征集合;
    其中,所述第一特征矩阵与所述第二特征矩阵中所述初始图像特征的重排方式不同,和/或,所述第一图像特征集合与所述第二图像特征集合中所述初始图像特征的特征组合方式不同。
  8. 根据权利要求1至7任一所述的方法,其中,所述基于所述第一图像特征集合和所述第二图像特征集合预训练图像分类模型,包括:
    将所述第一图像特征集合输入所述图像分类模型的在线学习分支,得到第一分类结果;
    将所述第二图像特征集合输入所述图像分类模型的目标学习分支,得到第二分类结果;
    基于所述第一分类结果与所述第二分类结果训练所述在线学习分支;
    基于训练后所述在线学习分支的模型参数,更新所述目标学习分支的模型参数。
  9. 根据权利要求8所述的方法,其中,所述基于所述第一分类结果与所述第二分类结果训练所述在线学习分支,包括:
    确定所述第一分类结果与所述第二分类结果的相似度损失;
    基于所述相似度损失,通过反向传播更新所述在线学习分支的模型参数。
  10. 根据权利要求8所述的方法,其中,所述基于训练后所述在线学习分支的模型参数,更新所述目标学习分支的模型参数,包括:
    基于训练后所述在线学习分支的模型参数,对所述目标学习分支的模型参数进行EMA更新。
  11. 根据权利要求8所述的方法,其中,所述基于第二样本图像对预训练后的所述图像分类模型进行微调,包括:
    基于所述第二样本图像对所述图像分类模型中所述目标学习分支的模型参数进行微调。
  12. 根据权利要求11所述的方法,其中,所述基于所述第二样本图像对所述图像分类模型中所述目标学习分支的模型参数进行微调,包括:
    将所述第二样本图像输入所述图像分类模型的所述目标学习分支,得到样本分类结果;
    基于所述样本分类结果以及所述第二样本图像对应的样本分类标注,通过反向传播微调所述目标学习分支的模型参数。
  13. 根据权利要求1至7任一所述的方法,其中,所述图像分类模型为ViT模型。
  14. 一种图像分类装置,所述装置包括:
    图像分割模块,用于对第一样本图像进行图像分割,以及对分割得到的各个图像块进行特征提取,得到初始图像特征集合,所述初始图像特征集合中包含各个图像块对应的初始图像特征,所述第一样本图像是未经过标注的样本图像;
    重排组合模块,用于对所述初始图像特征集合中的所述初始图像特征进行重排组合,得到第一图像特征集合和第二图像特征集合,所述第一图像特征集合中的第一图像特征与所述第二图像特征集合中的第二图像特征对应不同重排组合方式;
    预训练模块,用于基于所述第一图像特征集合和所述第二图像特征集合预训练图像分类模型,所述图像分类模型用于对图像中的内容进行分类;
    微调模块,用于基于第二样本图像对预训练后的所述图像分类模型进行微调,所述第二样本图像是经过标注的样本图像。
  15. 根据权利要求14所述的装置,其中,所述重排组合模块,还用于:
    调整所述初始图像特征集合中所述初始图像特征的特征顺序,得到第一初始图像特征集合和第二初始图像特征集合,所述第一初始图像特征集合和所述第二初始图像特征集合中所述初始图像特征的顺序不同;
    基于所述第一初始图像特征集合重排得到第一特征矩阵,以及基于所述第二初始图像特征集合重排得到第二特征矩阵;
    对所述第一特征矩阵中的所述初始图像特征进行特征组合,基于特征组合结果生成所述第一图像特征集合;
    对所述第二特征矩阵中的所述初始图像特征进行特征组合,基于特征组合结果生成所述第二图像特征集合。
  16. 根据权利要求15所述的装置,其中,所述重排组合模块,还用于:
    通过滑窗选取所述第一特征矩阵中相邻的n个初始图像特征;
    对所述n个初始图像特征进行特征组合,得到第一组合图像特征;
    对m组所述第一组合图像特征进行线性映射,得到所述第一图像特征集合,m组所述第一组合图像特征通过移动所述滑窗得到;
    通过滑窗选取所述第二特征矩阵中相邻的n个初始图像特征;
    对所述n个初始图像特征进行特征组合,得到第二组合图像特征;
    对m组所述第二组合图像特征进行线性映射,得到所述第二图像特征集合,m组所述第二组合图像特征通过移动所述滑窗得到。
  17. 根据权利要求16所述的装置,其中,所述重排组合模块,还用于:
    对所述n个初始图像特征进行特征拼接,得到所述第一组合图像特征,或,对所述n个初始图像特征进行特征融合,得到所述第一组合图像特征;
    对所述n个初始图像特征进行特征拼接,得到所述第二组合图像特征,或,对所述n个初始图像特征进行特征融合,得到所述第二组合图像特征。
  18. 一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有至少一段程序,所述至少一段程序由所述处理器加载并执行以实现如权利要求1至13任一所述的图像分类方法。
  19. 一种计算机可读存储介质,所述可读存储介质中存储有至少一段程序,所述至少一段程序由处理器加载并执行以实现如权利要求1至13任一所述的图像分类方法。
  20. 一种计算机程序产品,所述计算机程序产品包括计算机指令,所述计算机指令存储在计算机可读存储介质中;计算机设备的处理器从所述计算机可读存储介质读取所述计算机指令,所述处理器执行所述计算机指令,使得所述计算机设备执行如权利要求1至13任一所述的图像分类方法。
PCT/CN2022/093376 2021-06-29 2022-05-17 图像分类方法、装置、设备、存储介质及程序产品 WO2023273668A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22831494.4A EP4235488A4 (en) 2021-06-29 2022-05-17 IMAGE CLASSIFICATION METHOD AND APPARATUS, APPARATUS, STORAGE MEDIUM AND PROGRAM PRODUCT
US18/072,337 US20230092619A1 (en) 2021-06-29 2022-11-30 Image classification method and apparatus, device, storage medium, and program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110723873.5A CN113177616B (zh) 2021-06-29 2021-06-29 图像分类方法、装置、设备及存储介质
CN202110723873.5 2021-06-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/072,337 Continuation US20230092619A1 (en) 2021-06-29 2022-11-30 Image classification method and apparatus, device, storage medium, and program product

Publications (1)

Publication Number Publication Date
WO2023273668A1 true WO2023273668A1 (zh) 2023-01-05

Family

ID=76927873

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/093376 WO2023273668A1 (zh) 2021-06-29 2022-05-17 图像分类方法、装置、设备、存储介质及程序产品

Country Status (4)

Country Link
US (1) US20230092619A1 (zh)
EP (1) EP4235488A4 (zh)
CN (1) CN113177616B (zh)
WO (1) WO2023273668A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912621A (zh) * 2023-07-14 2023-10-20 浙江大华技术股份有限公司 图像样本构建方法、目标识别模型的训练方法及相关装置

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177616B (zh) * 2021-06-29 2021-09-17 腾讯科技(深圳)有限公司 图像分类方法、装置、设备及存储介质
CN113627597B (zh) * 2021-08-12 2023-10-13 上海大学 一种基于通用扰动的对抗样本生成方法及系统
CN117036788B (zh) * 2023-07-21 2024-04-02 阿里巴巴达摩院(杭州)科技有限公司 图像分类方法、训练图像分类模型的方法及装置
CN116821398B (zh) * 2023-08-14 2023-11-10 新唐信通(浙江)科技有限公司 一种道路缺陷识别模型训练用数据集获取方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948671A (zh) * 2019-03-04 2019-06-28 腾讯科技(深圳)有限公司 图像分类方法、装置、存储介质以及内窥镜成像设备
CN111126481A (zh) * 2019-12-20 2020-05-08 湖南千视通信息科技有限公司 一种神经网络模型的训练方法及装置
CN111242217A (zh) * 2020-01-13 2020-06-05 支付宝实验室(新加坡)有限公司 图像识别模型的训练方法、装置、电子设备及存储介质
CN113177616A (zh) * 2021-06-29 2021-07-27 腾讯科技(深圳)有限公司 图像分类方法、装置、设备及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766038B (zh) * 2019-09-02 2022-08-16 深圳中科保泰空天技术有限公司 无监督式的地貌分类模型训练和地貌图构建方法
CN110909803B (zh) * 2019-11-26 2023-04-18 腾讯科技(深圳)有限公司 图像识别模型训练方法、装置和计算机可读存储介质
CN111898696B (zh) * 2020-08-10 2023-10-27 腾讯云计算(长沙)有限责任公司 伪标签及标签预测模型的生成方法、装置、介质及设备
CN112836762A (zh) * 2021-02-26 2021-05-25 平安科技(深圳)有限公司 模型蒸馏方法、装置、设备及存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948671A (zh) * 2019-03-04 2019-06-28 腾讯科技(深圳)有限公司 图像分类方法、装置、存储介质以及内窥镜成像设备
CN111126481A (zh) * 2019-12-20 2020-05-08 湖南千视通信息科技有限公司 一种神经网络模型的训练方法及装置
CN111242217A (zh) * 2020-01-13 2020-06-05 支付宝实验室(新加坡)有限公司 图像识别模型的训练方法、装置、电子设备及存储介质
CN113177616A (zh) * 2021-06-29 2021-07-27 腾讯科技(深圳)有限公司 图像分类方法、装置、设备及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4235488A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912621A (zh) * 2023-07-14 2023-10-20 浙江大华技术股份有限公司 图像样本构建方法、目标识别模型的训练方法及相关装置
CN116912621B (zh) * 2023-07-14 2024-02-20 浙江大华技术股份有限公司 图像样本构建方法、目标识别模型的训练方法及相关装置

Also Published As

Publication number Publication date
CN113177616B (zh) 2021-09-17
CN113177616A (zh) 2021-07-27
EP4235488A1 (en) 2023-08-30
EP4235488A4 (en) 2024-05-29
US20230092619A1 (en) 2023-03-23

Similar Documents

Publication Publication Date Title
WO2023273668A1 (zh) 图像分类方法、装置、设备、存储介质及程序产品
Pérez-Rúa et al. Mfas: Multimodal fusion architecture search
JP7208408B2 (ja) 検出モデルのトレーニング方法、装置、コンピュータデバイス及びコンピュータプログラム
CN111709409B (zh) 人脸活体检测方法、装置、设备及介质
CN111754596B (zh) 编辑模型生成、人脸图像编辑方法、装置、设备及介质
WO2020156245A1 (zh) 动作识别方法、装置、设备及存储介质
US10535141B2 (en) Differentiable jaccard loss approximation for training an artificial neural network
CN114398961B (zh) 一种基于多模态深度特征融合的视觉问答方法及其模型
EP4002161A1 (en) Image retrieval method and apparatus, storage medium, and device
WO2020108336A1 (zh) 图像处理方法、装置、设备及存储介质
WO2023000872A1 (zh) 图像特征的监督学习方法、装置、设备及存储介质
US20220237917A1 (en) Video comparison method and apparatus, computer device, and storage medium
CN115565238B (zh) 换脸模型的训练方法、装置、设备、存储介质和程序产品
CN113821668A (zh) 数据分类识别方法、装置、设备及可读存储介质
WO2023165361A1 (zh) 一种数据处理方法及相关设备
CN111091010A (zh) 相似度确定、网络训练、查找方法及装置和存储介质
US20220327835A1 (en) Video processing method and apparatus
CN113657272B (zh) 一种基于缺失数据补全的微视频分类方法及系统
CN114330514A (zh) 一种基于深度特征与梯度信息的数据重建方法及系统
WO2023160157A1 (zh) 三维医学图像的识别方法、装置、设备、存储介质及产品
CN112862840B (zh) 图像分割方法、装置、设备及介质
Zhong A convolutional neural network based online teaching method using edge-cloud computing platform
KR102340387B1 (ko) 뇌 연결성 학습 방법 및 이를 위한 시스템
CN116648700A (zh) 音频数据转录训练学习算法标识图像数据中可见医疗设备
Xiao et al. Gaze prediction based on long short-term memory convolution with associated features of video frames

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2022831494

Country of ref document: EP

Effective date: 20230526

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22831494

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE