CN112989085B - Image processing method, device, computer equipment and storage medium - Google Patents
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
Abstract
The application discloses an image processing method, an image processing device, computer equipment and a storage medium, and belongs to the technical field of computers. A plurality of sample image pairs are constructed such that the two sample images in each pair belong to different class labels, and the target areas at the same position in each sample image pair are exchanged to construct a new, data-enhanced target image pair. In this way, the original sample images can be data-enhanced in a small sample learning scene by means of image information exchange, and each generated target image carries two class labels, so that the generalization capability of an image classification model based on small sample learning, and thus its classification accuracy, can be remarkably improved.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an image processing method, an image processing device, a computer device, and a storage medium.
Background
With the development of computer technology and artificial intelligence (Artificial Intelligence, AI) technology, small sample learning (Few-Shot Learning) is gradually becoming one of the future development directions of AI. Small sample learning generally refers to methods and scenarios for learning from a small amount of annotated data, including optimization-based algorithms, metric-learning-based algorithms, and the like. In an image classification task under a small sample learning scene, the number of samples of each category in the training set is small, and the test set also includes new categories that never appear in the training set, so the generalization capability of an image classification model based on small sample learning is poor. How to improve the recognition accuracy of such a model on new categories has therefore gradually become a problem to be solved.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, computer equipment and a storage medium, which can improve the image classification accuracy of an image classification model in a small sample scene. The technical scheme is as follows:
in one aspect, there is provided an image processing method, the method including:
obtaining a plurality of sample image pairs, any one of the plurality of sample image pairs comprising a first sample image and a second sample image, the first sample image and the second sample image belonging to different class labels;
for any sample image pair, exchanging a first target area in the first sample image with a second target area in the second sample image to obtain a target image pair, wherein the positions of the first target area and the second target area correspond to each other;
and adjusting parameters of an initial classification model based on a plurality of target image pairs to obtain an image classification model, wherein the image classification model is used for identifying class labels to which the input images belong.
In one aspect, there is provided an image processing apparatus including:
an acquisition module, configured to acquire a plurality of sample image pairs, where any one of the plurality of sample image pairs includes a first sample image and a second sample image, and the first sample image and the second sample image belong to different class labels;
The exchanging module is used for exchanging a first target area in the first sample image with a second target area in the second sample image for any sample image pair to obtain a target image pair, and the positions of the first target area and the second target area correspond to each other;
the adjustment module is used for adjusting parameters of the initial classification model based on a plurality of target image pairs to obtain an image classification model, and the image classification model is used for identifying class labels to which the input image belongs.
In a possible implementation, the target image pair includes a first target image and a second target image, and the exchanging module is configured to:
cropping the first target region from the first sample image;
cropping the second target region from the second sample image;
filling the second target area to a corresponding position in the cut first sample image to obtain the first target image;
and filling the first target area to a corresponding position in the cut second sample image to obtain the second target image.
In one possible embodiment, the adjustment module includes:
The blocking unit is used for carrying out image blocking on a plurality of target image pairs to obtain a plurality of target image block pairs;
the acquisition unit is used for acquiring category labels to which the target image block pairs belong respectively;
and the adjusting unit is used for adjusting parameters of the initial classification model based on the plurality of target image block pairs and the class labels to which the plurality of target image block pairs belong, so as to obtain the image classification model.
In one possible implementation, the class label to which any target image block in any target image block pair belongs is the same as the class label of the sample image in which that target image block originally appeared.
In one possible embodiment, the adjusting unit includes:
a screening subunit, configured to screen from the plurality of target image block pairs to obtain a plurality of test image blocks;
the first obtaining subunit is used for calling the initial classification model to obtain the prediction category to which the plurality of test image blocks belong;
a second obtaining subunit, configured to obtain a loss function value based on a prediction category to which the plurality of test image blocks belong and a category label to which the plurality of test image blocks belong;
and the iteration subunit is used for iteratively performing the step of obtaining the loss function value until the loss function value meets a stop condition, and stopping iteration to obtain the image classification model.
In one possible embodiment, the screening subunit is configured to:
randomly selecting a plurality of target categories from a plurality of category labels to be selected;
randomly selecting a plurality of reference image blocks belonging to the plurality of target categories from the plurality of target image block pairs, wherein different target categories in the plurality of target categories correspond to the same number of reference image blocks;
and randomly selecting, from the plurality of target image block pairs, the plurality of test image blocks that belong to the plurality of target categories and are not among the plurality of reference image blocks.
In one possible implementation, the first acquisition subunit is configured to:
invoking the initial classification model to extract image features of the plurality of reference image blocks and image features of the plurality of test image blocks respectively;
and acquiring prediction categories to which the plurality of test image blocks belong based on correlation between the image features of the plurality of reference image blocks and the image features of the plurality of test image blocks.
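The correlation-based prediction described in the block above can be sketched as prototype matching: average the features of the reference image blocks per class and assign each test image block the class of its most similar prototype. This is an illustrative interpretation of "correlation between image features", not the patent's exact computation; the function name and cosine-similarity choice are assumptions.

```python
import numpy as np

def predict_by_similarity(ref_feats, ref_labels, test_feats):
    """Predict a class for each test image block by comparing its feature
    vector to per-class prototypes (mean reference features) under cosine
    similarity. A sketch of the correlation-based prediction step."""
    classes = sorted(set(ref_labels))
    # Build one prototype per class by averaging its reference features.
    protos = np.stack([
        np.mean([f for f, l in zip(ref_feats, ref_labels) if l == c], axis=0)
        for c in classes
    ])
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ protos.T  # cosine-similarity matrix: tests x classes
    return [classes[int(i)] for i in np.argmax(sims, axis=1)]
```

In a real model the features would come from the initial classification model's backbone; here plain vectors stand in for them.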
In one aspect, a computer device is provided that includes one or more processors and one or more memories having at least one computer program stored therein, the at least one computer program being loaded and executed by the one or more processors to implement an image processing method as described above.
In one aspect, a storage medium is provided in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement an image processing method as described above.
In one aspect, a computer program product or computer program is provided, the computer program product or computer program comprising one or more program codes, the one or more program codes being stored in a computer readable storage medium. The one or more processors of the computer device are capable of reading the one or more program codes from the computer-readable storage medium, and executing the one or more program codes, so that the computer device can perform the above-described image processing method.
The technical solutions provided in the embodiments of the present application have at least the following beneficial effects:
By constructing a plurality of sample image pairs, it is guaranteed that the two sample images in each sample image pair belong to different class labels. Exchanging the target areas at the same position in each sample image pair then constructs a new, data-enhanced target image pair, so that data enhancement can be carried out on the original sample images in a small sample learning scene by means of image information exchange. Since each generated target image carries two class labels, the generalization capability of an image classification model based on small sample learning, and thus its classification accuracy, can be remarkably improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic view of an implementation environment of an image processing method according to an embodiment of the present application;
FIG. 2 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of an image information exchange method according to an embodiment of the present application;
FIG. 4 is a training flow chart of an image classification model provided in an embodiment of the present application;
fig. 5 is a schematic structural view of an image processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used to distinguish between identical or similar items that have substantially the same function and purpose. It should be understood that there is no logical or chronological dependency among "first," "second," and "nth," nor any limitation on their number or order of execution.
The term "at least one" in this application means one or more, and "a plurality of" means two or more; for example, a plurality of first positions means two or more first positions.
Artificial intelligence (Artificial Intelligence, AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly includes directions such as audio processing, computer vision, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and demonstration learning.
With the research and progress of artificial intelligence technology, artificial intelligence has been developed and applied in many fields. Among them, computer vision (Computer Vision, CV) is the science of how to make a machine "see": a camera and a computer replace human eyes to perform machine vision tasks such as recognition, tracking and measurement of a target, and further perform graphic processing so that the result becomes an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multi-dimensional data. Computer vision technology typically includes image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional (3D) object reconstruction, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
The embodiments of the application relate to the application of machine learning technology in the field of image classification, namely, how to adopt a machine learning model to classify an input image so that the class label to which the input image belongs can be intelligently identified, thereby completing the image classification task. Optionally, small sample learning (Few-Shot Learning) in the field of image classification generally refers to methods and scenarios for learning from a small amount of annotated data, including optimization-based algorithms, metric-learning-based algorithms, and the like; ideally, a model capable of small sample learning can be rapidly applied to a new field. Small sample learning is usually discussed around certain specific problems; for example, when the training pattern of small sample learning is applied to an image classification problem, the task is referred to as a small sample classification (Few-Shot Classification) task. Some basic concepts of small sample learning are described below.
Humans are very good at identifying a new object from a very small number of samples; for example, a child only needs a few pictures in a book to know what a "zebra" is and what a "rhinoceros" is. Inspired by this fast learning ability of humans, technicians hope that, after learning sample data of certain classes, a machine learning model can learn quickly from only a small number of samples of a new class. This is the problem to be solved by small sample learning, namely: on the premise that only a small number of samples are given, improve the recognition accuracy of the machine learning model on the new class as much as possible, thereby improving the generalization capability of the machine learning model.
Small sample learning is the application of meta learning (Meta Learning) in the field of supervised learning. In the meta-training stage, the dataset is decomposed into different meta tasks to learn the generalization capability of the model under category changes; in the meta-test stage, classification of brand-new categories can be completed without changing the existing model.
In one exemplary scenario, it is assumed that a training dataset for small sample learning contains M class labels, each containing multiple samples. In the training stage, C (C ≤ M) class labels are randomly extracted from the M class labels in the training dataset, and K samples are randomly selected under each of the C class labels (C × K samples in total) to build a meta task, which serves as the Support Set input of the model; a batch (Batch) of samples is then randomly extracted from the remaining samples under the C class labels to serve as the prediction objects of the model (the Query Set, i.e., the test set). Each execution of the above steps is called an episode (Episode); in each episode the model is required to learn how to distinguish the C class labels from the C × K samples, and such a task is called a C-way K-shot problem.
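The episode construction above can be sketched as follows. This is a minimal illustration assuming the dataset is a dict mapping each class label to its list of samples; the function and parameter names are illustrative, not from the original disclosure.

```python
import random

def sample_episode(dataset, c_way, k_shot, n_query):
    """Sample one C-way K-shot meta task (episode): a support set of
    C*K samples plus a query batch drawn from the remaining samples
    of the same C classes."""
    classes = random.sample(sorted(dataset), c_way)  # C of the M class labels
    support, query = [], []
    for label in classes:
        picked = random.sample(dataset[label], k_shot + n_query)
        support += [(s, label) for s in picked[:k_shot]]  # K samples per class
        query += [(s, label) for s in picked[k_shot:]]    # queries from the rest
    return support, query
```

Each call corresponds to one episode; training iterates over many such episodes so the model learns to distinguish whichever C classes are drawn.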
In view of this, the embodiments of the application provide an image processing method which performs contrastive learning by constructing a Support Set and a Query Set (test set) and fuses the features of the support set and the test set in the loss function, so that the model can simultaneously take into account the different feature distributions on the two sample sets.
Fig. 1 is a schematic view of an implementation environment of an image processing method according to an embodiment of the present application. Referring to fig. 1, the implementation environment includes a terminal 101 and a server 102.
The terminal 101 installs and runs applications supporting image classification services, including but not limited to: browser applications, social applications, ordering applications, payment applications, ride-hailing applications, image processing applications, short video applications, and the like.
The terminal 101 may be directly or indirectly connected to the server 102 through a wired or wireless communication manner, and the connection manner is not limited herein in the embodiments of the present application.
The server 102 is configured to provide background services for the application programs, where the server 102 includes at least one of a server, a plurality of servers, a cloud computing platform, or a virtualization center. Optionally, the server 102 takes on primary computing work and the terminal 101 takes on secondary computing work; alternatively, the server 102 takes on secondary computing work and the terminal 101 takes on primary computing work; alternatively, a distributed computing architecture is employed between both the server 102 and the terminal 101 for collaborative computing.
In an exemplary scenario, the server 102 pre-trains on a large amount of data to obtain an initial classification model, and the parameters of the initial classification model are adjusted based on the image processing method in the embodiments of the application to obtain the final image classification model. The image classification model is suitable for various image classification tasks with unbalanced sample distributions, for example, a task of classifying the color type of a person in face images, where samples of black are plentiful but samples of other colors are relatively few.
In some embodiments, a user logs in to the application program using the terminal 101 and uploads an image to be identified to the server 102. The server 102 invokes the image classification model to classify the image in the cloud, obtains the prediction category to which the image belongs, and returns the prediction category output by the model to the terminal 101. This ensures that high image classification accuracy can be maintained even if only a small number of samples and labels exist for the class label to which the image truly belongs (for example, the samples and labels of a new category are usually very few).
In other embodiments, after the server 102 trains the image classification model, the model is compressed and the compressed image classification model is issued to the terminal 101. The terminal 101 then performs the image classification task locally with the model, so that no communication interaction is required each time an image classification task is performed, saving communication overhead between the terminal 101 and the server 102.
In some embodiments, server 102 is a stand-alone physical server, or a server cluster or distributed system of multiple physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDNs (Content Delivery Network, content delivery networks), and basic cloud computing services such as big data and artificial intelligence platforms.
In some embodiments, the terminal 101 is a smart phone, tablet, notebook, desktop, smart speaker, smart watch, MP3 (Moving Picture Experts Group Audio Layer III) player, MP4 (Moving Picture Experts Group Audio Layer IV) player, e-book reader, etc., but is not limited thereto.
Those skilled in the art will appreciate that the number of terminals 101 may be greater or lesser. For example, the number of the terminals 101 may be only one, or the number of the terminals 101 may be several tens or hundreds, or more. The number and device types of the terminals 101 are not limited in the embodiment of the present application.
In some embodiments, as disclosed in the embodiments of the present application, the initial classification model may be iteratively trained by the server 102 to obtain the final image classification model. The image classification model may be stored in the server 102, with the server 102 processing the image classification requests of the terminal 101 in real time; alternatively, the server 102 may compress the image classification model and send it to the terminal 101, and the terminal 101 invokes the model locally to perform the image classification operation. It should be noted that, in the case where the server 102 is a distributed storage system or a server cluster formed by a plurality of servers, the distributed storage system or server cluster may be formed as a blockchain system, with each server in it being a node device in the blockchain system.
Fig. 2 is a flowchart of an image processing method according to an embodiment of the present application. Referring to fig. 2, the embodiment is applied to a computer device, and the computer device is taken as a server for explanation, and the embodiment includes the following steps:
201. the server obtains a plurality of sample image pairs, any of the plurality of sample image pairs including a first sample image and a second sample image, the first sample image and the second sample image belonging to different class labels.
In some embodiments, the server constructs an original sample set from all sample images, randomly selects two sample images belonging to different class labels from the original sample set to obtain a sample image pair, and repeats this operation multiple times to obtain a plurality of sample image pairs.
Alternatively, for each of the plurality of sample image pairs, one sample image is referred to as a first sample image and the other sample image is referred to as a second sample image, and it is necessary to ensure that the first sample image and the second sample image belong to different class labels.
202. The server exchanges a first target area in the first sample image with a second target area in the second sample image for any sample image pair to obtain a target image pair, and the positions of the first target area and the second target area correspond to each other.
Wherein the target image pair comprises a first target image and a second target image.
In some embodiments, the server cuts the first target area from the first sample image, cuts the second target area from the second sample image, fills the second target area to a corresponding position in the cut first sample image to obtain the first target image, fills the first target area to a corresponding position in the cut second sample image to obtain the second target image, and determines the first target image and the second target image as a target image pair.
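The crop-and-fill exchange in the step above can be sketched with array slicing. This is a minimal sketch assuming the two sample images are equal-sized numpy arrays; the function name and region parameterization are illustrative.

```python
import numpy as np

def exchange_regions(first, second, top, left, h, w):
    """Swap the rectangular region at (top, left) with height h and width w
    between two same-sized sample images, yielding the target image pair."""
    assert first.shape == second.shape
    first_target = first.copy()
    second_target = second.copy()
    region_a = first[top:top + h, left:left + w].copy()   # crop from first sample image
    region_b = second[top:top + h, left:left + w].copy()  # crop from second sample image
    first_target[top:top + h, left:left + w] = region_b   # fill into first -> first target image
    second_target[top:top + h, left:left + w] = region_a  # fill into second -> second target image
    return first_target, second_target
```

Because both images are copied first, the original sample pair is left intact and can be reused to generate further target pairs.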
In the above process, the server performs the above operation on each of the plurality of sample image pairs; that is, the two sample images in each sample image pair exchange partial areas at the same position, so that in any generated target image, different areas carry different class labels. On the basis of the existing samples, harder samples are thus constructed by means of image information exchange, which is favorable for improving the effect of contrastive learning.
Fig. 3 is a schematic flowchart of the image information exchange manner provided in the embodiments of the present application. Referring to fig. 3, assume that a sample image pair includes a first sample image 301 and a second sample image 302, the object class of the first sample image 301 being an automobile and the object class of the second sample image 302 being an hourglass. The square areas at the lower left corners of the first sample image 301 and the second sample image 302 are determined as a first target area 3011 and a second target area 3021 respectively; the first target area 3011 is then cut out of the first sample image 301, and the second target area 3021 is filled into the position of the original first target area 3011 to obtain a first target image 303. It can be seen that, in the first target image 303, the class label carried by the second target area 3021 is hourglass, while the class label carried by the other areas is automobile.
In some embodiments, the server randomly selects one or more regions from the first sample image in a CutMix manner as the first target area, and, according to the position of the first target area, uses the one or more regions at the corresponding position in the second sample image as the second target area, which is not specifically limited in the embodiments of the present application.
In some embodiments, in addition to randomly selecting the first target area, the server may sample the first sample image based on a β (Beta) distribution: a sampling center point generated from the β distribution is used as the upper-left corner coordinate of the first target area, and the width (W) and height (H) of the first target area are generated from a weighting coefficient λ drawn from the β distribution, so that the area ratio between the first target area and the first sample image is 1 − λ.
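The β-distribution sampling above can be sketched in the style of standard CutMix. This is an assumption-laden illustration: here λ is drawn from Beta(α, α) and the region's upper-left corner is then placed uniformly at random (one simple way to realize the "1 − λ area ratio" constraint); the function name and α default are not from the original disclosure.

```python
import numpy as np

def sample_cut_region(img_h, img_w, alpha=1.0, rng=None):
    """Sample a CutMix-style target region: draw a weighting coefficient
    lambda from Beta(alpha, alpha) and cut a region whose area ratio to
    the image is approximately 1 - lambda."""
    rng = rng or np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    side = np.sqrt(1.0 - lam)                  # side ratio, so area ratio is 1 - lam
    h, w = int(img_h * side), int(img_w * side)
    top = int(rng.integers(0, img_h - h + 1))  # upper-left corner of the region
    left = int(rng.integers(0, img_w - w + 1))
    return top, left, h, w, lam
```

The returned λ can also serve as the mixing weight when combining the two class labels in the loss.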
In some embodiments, for the selection manner of the first target area, the server may further segment objects with different semantics in the first sample image by using a semantic segmentation model, and then determine an area where any one object is located as the first target area.
In some embodiments, the server ensures that the first sample image and the second sample image are the same size. For example, in step 201 above, only sample images of the same size and different class labels are selected to generate a sample image pair; alternatively, even if the first sample image and the second sample image in any sample image pair obtained in step 201 differ in size, they can be converted to the same size by upsampling, downsampling, size scaling, and the like.
In the above process, by ensuring that the first sample image and the second sample image have the same size, then the positions of the first target area and the second target area correspond to each other, so that the sizes of the first target area and the second target area are indirectly ensured to be the same, the first target area and the second target area are conveniently and directly exchanged, and the process of constructing the target image pair is simplified.
In some embodiments, the first and second sample images have different sizes, but by ensuring that the first and second target areas have the same size, the first and second target areas can also be directly swapped, simplifying the process of constructing the target image pair.
In some embodiments, the first sample image and the second sample image have different sizes, and the first target region and the second target region also have different sizes. In this case, by ensuring that the first target region and the second target region have the same shape, the first target region or the second target region can be converted to the same size through size scaling, upsampling, downsampling and the like, and then refilled into the corresponding region of the second sample image or the first sample image, which further enriches the types of target image pairs.
203. The server adjusts parameters of the initial classification model based on a plurality of target image pairs to obtain an image classification model, wherein the image classification model is used for identifying class labels to which the input image belongs.
In some embodiments, the server performs image blocking on a plurality of the target image pairs to obtain a plurality of target image block pairs; acquires the class labels to which the plurality of target image block pairs respectively belong; and adjusts the parameters of the initial classification model based on the plurality of target image block pairs and the class labels to which they belong, so as to obtain the image classification model.
Optionally, the server randomly pairs a plurality of sample image pairs from the original sample set and performs data enhancement on them by exchanging image information, thereby generating a plurality of target image pairs. Since each of the plurality of target image pairs contains two target images (a first target image and a second target image), the plurality of target image pairs can be regarded as an enhanced sample set constructed on the basis of the original sample set, and each target image contained in the enhanced sample set carries two class labels and therefore presents greater classification difficulty.
Optionally, for each of the plurality of target image pairs, the server performs image blocking (Patch) on the first target image in each target image pair to obtain a plurality of first target image blocks, performs image blocking on the second target image in each target image pair to obtain a plurality of second target image blocks, determines the plurality of first target image blocks and the plurality of second target image blocks as a target image block pair, and repeatedly performs the blocking operation multiple times until all target image pairs are traversed, so as to obtain the plurality of target image block pairs.
In some embodiments, in the above image blocking process, it must be ensured that each cut-out target image block corresponds to only a single class label. A cut-out target image block having a single class label means that the target image block does not mix content from the clipped sample image with content from the filled target area; that is, the target image block comes either entirely from the clipped sample image or entirely from the filled target area (the target area itself, of course, being the area clipped from the other sample image).
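A sketch of such single-label blocking is shown below (hypothetical helper; it assumes a non-overlapping patch grid, whereas a real implementation may use strides or overlap):

```python
import numpy as np

def single_label_patches(image, region_box, patch, label_img, label_region):
    """Cut `image` into patch-by-patch blocks and keep only blocks lying
    entirely inside or entirely outside the filled target region, so
    that every kept block carries exactly one class label.

    region_box = (top, left, height, width) of the filled target area.
    Returns a list of (block, label) pairs.
    """
    t, l, h, w = region_box
    H, W = image.shape[:2]
    blocks = []
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            inside = (t <= y and y + patch <= t + h and
                      l <= x and x + patch <= l + w)
            outside = (y + patch <= t or y >= t + h or
                       x + patch <= l or x >= l + w)
            if inside:
                blocks.append((image[y:y + patch, x:x + patch], label_region))
            elif outside:
                blocks.append((image[y:y + patch, x:x + patch], label_img))
            # blocks straddling the region border mix two labels: discard
    return blocks
```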
In some embodiments, for each of the plurality of target image block pairs, each target image block pair includes two target image blocks (a first target image block and a second target image block), the server sets a class label of a sample image in which each target image block is originally located as a class label to which each target image block itself belongs. That is, the category label to which any one of the target image blocks in any one of the target image block pairs belongs is the same as the category label of the sample image to which the any one of the target image blocks originally belongs.
In this process, each difficult sample (target image) in the enhanced sample set is segmented, and the segmented target image blocks are fed into model training. This enhances the model's local understanding of images, enabling it to better understand each region of an input image, thereby improving the accuracy of overall image classification.
In some embodiments, the server performs the following operations while training: screening a plurality of test image blocks from the plurality of target image block pairs; invoking the initial classification model to obtain the prediction category to which the plurality of test image blocks belong; acquiring a loss function value based on the prediction category to which the plurality of test image blocks belong and the category labels to which the plurality of test image blocks belong; and iteratively executing the step of obtaining the loss function value until the loss function value meets a stop condition, and stopping iteration to obtain the image classification model.
In the above process, the server screens out test sets for performing model tests from the segmented enhanced sample set, puts each test image block in the test sets into the training process, and calculates the loss function value based on the prediction category and the true category of each test image block.
In some embodiments, the server performs the following operations in screening the test set: randomly selecting a plurality of target categories from a plurality of image categories to be selected; randomly selecting a plurality of reference image blocks belonging to the plurality of target categories from the plurality of target image block pairs, wherein different target categories in the plurality of target categories correspond to the same number of reference image blocks; the plurality of test image blocks belonging to the plurality of target categories, excluding the plurality of reference image blocks, are randomly selected from the plurality of target image block pairs.
In the above process, from the segmented enhanced sample set, the server first screens out C (C is greater than or equal to 2) target classes and randomly selects K (K is greater than or equal to 1) reference image blocks for each of the C classes; the resulting C×K reference image blocks together form a Support Set for training. A plurality of test image blocks are then randomly selected from the remaining target image blocks of the C classes to form a test Set (Query Set) for testing. It follows that the support set and the test set contain no identical sample, that is, no intersection exists between the support set and the test set. Through contrastive learning over the support set and the test set, the model can account for the feature distribution of the target image blocks in two different sample sets, thereby improving the classification accuracy of the model.
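The C-way, K-shot episode construction above can be sketched as follows (function and parameter names are illustrative assumptions, not the patent's own implementation):

```python
import random

def sample_episode(blocks_by_class, c_way, k_shot, n_query):
    """Build one training episode: pick C target classes, K reference
    blocks per class for the support set, and up to n_query of the
    remaining blocks for the test (query) set, so the two sets never
    intersect. `blocks_by_class` maps class label -> list of blocks.
    """
    classes = random.sample(sorted(blocks_by_class), c_way)
    support, leftovers = [], []
    for c in classes:
        picked = random.sample(blocks_by_class[c], k_shot)
        support += [(b, c) for b in picked]
        # everything not picked for the support set is a query candidate
        leftovers += [(b, c) for b in blocks_by_class[c] if b not in picked]
    query = random.sample(leftovers, min(n_query, len(leftovers)))
    return support, query
```

By construction the query set's class labels form a subset of the support set's class labels, matching the constraint stated later for the test set.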
In some embodiments, the server performs the following when obtaining the prediction category: invoking the initial classification model to extract image features of the plurality of reference image blocks and image features of the plurality of test image blocks respectively; and acquiring prediction categories to which the plurality of test image blocks belong based on correlation between the image features of the plurality of reference image blocks and the image features of the plurality of test image blocks.
Optionally, for any one of the plurality of test image blocks, the server acquires a correlation between the image feature of the any one test image block and the image feature of each reference image block, and determines a category label to which the reference image block with the largest correlation belongs as a prediction category of the any one test image block.
In some embodiments, the correlation may take various calculation manners, for example, the correlation is an inner product between two image features, or the correlation is a cosine distance between two image features, or the correlation is inversely correlated with a euclidean distance between two image features, which is not specifically limited in this embodiment of the present application.
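The correlation options listed above can be sketched numerically (a minimal illustration; the Euclidean variant is negated so that in every mode a larger value means a stronger correlation, matching the "inversely correlated" wording):

```python
import numpy as np

def correlation(f1, f2, mode="inner"):
    """Correlation between two image features: inner product, cosine
    similarity, or negative Euclidean distance."""
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    if mode == "inner":
        return float(f1 @ f2)
    if mode == "cosine":
        return float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2)))
    if mode == "euclidean":
        return float(-np.linalg.norm(f1 - f2))
    raise ValueError(mode)

def predict(test_feat, ref_feats, ref_labels, mode="inner"):
    """Predicted class = label of the most-correlated reference block."""
    scores = [correlation(test_feat, r, mode) for r in ref_feats]
    return ref_labels[int(np.argmax(scores))]
```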
In one example, take the correlation as the inner product between two image features. Suppose that for the $i$-th test image block $x_i^q$ in the test set $q$, the initial classification model is invoked to extract its image feature, denoted $f(x_i^q)$. The server then obtains the inner product between $f(x_i^q)$ and the image feature $f(x_j^s)$ of each reference image block in the support set $s$ (where $j$ is greater than or equal to 1 and less than or equal to the sample capacity of the support set), and selects the class label to which the reference image block with the largest inner product belongs as the initial classification model's predicted class $j^*$ for $x_i^q$, where the expression of $j^*$ is as follows:

$$j^* = \arg\max_{j} \left\langle f(x_i^q),\ f(x_j^s) \right\rangle$$
On the basis of the above, the loss function $L_{iwh}$ of the initial classification model in an iteration is as shown below, where the subscript $iwh$ of $L$ indicates the test image block of width $w$ and height $h$ cut out of the $i$-th target image:

$$L_{iwh} = -\log \frac{\sum_{j:\, y_j^s = \hat{y}_{iwh}} e^{\left\langle f(x_{iwh}^q),\ f(x_j^s) \right\rangle}}{\sum_{j:\, y_j^s = \hat{y}_{iwh}} e^{\left\langle f(x_{iwh}^q),\ f(x_j^s) \right\rangle} + \sum_{k:\, y_k^s \neq \hat{y}_{iwh}} e^{\left\langle f(x_{iwh}^q),\ f(x_k^s) \right\rangle}}$$

where $e$ is the natural base, $\hat{y}_{iwh}$ represents the prediction category obtained by identifying the test image block $iwh$ of width $w$ and height $h$ in the test set $q$, $y_j^s$ represents the class label of a reference image block $j$ of the support set $s$ having the same class label as the predicted class of test image block $iwh$, $f(x_{iwh}^q)$ represents the image feature extracted for the test image block $iwh$, $f(x_j^s)$ represents the image feature extracted for reference image block $j$, $y_k^s$ represents the class label of a reference image block $k$ of the support set $s$ having a different class label from the predicted class of test image block $iwh$, and $f(x_k^s)$ represents the image feature extracted for reference image block $k$.
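The contrastive loss described above can be sketched numerically as follows (a minimal illustration assuming plain inner-product similarity; function and variable names are hypothetical):

```python
import numpy as np

def contrastive_loss(test_feat, ref_feats, ref_labels, pred_label):
    """Softmax over inner products between a test block's feature and
    every support feature; the loss is the negative log of the
    probability mass landing on references that share the predicted
    class, so the loss shrinks as same-class similarities dominate."""
    test_feat = np.asarray(test_feat, float)
    sims = np.array([test_feat @ np.asarray(r, float) for r in ref_feats])
    exp = np.exp(sims - sims.max())  # subtract max for numerical stability
    same = np.array([lbl == pred_label for lbl in ref_labels])
    return float(-np.log(exp[same].sum() / exp.sum()))
```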
In some embodiments, the server may skip image blocking for each target image pair, that is, directly adjust the parameters of the initial classification model based on the plurality of target image pairs and the class labels to which they belong, so as to obtain the image classification model. The whole-image training process is similar to the block-based training process and is not described again here; skipping blocking can simplify the training process of the model.
Without image blocking, the initial classification model's loss function $L_i$ for the $i$-th test image $x_i^q$ in the test set $q$ is as follows:

$$L_i = -\log \frac{\sum_{j:\, y_j^s = \hat{y}_i} e^{\left\langle f(x_i^q),\ f(x_j^s) \right\rangle}}{\sum_{j:\, y_j^s = \hat{y}_i} e^{\left\langle f(x_i^q),\ f(x_j^s) \right\rangle} + \sum_{k:\, y_k^s \neq \hat{y}_i} e^{\left\langle f(x_i^q),\ f(x_k^s) \right\rangle}}$$

where $e$ is the natural base, $\hat{y}_i$ represents the prediction category obtained by identifying the test image $i$ in the test set $q$, $y_j^s$ represents the class label of a reference image $j$ of the support set $s$ having the same class label as the predicted class of test image $i$, $f(x_i^q)$ represents the image feature extracted for test image $i$, $f(x_j^s)$ represents the image feature extracted for reference image $j$, $y_k^s$ represents the class label of a reference image $k$ of the support set $s$ having a different class label from the predicted class of test image $i$, and $f(x_k^s)$ represents the image feature extracted for reference image $k$.
It should be noted that the image classification model may have various types of architectures (Backbones), including but not limited to: convolutional neural networks (Convolutional Neural Networks, CNN), residual networks (Residual Networks, ResNet), wide residual networks (Wide Residual Networks, WRN), and the like. The embodiments of the present application do not specifically limit the architecture of the image classification model.
All the above optional solutions can be combined to form an optional embodiment of the present disclosure, which is not described in detail herein.
According to the method provided by the embodiments of the present application, a plurality of sample image pairs are constructed such that the two sample images in each pair belong to different class labels, and the target areas at the same position in each sample image pair are exchanged to construct new, data-enhanced target image pairs. In this way, the original sample images can be data-enhanced in a small-sample learning scenario through image information exchange, and each generated target image carries two class labels, which significantly improves the generalization ability of an image classification model based on small-sample learning, that is, improves the classification accuracy of the image classification model.
Fig. 4 is a training flowchart of an image classification model provided in an embodiment of the present application. Referring to fig. 4, the training flow is applied to a server and proceeds as follows:
401. the server obtains a plurality of sample image pairs, any of the plurality of sample image pairs including a first sample image and a second sample image, the first sample image and the second sample image belonging to different class labels.
Step 401 is similar to step 201, and will not be described again.
402. The server exchanges a first target area in the first sample image with a second target area in the second sample image for any sample image pair to obtain a target image pair, and the positions of the first target area and the second target area correspond to each other.
Optionally, the target image pair comprises a first target image and a second target image.
In some embodiments, the server clips the first target area from the first sample image; cropping the second target region from the second sample image; filling the second target area to a corresponding position in the cut first sample image to obtain the first target image; and filling the first target area to a corresponding position in the cut second sample image to obtain the second target image.
Step 402 is similar to step 202, and will not be described again.
403. The server performs image blocking on a plurality of target image pairs to obtain a plurality of target image block pairs.
In the above process, the server randomly pairs a plurality of sample image pairs from the original sample set and performs data enhancement on them by exchanging image information, thereby generating a plurality of target image pairs. Since each of the plurality of target image pairs contains two target images (a first target image and a second target image), the plurality of target image pairs can be regarded as an enhanced sample set constructed on the basis of the original sample set, and each target image contained in the enhanced sample set carries two class labels and therefore presents greater classification difficulty.
Optionally, for each of the plurality of target image pairs, the server performs image blocking (Patch) on the first target image in each target image pair to obtain a plurality of first target image blocks, performs image blocking on the second target image in each target image pair to obtain a plurality of second target image blocks, determines the plurality of first target image blocks and the plurality of second target image blocks as a target image block pair, and repeatedly performs the blocking operation multiple times until all target image pairs are traversed, so as to obtain the plurality of target image block pairs.
In some embodiments, in the above image blocking process, it must be ensured that each cut-out target image block corresponds to only a single class label. A cut-out target image block having a single class label means that the target image block does not mix content from the clipped sample image with content from the filled target area; that is, the target image block comes either entirely from the clipped sample image or entirely from the filled target area (the target area itself, of course, being the area clipped from the other sample image).
404. The server acquires category labels to which the plurality of target image block pairs belong.
In some embodiments, for each of the plurality of target image block pairs, each target image block pair includes two target image blocks (a first target image block and a second target image block), the server sets a class label of a sample image in which each target image block is originally located as a class label to which each target image block itself belongs. That is, the category label to which any one of the target image blocks in any one of the target image block pairs belongs is the same as the category label of the sample image to which the any one of the target image blocks originally belongs.
In this process, each difficult sample (target image) in the enhanced sample set is segmented, and the segmented target image blocks are fed into model training. This enhances the model's local understanding of images, enabling it to better understand each region of an input image, thereby improving the accuracy of overall image classification.
405. The server randomly selects a plurality of target categories from a plurality of category labels to be selected corresponding to the plurality of target image block pairs.
In some embodiments, assuming that the plurality of target image blocks correspond to M (M ≥ 2) class labels to be selected, the server randomly selects C (2 ≤ C ≤ M) target classes from all the M class labels. Various random algorithms can be adopted during random selection, and the embodiments of the present application do not specifically limit the random algorithm.
406. The server randomly selects a plurality of reference image blocks belonging to the plurality of target categories from the plurality of target image block pairs, different target categories in the plurality of target categories corresponding to the same number of reference image blocks.
In some embodiments, different target categories of the plurality of target categories corresponding to the same number of reference image blocks means that the number of reference image blocks screened by the server for each target category remains the same. For example, the server randomly screens out C target classes from the M class labels to be selected and randomly selects K (K ≥ 1) reference image blocks for each of the C classes, finally screening out C×K reference image blocks, where the C×K reference image blocks can form a Support Set for training.
407. The server randomly selects a plurality of test image blocks belonging to the plurality of target categories, excluding the plurality of reference image blocks, from the plurality of target image block pairs.
In some embodiments, for each of the C target classes, the server only randomly selects K target image blocks from all target image blocks corresponding to each target class as K reference image blocks, so each target class will have remaining target image blocks, and the server may randomly select the plurality of test image blocks from all remaining target image blocks of the C target classes, where the plurality of test image blocks can form a test Set (Query Set) for testing.
It should be noted that, when screening the plurality of test image blocks, it is not necessary for each of the C target categories to contribute at least one test image block; that is, it is allowed that no test image block among the plurality of test image blocks belongs to some target category of the C target categories. It only needs to be ensured that the category labels corresponding to the test set form a subset of the category labels corresponding to the support set.
Obviously, there is no identical sample in the support set and the test set, i.e. there is no intersection between the support set and the test set. The model can give consideration to the feature distribution of the target image block in two different sample sets by comparing and learning the comprehensive support set and the test set, so that the classification accuracy of the model is improved.
In the above steps 405-407, the server screens out a plurality of test image blocks from the plurality of target image block pairs. Besides the screening mode, the server can also directly screen a plurality of target image blocks from the plurality of target image block pairs at random to serve as the plurality of test image blocks, calculate loss function values by using the prediction type and the real type of each test image block, and then iteratively adjust parameters of an initial classification model, wherein the process is equivalent to the process of carrying out contrast learning without respectively constructing a support set and a test set, and can greatly simplify model training.
In the above process, the server screens out test sets for performing model tests from the segmented enhanced sample set, puts each test image block in the test sets into the training process, and calculates the loss function value based on the prediction category and the true category of each test image block.
408. The server invokes an initial classification model to extract image features of the plurality of reference image blocks and image features of the plurality of test image blocks, respectively.
In some embodiments, the server inputs the plurality of reference image blocks into the initial classification model, extracts image features of the plurality of reference image blocks through the initial classification model, for example, the initial classification model is a CNN model, and performs convolution processing on the plurality of reference image blocks through a plurality of convolution layers of the CNN, and determines a feature map output by a last convolution layer as the image features of the plurality of reference image blocks.
In some embodiments, the server inputs the plurality of test image blocks into the initial classification model, extracts image features of the plurality of test image blocks through the initial classification model, for example, the initial classification model is a CNN model, and performs convolution processing on the plurality of test image blocks through a plurality of convolution layers of the CNN, and determines a feature map output by a last convolution layer as the image features of the plurality of test image blocks.
It should be noted that the initial classification model may have various types of architectures (backbones), and the initial classification model includes, but is not limited to: CNN, resNet, WRN, etc., the embodiments of the present application do not specifically limit the architecture of the initial classification model.
409. The server obtains prediction categories to which the plurality of test image blocks belong based on correlation between image features of the plurality of reference image blocks and image features of the plurality of test image blocks.
Optionally, for any one of the plurality of test image blocks, the server acquires a correlation between the image feature of the any one test image block and the image feature of each reference image block, and determines a category label to which the reference image block with the largest correlation belongs as a prediction category of the any one test image block.
In some embodiments, the correlation may take various calculation manners, for example, the correlation is an inner product between two image features, or the correlation is a cosine distance between two image features, or the correlation is inversely correlated with a euclidean distance between two image features, which is not specifically limited in this embodiment of the present application.
In one example, take the correlation as the inner product between two image features. Suppose that for the $i$-th test image block $x_i^q$ in the test set $q$, the initial classification model is invoked to extract its image feature, denoted $f(x_i^q)$. The server then obtains the inner product between $f(x_i^q)$ and the image feature $f(x_j^s)$ of each reference image block in the support set $s$ (where $j$ is greater than or equal to 1 and less than or equal to the sample capacity of the support set), and selects the class label to which the reference image block with the largest inner product belongs as the initial classification model's predicted class $j^*$ for $x_i^q$, where the expression of $j^*$ is as follows:

$$j^* = \arg\max_{j} \left\langle f(x_i^q),\ f(x_j^s) \right\rangle$$
in the above steps 408-409, the server invokes the initial classification model to obtain the prediction categories to which the plurality of test image blocks belong. In some embodiments, if the mode of contrast learning is not adopted, performing exponential normalization processing on the extracted image characteristics of each test image block, so as to obtain the prediction probability that each test image block belongs to each class label, and determining the class label with the maximum prediction probability as the prediction class of each test image block.
410. The server obtains a loss function value based on the prediction category to which the plurality of test image blocks belong and the category label to which the plurality of test image blocks belong.
In some embodiments, the loss function $L_{iwh}$ of the initial classification model in an iteration is as shown below, where the subscript $iwh$ of $L$ indicates the test image block of width $w$ and height $h$ cut out of the $i$-th target image:

$$L_{iwh} = -\log \frac{\sum_{j:\, y_j^s = \hat{y}_{iwh}} e^{\left\langle f(x_{iwh}^q),\ f(x_j^s) \right\rangle}}{\sum_{j:\, y_j^s = \hat{y}_{iwh}} e^{\left\langle f(x_{iwh}^q),\ f(x_j^s) \right\rangle} + \sum_{k:\, y_k^s \neq \hat{y}_{iwh}} e^{\left\langle f(x_{iwh}^q),\ f(x_k^s) \right\rangle}}$$

where $e$ is the natural base, $\hat{y}_{iwh}$ represents the prediction category obtained by identifying the test image block $iwh$ of width $w$ and height $h$ in the test set $q$, $y_j^s$ represents the class label of a reference image block $j$ of the support set $s$ having the same class label as the predicted class of test image block $iwh$, $f(x_{iwh}^q)$ represents the image feature extracted for the test image block $iwh$, $f(x_j^s)$ represents the image feature extracted for reference image block $j$, $y_k^s$ represents the class label of a reference image block $k$ of the support set $s$ having a different class label from the predicted class of test image block $iwh$, and $f(x_k^s)$ represents the image feature extracted for reference image block $k$.
In some embodiments, the server may skip image blocking for each target image pair, that is, directly adjust the parameters of the initial classification model based on the plurality of target image pairs and the class labels to which they belong, so as to obtain the image classification model. The whole-image training process is similar to the block-based training process and is not described again here; skipping blocking can simplify the training process of the model.
Without image blocking, the initial classification model's loss function $L_i$ for the $i$-th test image $x_i^q$ in the test set $q$ is as follows:

$$L_i = -\log \frac{\sum_{j:\, y_j^s = \hat{y}_i} e^{\left\langle f(x_i^q),\ f(x_j^s) \right\rangle}}{\sum_{j:\, y_j^s = \hat{y}_i} e^{\left\langle f(x_i^q),\ f(x_j^s) \right\rangle} + \sum_{k:\, y_k^s \neq \hat{y}_i} e^{\left\langle f(x_i^q),\ f(x_k^s) \right\rangle}}$$

where $e$ is the natural base, $\hat{y}_i$ represents the prediction category obtained by identifying the test image $i$ in the test set $q$, $y_j^s$ represents the class label of a reference image $j$ of the support set $s$ having the same class label as the predicted class of test image $i$, $f(x_i^q)$ represents the image feature extracted for test image $i$, $f(x_j^s)$ represents the image feature extracted for reference image $j$, $y_k^s$ represents the class label of a reference image $k$ of the support set $s$ having a different class label from the predicted class of test image $i$, and $f(x_k^s)$ represents the image feature extracted for reference image $k$.
411. The server iterates steps 405-410 until the loss function value meets a stop condition, and stops iterating to obtain an image classification model, wherein the image classification model is used for identifying a class label to which the input image belongs.
Alternatively, the stopping condition is that the number of iterations is greater than a count threshold, or the stopping condition is that the loss function value is less than a loss threshold, where the loss threshold may be any preset value greater than 0.
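The iteration with the two stopping conditions above can be sketched as follows (hypothetical names; `step_fn` stands for one execution of steps 405-410 and returns the loss function value):

```python
def train_until_stop(step_fn, max_iters=1000, loss_threshold=1e-3):
    """Iterate a training step until either stop condition is met:
    the iteration count exceeds the count threshold, or the loss
    function value falls below the loss threshold."""
    for it in range(1, max_iters + 1):
        loss = step_fn()
        if loss < loss_threshold:
            return it, loss  # converged: loss condition satisfied
    return max_iters, loss   # count threshold reached
```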
In the above steps 405-411, the server adjusts the parameters of the initial classification model based on the plurality of target image block pairs and the class labels to which the plurality of target image block pairs belong, so as to obtain the image classification model. The image blocking technology-based contrast learning training and testing framework is constructed, so that the image blocking technology-based contrast learning training and testing framework can be smoothly applied to a task of small sample learning, and can bring about great performance improvement.
In the above steps 403-411, the server adjusts parameters of the initial classification model based on a plurality of the target image pairs to obtain an image classification model. By means of the contrast learning mode of the support set and the test set, small sample information can be constructed and implanted into a model training process, and an effective data enhancement algorithm is integrated to conduct data enhancement, so that accuracy of image classification in a small sample scene is improved.
In the above data enhancement algorithm, since the target areas in the sample images are exchanged to construct target images and the target images are subjected to image blocking, a noise-contrastive training mechanism can be introduced over the segmented target image blocks. This mechanism, which may be called InfoPatch, can deeply mine the relationships between target image blocks, thereby improving the image classification model's ability to estimate noise information.
All the above optional solutions can be combined to form an optional embodiment of the present disclosure, which is not described in detail herein.
According to the method provided by the embodiments of the present application, a plurality of sample image pairs are constructed such that the two sample images in each pair belong to different class labels, and the target areas at the same position in each sample image pair are exchanged to construct new, data-enhanced target image pairs. In this way, the original sample images can be data-enhanced in a small-sample learning scenario through image information exchange, and each generated target image carries two class labels, which significantly improves the generalization ability of an image classification model based on small-sample learning, that is, improves the classification accuracy of the image classification model.
In an exemplary scenario, taking an initial classification model with a Res-12 architecture as an example, the image classification model with a Res-12 architecture trained in the embodiments of the present application is called the InfoPatch model, which can mine relationships between image blocks to improve noise information estimation. The InfoPatch model is tested on the miniImageNet data set and the TieredImageNet data set respectively to obtain test results on the two data sets. Table 1 compares, under the same test conditions, the test results of the InfoPatch model with those of ProtoNet (prototype network), MatchingNet (matching network), RelationNet (relational network), MAML (Model-Agnostic Meta-Learning), LEO (latent embedded optimization network), PPA (predictive activation parameter network), RobustDist++ (classification distance adaptation network), wDAE (classification weight self-coding network), CC+Rot (close-coupled self-supervision network), Cross Domain (cross-domain migration network), TapNet (task self-adaptation prediction network), MetaOptNet (meta operation network), CAN (cross attention mechanism network), FEAT (multitasking network), DeepEMD (deep Wasserstein distance network), Negative Margin (classification distance adaptation network), and Rethink-Distill (self-distillation network). Each of these models generally belongs to one of 4 model architectures: Conv4 (CNN), WRN-28 (WRN), Res-10 (ResNet), Res-12 (ResNet).
TABLE 1
As can be seen from table 1, the InfoPatch model achieves remarkable image classification performance on both data sets, and in particular delivers a substantial improvement in image classification accuracy under the 1-shot (single-sample) scenario.
Fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, please refer to fig. 5, which includes:
an obtaining module 501, configured to obtain a plurality of sample image pairs, where any sample image pair in the plurality of sample image pairs includes a first sample image and a second sample image, and the first sample image and the second sample image belong to different class labels;
the exchanging module 502 is configured to exchange, for the arbitrary sample image pair, a first target area in the first sample image with a second target area in the second sample image, so as to obtain a target image pair, where the positions of the first target area and the second target area correspond to each other;
the adjustment module 503 is configured to adjust parameters of an initial classification model based on a plurality of the target image pairs, to obtain an image classification model, where the image classification model is used to identify a class label to which the input image belongs.
According to the apparatus provided by the embodiments of the present application, a plurality of sample image pairs are constructed such that the two sample images in each sample image pair belong to different class labels. By exchanging the target areas located at the same position within each sample image pair, a new, data-enhanced target image pair can be constructed. In this way, the original sample images undergo data enhancement in a small sample learning scenario through an exchange of image information, so that each generated target image carries two class labels. This remarkably improves the generalization capability of an image classification model based on small sample learning, and improves the classification accuracy of the image classification model.
In a possible implementation, the target image pair includes a first target image and a second target image, and the exchanging module 502 is configured to:
cropping the first target region from the first sample image;
cropping the second target region from the second sample image;
filling the second target area into the corresponding position in the cropped first sample image to obtain the first target image;
and filling the first target area into the corresponding position in the cropped second sample image to obtain the second target image.
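For illustration only, the crop-and-fill exchange performed by the exchanging module 502 can be sketched as follows. This is a minimal NumPy sketch, not the patent's implementation; the image sizes, region coordinates, and function name are illustrative assumptions.

```python
import numpy as np

def exchange_regions(img_a, img_b, top, left, height, width):
    """Swap the same rectangular region between two images (illustrative sketch).

    img_a, img_b: H x W x C arrays of identical shape.
    (top, left, height, width): the shared target region in both images.
    Returns the two data-enhanced target images.
    """
    out_a, out_b = img_a.copy(), img_b.copy()
    # Crop the region from each image and fill it with the other's region.
    out_a[top:top + height, left:left + width] = img_b[top:top + height, left:left + width]
    out_b[top:top + height, left:left + width] = img_a[top:top + height, left:left + width]
    return out_a, out_b

# Minimal usage: two 8x8 single-channel "images" with constant values.
a = np.zeros((8, 8, 1), dtype=np.uint8)
b = np.full((8, 8, 1), 255, dtype=np.uint8)
ta, tb = exchange_regions(a, b, top=2, left=2, height=4, width=4)
```

After the exchange, each target image contains pixels from both source images, which is why it carries both class labels.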
In one possible implementation, based on the apparatus composition of fig. 5, the adjusting module 503 includes:
the blocking unit is used for carrying out image blocking on a plurality of target image pairs to obtain a plurality of target image block pairs;
the acquisition unit is used for acquiring category labels to which the target image block pairs belong respectively;
and the adjusting unit is used for adjusting the parameters of the initial classification model based on the plurality of target image block pairs and the class labels to which the plurality of target image block pairs respectively belong, to obtain the image classification model.
In one possible implementation, the class label to which any target image block in any target image block pair belongs is the same as the class label of the sample image from which that target image block originally came.
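For illustration only, the image blocking performed by the blocking unit can be sketched as a grid split; in the scheme above each block would additionally inherit the class label of the sample image its pixels originally came from. The function name and image shapes below are illustrative assumptions, not from the patent.

```python
import numpy as np

def image_to_blocks(img, block):
    """Split an H x W x C image into non-overlapping block x block patches.

    Only the geometric split is sketched here; label inheritance (each
    patch keeping the label of its source sample image) happens outside.
    """
    h, w, _ = img.shape
    patches = []
    for i in range(0, h - h % block, block):
        for j in range(0, w - w % block, block):
            patches.append(img[i:i + block, j:j + block])
    return patches

# Illustrative 4x4 single-channel image with pixel values 0..15.
img = np.arange(16).reshape(4, 4, 1)
patches = image_to_blocks(img, 2)
```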
In a possible embodiment, based on the device composition of fig. 5, the adjustment unit comprises:
a screening subunit, configured to screen a plurality of test image blocks from the plurality of target image block pairs;
the first obtaining subunit is used for calling the initial classification model to obtain the prediction category to which the plurality of test image blocks belong;
a second obtaining subunit, configured to obtain a loss function value based on a prediction category to which the plurality of test image blocks belong and a category label to which the plurality of test image blocks belong;
and the iteration subunit is used for iteratively performing the step of obtaining the loss function value until the loss function value meets a stop condition, and then stopping the iteration to obtain the image classification model.
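For illustration only, the iteration subunit's behavior can be sketched as a loop that repeatedly obtains the loss function value until a stop condition is met. The stop condition (a loss threshold plus an iteration cap) and the toy loss below are illustrative assumptions; the patent does not specify them.

```python
def train_until_converged(step_loss, threshold=1e-2, max_iters=1000):
    """Iteratively obtain the loss value until the stop condition is met.

    step_loss performs one parameter-adjustment step and returns the loss;
    threshold and max_iters stand in for the unspecified stop condition.
    """
    loss = float("inf")
    for it in range(max_iters):
        loss = step_loss()
        if loss < threshold:  # stop condition met: stop iterating
            return it + 1, loss
    return max_iters, loss

# Toy loss that halves on every step, standing in for a real training step.
state = {"loss": 1.0}
def toy_step():
    state["loss"] *= 0.5
    return state["loss"]

iters, final = train_until_converged(toy_step, threshold=0.01)
```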
In one possible embodiment, the screening subunit is configured to:
randomly selecting a plurality of target categories from a plurality of category labels to be selected;
randomly selecting a plurality of reference image blocks belonging to the plurality of target categories from the plurality of target image block pairs, wherein different target categories in the plurality of target categories correspond to the same number of reference image blocks;
and randomly selecting, from the plurality of target image block pairs, the plurality of test image blocks that belong to the plurality of target categories and exclude the plurality of reference image blocks.
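The screening steps above amount to episodic (N-way, K-shot) sampling: a few target categories are drawn, an equal number of reference image blocks is drawn per category, and the test image blocks are drawn from the remainder. A hedged sketch, assuming a dictionary of block ids per class; the class names and counts are illustrative, not from the patent.

```python
import random

def sample_episode(blocks_by_class, n_way, k_shot, q_query):
    """Sample one episode: reference (support) and test (query) image blocks.

    blocks_by_class maps a class label to the ids of its image blocks.
    Each selected target category contributes the same number (k_shot)
    of reference blocks; test blocks are drawn from the remaining ids,
    so they never overlap the reference blocks.
    """
    target_classes = random.sample(sorted(blocks_by_class), n_way)
    support, query = {}, {}
    for c in target_classes:
        picked = random.sample(blocks_by_class[c], k_shot + q_query)
        support[c] = picked[:k_shot]
        query[c] = picked[k_shot:]
    return support, query

# Illustrative pool: 10 hypothetical classes with 20 block ids each.
pool = {f"class_{i}": list(range(20)) for i in range(10)}
sup, qry = sample_episode(pool, n_way=5, k_shot=1, q_query=3)
```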
In one possible implementation, the first acquisition subunit is configured to:
invoking the initial classification model to extract image features of the plurality of reference image blocks and image features of the plurality of test image blocks respectively;
and acquiring prediction categories to which the plurality of test image blocks belong based on correlation between the image features of the plurality of reference image blocks and the image features of the plurality of test image blocks.
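For illustration only, the prediction step can be sketched by averaging the reference-block features of each class into a prototype and assigning each test block the label of the most correlated prototype. The patent only states that a correlation between features is used; cosine similarity here is an assumption, as are the toy two-dimensional features.

```python
import numpy as np

def predict_by_correlation(ref_feats, ref_labels, test_feats):
    """Assign each test feature the label of its most correlated prototype.

    ref_feats: (n_ref, d) reference-block features; ref_labels: their labels;
    test_feats: (n_test, d) test-block features. Cosine similarity stands in
    for the unspecified correlation measure.
    """
    labels = sorted(set(ref_labels))
    # One prototype per class: mean of that class's reference features.
    protos = np.stack([np.mean([f for f, l in zip(ref_feats, ref_labels) if l == y], axis=0)
                       for y in labels])
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ protos.T  # (n_test, n_class) correlation matrix
    return [labels[i] for i in sims.argmax(axis=1)]

# Toy usage with hypothetical 2-D features for two classes.
refs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
preds = predict_by_correlation(refs, ["cat", "cat", "dog"],
                               np.array([[0.8, 0.2], [0.1, 0.9]]))
```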
All the above optional solutions can be combined in any manner to form optional embodiments of the present disclosure, which are not described in detail herein.
It should be noted that: the image processing apparatus provided in the above embodiment is only exemplified by the division of the above functional modules when processing an image, and in practical application, the above functional allocation can be performed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules to perform all or part of the functions described above. In addition, the image processing apparatus and the image processing method provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the image processing apparatus and the image processing method are detailed in the image processing method embodiment, which is not described herein again.
Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device 600 may vary considerably in configuration or performance, and includes one or more processors (Central Processing Unit, CPU) 601 and one or more memories 602, where at least one computer program is stored in the memories 602 and is loaded and executed by the one or more processors 601 to implement the image processing method provided by the foregoing embodiments. Optionally, the computer device 600 further includes components such as a wired or wireless network interface, a keyboard, and an input/output interface, which are not described herein.
In an exemplary embodiment, a computer readable storage medium is also provided, for example a memory comprising at least one computer program executable by a processor in a terminal to perform the image processing method in the respective embodiments described above. For example, the computer readable storage medium includes ROM (Read-Only Memory), RAM (Random-Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tape, floppy disk, optical data storage device, and the like.
In an exemplary embodiment, a computer program product or computer program is also provided, comprising one or more program codes stored in a computer readable storage medium. One or more processors of the computer device can read the one or more program codes from the computer readable storage medium and execute them, so that the computer device performs the image processing method in the above embodiments.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware; the program is optionally stored in a computer readable storage medium, and the storage medium is optionally a read-only memory, a magnetic disk, an optical disk, or the like.
The foregoing description is merely of preferred embodiments of the present application and is not intended to limit it; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within its scope of protection.
Claims (14)
1. An image processing method, the method comprising:
obtaining a plurality of sample image pairs, any one of the plurality of sample image pairs comprising a first sample image and a second sample image, the first sample image and the second sample image belonging to different class labels;
for any sample image pair, exchanging a first target area in the first sample image with a second target area in the second sample image to obtain a target image pair, wherein the positions of the first target area and the second target area correspond to each other;
image blocking is carried out on a plurality of target image pairs, so that a plurality of target image block pairs are obtained;
acquiring class labels to which the target image block pairs belong respectively;
and adjusting parameters of an initial classification model based on the plurality of target image block pairs and class labels to which the plurality of target image block pairs belong to obtain an image classification model, wherein the image classification model is used for identifying the class labels to which the input image belongs.
2. The method of claim 1, wherein the target image pair comprises a first target image and a second target image, and wherein the exchanging the first target region in the first sample image with the second target region in the second sample image to obtain a target image pair comprises:
cropping the first target region from the first sample image;
cropping the second target region from the second sample image;
filling the second target area into the corresponding position in the cropped first sample image to obtain the first target image;
and filling the first target area into the corresponding position in the cropped second sample image to obtain the second target image.
3. The method of claim 1, wherein the class label to which any target image block in any target image block pair belongs is the same as the class label of the sample image in which that target image block was originally located.
4. The method of claim 1, wherein adjusting parameters of an initial classification model based on the plurality of target image block pairs and class labels to which the plurality of target image block pairs each belong to, to obtain an image classification model comprises:
screening a plurality of test image blocks from the plurality of target image block pairs;
invoking the initial classification model to obtain the prediction category to which the plurality of test image blocks belong;
acquiring a loss function value based on the prediction category to which the plurality of test image blocks belong and the category labels to which the plurality of test image blocks belong;
and iteratively executing the step of acquiring the loss function value until the loss function value meets a stop condition, and stopping iteration to obtain the image classification model.
5. The method of claim 4, wherein screening a plurality of test image blocks from the plurality of target image block pairs comprises:
randomly selecting a plurality of target categories from a plurality of category labels to be selected;
randomly selecting a plurality of reference image blocks belonging to the plurality of target categories from the plurality of target image block pairs, wherein different target categories in the plurality of target categories correspond to the same number of reference image blocks;
and randomly selecting, from the plurality of target image block pairs, the plurality of test image blocks that belong to the plurality of target categories and exclude the plurality of reference image blocks.
6. The method of claim 5, wherein the invoking the initial classification model to obtain the prediction category to which the plurality of test image blocks belong comprises:
invoking the initial classification model to extract image features of the plurality of reference image blocks and image features of the plurality of test image blocks respectively;
and acquiring prediction categories to which the plurality of test image blocks belong based on correlation between the image features of the plurality of reference image blocks and the image features of the plurality of test image blocks.
7. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a plurality of sample image pairs, where any one of the plurality of sample image pairs includes a first sample image and a second sample image, and the first sample image and the second sample image belong to different class labels;
the exchanging module is used for exchanging a first target area in the first sample image with a second target area in the second sample image for any sample image pair to obtain a target image pair, and the positions of the first target area and the second target area correspond to each other;
an adjustment module comprising:
the blocking unit is used for carrying out image blocking on a plurality of target image pairs to obtain a plurality of target image block pairs;
the acquisition unit is used for acquiring category labels to which the target image block pairs belong respectively;
the adjusting unit is used for adjusting parameters of the initial classification model based on the plurality of target image block pairs and the class labels to which the plurality of target image block pairs respectively belong, to obtain an image classification model, and the image classification model is used for identifying the class label to which the input image belongs.
8. The apparatus of claim 7, wherein the pair of target images comprises a first target image and a second target image, the switching module to:
cropping the first target region from the first sample image;
cropping the second target region from the second sample image;
filling the second target area to a corresponding position in the cut first sample image to obtain the first target image;
and filling the first target area to a corresponding position in the cut second sample image to obtain the second target image.
9. The apparatus of claim 7, wherein the class label to which any target image block in any target image block pair belongs is the same as the class label of the sample image in which that target image block was originally located.
10. The apparatus according to claim 7, wherein the adjusting unit comprises:
a screening subunit, configured to screen a plurality of test image blocks from the plurality of target image block pairs;
the first obtaining subunit is used for calling the initial classification model to obtain the prediction category to which the plurality of test image blocks belong;
a second obtaining subunit, configured to obtain a loss function value based on a prediction category to which the plurality of test image blocks belong and a category label to which the plurality of test image blocks belong;
and the iteration subunit is used for iteratively performing the step of obtaining the loss function value until the loss function value meets a stop condition, and then stopping the iteration to obtain the image classification model.
11. The apparatus of claim 10, wherein the screening subunit is configured to:
randomly selecting a plurality of target categories from a plurality of category labels to be selected;
randomly selecting a plurality of reference image blocks belonging to the plurality of target categories from the plurality of target image block pairs, wherein different target categories in the plurality of target categories correspond to the same number of reference image blocks;
and randomly selecting, from the plurality of target image block pairs, the plurality of test image blocks that belong to the plurality of target categories and exclude the plurality of reference image blocks.
12. The apparatus of claim 11, wherein the first acquisition subunit is configured to:
invoking the initial classification model to extract image features of the plurality of reference image blocks and image features of the plurality of test image blocks respectively;
and acquiring prediction categories to which the plurality of test image blocks belong based on correlation between the image features of the plurality of reference image blocks and the image features of the plurality of test image blocks.
13. A computer device comprising one or more processors and one or more memories, the one or more memories having stored therein at least one computer program loaded and executed by the one or more processors to implement the image processing method of any of claims 1 to 6.
14. A storage medium having stored therein at least one computer program loaded and executed by a processor to implement the image processing method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110134769.2A CN112989085B (en) | 2021-01-29 | 2021-01-29 | Image processing method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110134769.2A CN112989085B (en) | 2021-01-29 | 2021-01-29 | Image processing method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112989085A CN112989085A (en) | 2021-06-18 |
CN112989085B true CN112989085B (en) | 2023-07-25 |
Family
ID=76345915
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110134769.2A Active CN112989085B (en) | 2021-01-29 | 2021-01-29 | Image processing method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112989085B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113256621B (en) * | 2021-06-25 | 2021-11-02 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, computer equipment and storage medium |
CN113657236A (en) * | 2021-08-10 | 2021-11-16 | 支付宝(杭州)信息技术有限公司 | Method and device for identifying newly taken out commodities on unmanned counter |
CN114511757A (en) * | 2022-01-27 | 2022-05-17 | 北京百度网讯科技有限公司 | Method and apparatus for training image detection model |
CN115205587A (en) * | 2022-06-30 | 2022-10-18 | 腾讯科技(深圳)有限公司 | Object classification method and device, computer readable medium and electronic equipment |
CN115689277B (en) * | 2022-10-12 | 2024-05-07 | 北京思路智园科技有限公司 | Chemical industry garden risk early warning system under cloud edge cooperation technology |
CN117095240A (en) * | 2023-10-16 | 2023-11-21 | 之江实验室 | Blade classification method and system based on fine granularity characteristics |
CN117274619B (en) * | 2023-11-21 | 2024-02-09 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Cross-domain target recognition method based on style fusion contrast learning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109961089A (en) * | 2019-02-26 | 2019-07-02 | 中山大学 | Small sample and zero sample image classification method based on metric learning and meta learning |
CN111161228A (en) * | 2019-12-20 | 2020-05-15 | 东南大学 | Button surface defect detection method based on transfer learning |
CN111435432A (en) * | 2019-01-15 | 2020-07-21 | 北京市商汤科技开发有限公司 | Network optimization method and device, image processing method and device, and storage medium |
CN111476292A (en) * | 2020-04-03 | 2020-07-31 | 北京全景德康医学影像诊断中心有限公司 | Small sample element learning training method for medical image classification processing artificial intelligence |
CN111768336A (en) * | 2020-07-09 | 2020-10-13 | 腾讯科技(深圳)有限公司 | Face image processing method and device, computer equipment and storage medium |
CN111832443A (en) * | 2020-06-28 | 2020-10-27 | 华中科技大学 | Construction method and application of construction violation detection model |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8107726B2 (en) * | 2008-06-18 | 2012-01-31 | Samsung Electronics Co., Ltd. | System and method for class-specific object segmentation of image data |
CN114820654A (en) * | 2018-12-28 | 2022-07-29 | 上海联影智能医疗科技有限公司 | Blood vessel segmentation method, blood vessel segmentation device, medical imaging equipment and storage medium |
2021-01-29: CN CN202110134769.2A patent CN112989085B/en — Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111435432A (en) * | 2019-01-15 | 2020-07-21 | 北京市商汤科技开发有限公司 | Network optimization method and device, image processing method and device, and storage medium |
CN109961089A (en) * | 2019-02-26 | 2019-07-02 | 中山大学 | Small sample and zero sample image classification method based on metric learning and meta learning |
CN111161228A (en) * | 2019-12-20 | 2020-05-15 | 东南大学 | Button surface defect detection method based on transfer learning |
CN111476292A (en) * | 2020-04-03 | 2020-07-31 | 北京全景德康医学影像诊断中心有限公司 | Small sample element learning training method for medical image classification processing artificial intelligence |
CN111832443A (en) * | 2020-06-28 | 2020-10-27 | 华中科技大学 | Construction method and application of construction violation detection model |
CN111768336A (en) * | 2020-07-09 | 2020-10-13 | 腾讯科技(深圳)有限公司 | Face image processing method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112989085A (en) | 2021-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112989085B (en) | Image processing method, device, computer equipment and storage medium | |
CN111898696B (en) | Pseudo tag and tag prediction model generation method, device, medium and equipment | |
WO2021203863A1 (en) | Artificial intelligence-based object detection method and apparatus, device, and storage medium | |
EP4002161A1 (en) | Image retrieval method and apparatus, storage medium, and device | |
CN111898703B (en) | Multi-label video classification method, model training method, device and medium | |
CN113177559B (en) | Image recognition method, system, equipment and medium combining breadth and dense convolutional neural network | |
CN113761153B (en) | Picture-based question-answering processing method and device, readable medium and electronic equipment | |
CN114298122B (en) | Data classification method, apparatus, device, storage medium and computer program product | |
CN115131698B (en) | Video attribute determining method, device, equipment and storage medium | |
WO2023213157A1 (en) | Data processing method and apparatus, program product, computer device and medium | |
CN114372564A (en) | Model training method for object classification, object classification method and device | |
WO2022100607A1 (en) | Method for determining neural network structure and apparatus thereof | |
CN113763385A (en) | Video object segmentation method, device, equipment and medium | |
CN112749576B (en) | Image recognition method and device, computing equipment and computer storage medium | |
CN116956117A (en) | Method, device, equipment, storage medium and program product for identifying label | |
CN116152419A (en) | Data processing method, device, equipment and storage medium | |
CN116978042A (en) | Image processing method, related device and storage medium | |
Shanqing et al. | A multi-level feature weight fusion model for salient object detection | |
CN117011904A (en) | Image recognition method and related device | |
CN114692715A (en) | Sample labeling method and device | |
CN117058498B (en) | Training method of segmentation map evaluation model, and segmentation map evaluation method and device | |
WO2024174583A1 (en) | Model training method and apparatus, and device, storage medium and product | |
Jin et al. | Photo aesthetic scoring through spatial aggregation perception dcnn on a new idea dataset | |
Zhu et al. | Deep Learning Methods for Infrared Small Object Detection | |
CN115375926A (en) | Image similarity calculation method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||