CN112329785A - Image management method, device, terminal and storage medium - Google Patents

Image management method, device, terminal and storage medium

Info

Publication number
CN112329785A
CN112329785A
Authority
CN
China
Prior art keywords: sample, model, image, images, features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011334869.1A
Other languages
Chinese (zh)
Inventor
薛致远
李亚乾
郭彦东
杨林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202011334869.1A
Publication of CN112329785A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Abstract

The application relates to an image management method, an image management device, a terminal and a storage medium, and belongs to the technical field of terminals. The method comprises the following steps: performing model training on a first feature extraction model based on a plurality of unlabeled first sample images to obtain a second feature extraction model; acquiring a plurality of second sample images based on a target image management function, wherein the plurality of second sample images are sample images labeled according to the target image management function; and performing model fine-tuning on the second feature extraction model based on the plurality of second sample images to obtain an image management model, wherein the image management model is used for performing image management on images. With the method provided by this scheme, training of the image management model can be achieved with only a small amount of labeled sample data, which reduces the number of labeled sample images required, lowers the time cost of labeling, and improves the efficiency of model training.

Description

Image management method, device, terminal and storage medium
Technical Field
The embodiment of the application relates to the technical field of terminals, in particular to an image management method, an image management device, a terminal and a storage medium.
Background
With the development of terminal technology, more and more users use terminals such as mobile phones to record their lives. As the number of images stored in the terminal's album grows, it becomes difficult for a user to find a desired image while browsing. Therefore, the images in the terminal need to be managed.
At present, images are generally managed through an image management model; therefore, before the images can be managed, the image management model needs to be trained. The process of training the image management model comprises: labeling a large number of sample images, and performing model training based on the labeled sample images.
Disclosure of Invention
The embodiment of the application provides an image management method, an image management device, a terminal and a storage medium, which can improve the training efficiency of an image management model. The technical scheme is as follows:
in one aspect, an image management method is provided, the method including:
performing model training on a first feature extraction model based on a plurality of unlabeled first sample images to obtain a second feature extraction model;
acquiring a plurality of second sample images based on a target image management function, wherein the plurality of second sample images are sample images labeled according to the target image management function;
and performing model fine-tuning on the second feature extraction model based on the plurality of second sample images to obtain an image management model, wherein the image management model is used for performing image management on images.
In another aspect, there is provided an image management apparatus, the apparatus including:
the model training module is used for performing model training on a first feature extraction model based on a plurality of unlabeled first sample images to obtain a second feature extraction model;
the acquisition module is used for acquiring a plurality of second sample images based on a target image management function, wherein the second sample images are sample images labeled according to the target image management function;
and the model fine-tuning module is used for performing model fine-tuning on the second feature extraction model based on the plurality of second sample images to obtain an image management model, wherein the image management model is used for performing image management on images.
In another aspect, a terminal is provided that includes a processor and a memory; the memory stores at least one program code for execution by the processor to implement the image management method as described in the above aspect.
In another aspect, a computer-readable storage medium is provided, the storage medium storing at least one program code for execution by a processor to implement the image management method described in the above aspect.
In another aspect, a computer program product is provided, the computer program product storing at least one program code, and the program code is loaded and executed by a processor to implement the image management method of the above aspect.
In the embodiment of the application, when the image management model is trained, a general feature extraction model is first trained on unlabeled sample images and then fine-tuned with labeled sample images to obtain the image management model. In this way, training of the image management model can be achieved with only a small amount of labeled sample data, which reduces the number of labeled sample images required, lowers the time cost of labeling, and improves the efficiency of model training.
Drawings
Fig. 1 illustrates a schematic structural diagram of a terminal provided in an exemplary embodiment of the present application;
FIG. 2 illustrates a flow chart of an image management method shown in an exemplary embodiment of the present application;
FIG. 3 illustrates a flow chart of an image management method shown in an exemplary embodiment of the present application;
FIG. 4 illustrates a flow chart of an image management method shown in an exemplary embodiment of the present application;
FIG. 5 illustrates a schematic diagram of image management shown in an exemplary embodiment of the present application;
FIG. 6 illustrates a flow chart of an image management method shown in an exemplary embodiment of the present application;
fig. 7 shows a block diagram of an image management apparatus according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Reference herein to "a plurality" means two or more. "And/or" describes the association relationship of associated objects and means that three relationships are possible; for example, A and/or B means: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Referring to fig. 1, a block diagram of a terminal 100 according to an exemplary embodiment of the present application is shown. In some embodiments, the terminal 100 is a smartphone, tablet, wearable device, camera, or the like having image processing functionality. The terminal 100 in the present application includes at least the following components: a processor 110 and a memory 120.
In some embodiments, processor 110 includes one or more processing cores. The processor 110 connects the various parts of the terminal 100 using various interfaces and lines, and performs the functions of the terminal 100 and processes data by running or executing program code stored in the memory 120 and calling data stored in the memory 120. In some embodiments, the processor 110 is implemented in hardware using at least one of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA). The processor 110 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Neural-network Processing Unit (NPU), a modem, and the like. The CPU mainly handles the operating system, user interface, applications, and the like; the GPU renders and draws the content to be displayed on the display screen; the NPU implements Artificial Intelligence (AI) functions; and the modem handles wireless communication. It is understood that the modem may not be integrated into the processor 110 and may instead be implemented as a separate chip.
In some embodiments, the processor 110 is configured to invoke different image management models based on different image management tasks; the image management is performed on the image in the terminal 100 based on the image management model.
In some embodiments, the memory 120 comprises Random Access Memory (RAM); in some embodiments, the memory 120 comprises Read-Only Memory (ROM). In some embodiments, the memory 120 includes a non-transitory computer-readable medium. The memory 120 may be used to store program code. The memory 120 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described below, and the like; the data storage area may store data created according to the use of the terminal 100 (such as audio data and a phonebook), and the like.
In some embodiments, the memory 120 stores model parameters of a generic feature extraction model and model parameters of an image management model corresponding to other image management tasks generated based on the generic feature extraction model.
In some embodiments, the terminal 100 further includes an image collector for collecting images. In some embodiments, the image collector 130 is integrated in the terminal 100; for example, it is a camera mounted on the terminal 100. In other embodiments, the image collector is an image collecting device connected to the terminal 100; for example, it is a camera connected to the terminal 100.
In some embodiments, the terminal 100 also includes a display screen. The display screen is a display component for displaying a user interface. In some embodiments, the display screen has a touch function, and a user can perform touch operations on it using any suitable object such as a finger or a stylus. In some embodiments, the display screen is provided on the front panel of the terminal 100. In some embodiments, the display screen is designed as a full screen, curved screen, irregularly-shaped screen, double-sided screen, or foldable screen. In some embodiments, the display screen is designed as a combination of a full screen and a curved screen, a combination of an irregularly-shaped screen and a curved screen, and the like, which is not limited by this embodiment.
In addition, those skilled in the art will appreciate that the configuration of the terminal 100 illustrated in the above figure does not limit the terminal 100; the terminal 100 may include more or fewer components than those illustrated, combine some components, or arrange the components differently. For example, the terminal 100 may further include a microphone, a speaker, a radio frequency circuit, an input unit, a sensor, an audio circuit, a Wireless Fidelity (Wi-Fi) module, a power supply, a Bluetooth module, and other components, which are not described here again.
With the development of terminal technology, the functions of terminals have become more and more powerful. For example, a terminal can store acquired images in an album application. As the time a user spends with a terminal increases, the number of images stored in the album application grows, making it difficult for the user to find a target image. Therefore, ways of dividing the images stored in the terminal's album application into albums have emerged. For example, images can be divided into albums by generation time; correspondingly, the terminal puts images from the same time period into the same album. Alternatively, images can be divided into albums by source; correspondingly, the terminal puts images acquired by the same application into the same album.
As terminals develop, album division modes have become richer. A terminal can also divide images into albums according to their content; for example, images containing the same face are put into the same album, and so on. This requires the terminal to be able to classify images. Generally, a terminal classifies the images in an album application with an image classification model, so the image classification model must be trained before images can be classified. The process of training the image classification model comprises: labeling a large number of sample images with class labels, and performing model training based on the labeled sample images. This means that a large number of images must be labeled manually, which imposes excessive sample requirements on training the classification model used to manage the album and makes training inefficient.
In the embodiment of the application, when the image management model is trained, a general feature extraction model is first trained on unlabeled sample images and then fine-tuned with labeled sample images, which yields the image management model.
Referring to fig. 2, a flowchart illustrating an image management method according to an exemplary embodiment of the present application is shown. The execution subject in the embodiment of the present application is the terminal 100, the processor 110 in the terminal 100, or the operating system in the terminal 100; for illustration, the terminal 100 is taken as the execution subject in this embodiment. The method comprises the following steps:
step 201: and the terminal performs model training on the first feature extraction model based on the plurality of unlabeled first sample images to obtain a second feature extraction model.
The first sample images are images used for model training, and the plurality of first sample images do not need to be labeled. The first feature extraction model is a network model that has not undergone model training. For example, the first feature extraction model is a Convolutional Neural Network (CNN). The type of the CNN is any network type used for image processing and can be set as needed; for example, a Visual Geometry Group (VGG) network, a residual network (ResNet), GoogLeNet (a deep network structure), MobileNet (a deep network structure), and the like.
In this step, the terminal performs unsupervised training on the first feature extraction model through the plurality of unlabeled first sample images to obtain a second feature extraction model. Referring to fig. 3, the process is implemented by steps (1) - (3), including:
(1) the terminal acquires a positive sample pair and a negative sample pair of each first sample image.
A positive sample pair refers to sample images that the model should classify into the same class. In some implementations, the terminal receives a plurality of first sample images grouped in advance, and the first sample images in each group form positive sample pairs with each other. In some embodiments, the terminal performs data enhancement on the first sample images, determines each first sample image and the sample images obtained by data-enhancing it as mutual positive sample pairs, and determines the sample images outside the positive sample pairs of a first sample image as its negative sample pairs. The process is realized by the following steps (1-1) to (1-3):
(1-1) For each first sample image, the terminal performs sample data enhancement on the first sample image to obtain a plurality of third sample images corresponding to the first sample image.
Data enhancement refers to changing the image characteristics of an image by changing its display form. Data enhancement includes geometric transformation, color transformation, and the like. The geometric transformation comprises at least one of image flipping, image rotation, image cropping, image deformation, image scaling, and the like. The color transformation comprises at least one of noise processing, blurring processing, color conversion, erasing processing, and filling processing.
In this step, referring to fig. 4, the terminal performs data enhancement on each first sample image through multiple data enhancement methods, respectively, to obtain a plurality of third sample images corresponding to each first sample image.
It should be noted that the data enhancement used may be the same or different for each first sample image. In addition, the number of the plurality of third sample images corresponding to each first sample image is the same or different, which is not specifically limited in the embodiments of the present application.
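To make steps (1-1) and (1-2) concrete, the following is a minimal sketch of generating two augmented views of one first sample image so that they form a positive sample pair. It assumes a PyTorch/torchvision pipeline; the particular transforms, their parameters, and the name make_positive_pair are illustrative assumptions, not values prescribed by this application.

```python
from PIL import Image
from torchvision import transforms

# Geometric and color transformations of the kinds listed above:
# cropping and scaling, flipping, rotation, color conversion, blurring.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    transforms.GaussianBlur(kernel_size=7),
    transforms.ToTensor(),
])

def make_positive_pair(first_sample: Image.Image):
    """Two independently augmented views ("third sample images") derived
    from the same first sample image form a positive sample pair."""
    return augment(first_sample), augment(first_sample)
```

Views generated from different first sample images would then serve as negatives for one another, as described in step (1-3) below.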
(1-2) the terminal composes the plurality of third sample images into positive sample pairs of the first sample image.
In this step, the terminal groups the third sample images obtained by data enhancement of the same first sample image into an image set and, for each first sample image, determines the sample images in the image set where the first sample image is located as positive sample pairs of the first sample image.
In this implementation, the first sample image is expanded through data enhancement, and the third sample images obtained through expansion are determined as positive sample pairs of the first sample image, so the expanded set of sample images increases the amount of data available for model training.
(1-3) The terminal determines, among the plurality of first sample images and the plurality of third sample images corresponding to them, the sample images other than the positive sample pair of a first sample image as the negative sample pairs of that first sample image.
A negative sample pair refers to sample images that the model should not classify into the same target class, or images that do not belong to the current classification class. In some embodiments, by regarding the sample images in the same sample set as positive sample pairs, the terminal determines the images outside that sample set as negative sample pairs of any sample image in the set.
(2) The terminal determines a loss value of the first feature extraction model based on the feature similarity between the pair of positive samples and the pair of negative samples.
In this step, the terminal extracts the sample features of the positive sample pair and the negative sample pair respectively, and determines the feature similarity between the positive sample pair and the negative sample pair based on the sample features, so as to determine the loss value of the first feature extraction model through a non-parameterized classification model. The process is realized by the following steps (2-1) - (2-3), and comprises the following steps:
(2-1) The terminal inputs the positive sample pairs and the negative sample pairs into the first feature extraction model, respectively, to obtain a plurality of first sample features of the positive sample pairs and a plurality of second sample features of the negative sample pairs.
With continued reference to fig. 4, the terminal inputs the sample images corresponding to the positive sample pair and the negative sample pair into the first feature extraction model, respectively, to obtain a plurality of first sample features of the positive sample pair and a plurality of second sample features of the negative sample pair.
It should be noted that, for negative sample pairs determined from the sample feature library, the terminal can also directly sample the sample features in the sample feature library as the plurality of second sample features of the negative sample pairs. The embodiments of the present application do not specifically limit this.
In some embodiments, the terminal directly takes the features extracted from the plurality of first sample images and the plurality of third sample images corresponding to the positive and negative sample pairs as the sample features. In another possible implementation, the terminal samples from the features of the plurality of first sample images and the plurality of third sample images corresponding to the positive and negative sample pairs and takes the sampled features as the sample features. The process is as follows:
a1, the terminal performs feature extraction on the plurality of first sample images and the plurality of third sample images to obtain a plurality of third sample features.
Referring to fig. 4, in this step, the terminal inputs the first sample image and the third sample image into the first feature extraction model to obtain a plurality of third sample features.
a2, the terminal samples the sample features among the plurality of third sample features to obtain the plurality of first sample features and the plurality of second sample features.
In the process of model training, the terminal needs to perform multiple iterative training processes on the first feature extraction model, and each iterative training process comprises multiple parameter adjustment processes. The model updates its parameters once each time a training sample group is processed; when the number of parameter updates reaches a preset threshold, one iterative training process is completed. The number of sample images in a training sample group and the preset threshold are set and changed as needed, which is not specifically limited in the embodiment of the present application. Correspondingly, the terminal determines the sampling range for sampling the sample features according to the iterative process of the first feature extraction model.
The sampling range is set as needed. For example, in some embodiments, the terminal determines the sample features of a first training sample group from the plurality of third sample features, the first training sample group including the plurality of first sample images and the plurality of third sample images used in one model parameter update process; and samples the sample features of the first training sample group to obtain the plurality of first sample features and the plurality of second sample features.
For example, if the total number of the first sample images and the third sample images is 2000 and the number of training samples for each parameter update is 20, the iterative training process of the model is implemented once every 100 parameter updates. In this implementation manner, each time the terminal completes feature extraction of 20 first sample images and third sample images, the terminal updates the model parameters of the first feature extraction model once, stores the model parameters and the features of the 20 first sample images and third sample images into the sample feature library, and samples the sample features in the feature library to obtain the plurality of first sample features and the plurality of second sample features.
In this implementation, the terminal updates the sample features in the sample feature library corresponding to the sampling range after each parameter update, which keeps the sample features efficiently up to date.
In some embodiments, the terminal determines sample features of a second training sample set from the plurality of third sample features, the second training sample set comprising a plurality of first sample images and a plurality of third sample images used in one iterative training process; and sampling the sample features in the sample features of the second training sample group to obtain the plurality of first sample features and the plurality of second sample features.
For example, if the total number of the first sample images and the third sample images is 2000 and the number of training samples for each parameter update is 20, the iterative training process of the model is implemented once every 100 parameter updates. In this implementation manner, the terminal stores the image features of 2000 first sample images and third sample images subjected to iterative training and the model parameters of the first feature extraction model when the iterative training is completed into the sample feature library, and samples the sample features in the feature library to obtain the plurality of first sample features and the plurality of second sample features.
In this implementation, within one iteration process the terminal stores the image features of all the first sample images and third sample images in the sample feature library corresponding to the sampling range, so that the number of sample features in the library is large enough and the samples are abundant.
In some embodiments, the terminal determines sample features of a third training sample set from the plurality of third sample features, where the third training sample set includes a plurality of first sample images and a plurality of third sample images used in a plurality of model parameter updating processes that are closest to the current parameter updating process in the iterative training process; and sampling the sample features in the sample features of the third training sample group to obtain the plurality of first sample features and the plurality of second sample features.
For example, if the total number of the first sample images and the third sample images is 2000 and the number of training samples for each parameter update is 20, one iterative training process of the model corresponds to 100 parameter updates. In this implementation, the sample feature library stores the sample features of the 3 most recent parameter update processes. Each time the terminal completes feature extraction for 20 first sample images and third sample images, it stores their features in the sample feature library, deletes the sample features generated in the parameter update process farthest from the current time, and samples the sample features in the feature library to obtain the plurality of first sample features and the plurality of second sample features.
It should be noted that the number of model parameter update processes covered is set as required, and the embodiment of the present application does not specifically limit this. For example, the number of covered parameter update processes is 3, 5, 10, or the like.
In this implementation, dynamically updating the sample features in the sample feature library both ensures the number of sample features in the library and keeps the sample features fresh, improving the accuracy of model training.
It should be noted that the sample features in the sample feature library can be stored in an overwriting manner; the embodiment of the present application does not specifically limit the storage manner of the samples in the sample feature library.
In addition, the terminal samples from the sample feature library to obtain a plurality of first sample features and a plurality of second sample features. In some embodiments, the terminal randomly samples from the sample feature library, and determines the sampled sample features as a plurality of first sample features and a plurality of second sample features.
In this implementation, the terminal draws sample features from the feature library, which enriches the sources of the sample features and improves their accuracy.
Another point to note is that the terminal can also first determine the first training sample group, the second training sample group, or the third training sample group, and then perform feature extraction on it before sampling. The embodiments of the present application do not specifically limit this.
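One way to realize the dynamically updated sample feature library described above is a fixed-size first-in-first-out feature queue of the kind used in momentum-contrast training. The sketch below is an assumption about one possible implementation; the class name, queue size, and feature dimension are illustrative.

```python
import torch
import torch.nn.functional as F

class FeatureQueue:
    """Sample feature library: keeps the features from the most recent
    parameter update processes and overwrites the oldest ones."""

    def __init__(self, feature_dim: int = 128, queue_size: int = 4096):
        self.features = F.normalize(torch.randn(queue_size, feature_dim), dim=1)
        self.ptr = 0  # position of the oldest stored features

    @torch.no_grad()
    def enqueue(self, batch_features: torch.Tensor):
        """Store the features of the current batch, overwriting the features
        generated in the parameter update process farthest from the present."""
        n = batch_features.shape[0]
        idx = torch.arange(self.ptr, self.ptr + n) % self.features.shape[0]
        self.features[idx] = batch_features
        self.ptr = int((self.ptr + n) % self.features.shape[0])

    def sample(self, num: int) -> torch.Tensor:
        """Randomly sample stored features to serve as second sample
        features (negatives)."""
        idx = torch.randint(0, self.features.shape[0], (num,))
        return self.features[idx]
```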
(2-2) The terminal inputs the plurality of first sample features and the plurality of second sample features into a classification network, and determines the feature similarity between the first sample features and the second sample features through the classification network.
With continued reference to fig. 4, the terminal inputs the plurality of first sample features and the plurality of second sample features into a classification network; the classification network is configured to classify different features, and feature similarities between them are determined based on the classification results. The more accurate the classification result of the classification network, the smaller the similarity between the first sample features and the second sample features; the more the classification result deviates, the larger the similarity between the first sample features and the second sample features.
(2-3) The terminal determines a loss value of the first feature extraction model based on the feature similarity.
The smaller the feature similarity, the lower the accuracy of the features extracted by the first feature extraction model, and the terminal feeds back parameter adjustments in the reverse optimization direction; the larger the feature similarity, the higher the accuracy of the extracted features, and the terminal feeds back parameter adjustments in the forward optimization direction. In this way, measuring feature-to-feature similarity replaces measuring the similarity between a sample feature and a class-center feature, which removes the burden of maintaining class-center weights and avoids the parameter explosion of large-scale classification. In the optimization process, the loss function used is a non-parameterized loss function, such as a non-parameterized softmax loss function, a triplet loss function, or an AM-Softmax loss function.
In response to the terminal adopting the non-parameterized softmax loss function, the loss value is computed by the following formula:

$$ L_q = -\log \frac{\exp\left(q \cdot k_{+} / \tau\right)}{\sum_{i=0}^{K} \exp\left(q \cdot k_{i} / \tau\right)} $$

wherein $L_q$ is the loss value, $q$ is the feature of the query sample image, $k_{+}$ is the first sample feature (the positive sample feature) for the query, $\tau$ is the temperature coefficient, $K$ is the number of samples, and $k_i$ is the feature of any sample with respect to the query sample.
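The formula can be implemented directly. Below is a minimal sketch under the assumption that the query feature, the positive sample feature, and the K negative (second) sample features are L2-normalized vectors; the function name and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def non_parameterized_softmax_loss(q, k_pos, k_neg, tau=0.07):
    """q: query feature, shape (D,); k_pos: positive (first) sample feature,
    shape (D,); k_neg: negative (second) sample features, shape (K, D)."""
    l_pos = (q * k_pos).sum(dim=-1, keepdim=True) / tau  # similarity to the positive, (1,)
    l_neg = (k_neg @ q) / tau                            # similarities to the K negatives, (K,)
    logits = torch.cat([l_pos, l_neg])                   # positive sits at index 0
    # L_q = -log of the softmax probability assigned to the positive sample
    return -F.log_softmax(logits, dim=0)[0]
```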
(3) The terminal performs model training on the first feature extraction model based on the loss value to obtain the second feature extraction model.
In this step, the terminal optimizes the first feature extraction model based on the loss value to obtain a third feature extraction model, and performs model training on the third feature extraction model with the plurality of first sample images until the loss value of the third feature extraction model converges, at which point the resulting model is determined to be the second feature extraction model.
In this implementation, the second feature extraction model is trained in an unsupervised manner, which avoids the added burden of manually labeling samples and thereby saves model training resources.
It should be noted that the process of acquiring the second feature extraction model through unsupervised training can also be executed by other electronic devices, and accordingly, the terminal sends a model acquisition request to the other electronic devices, the other electronic devices acquire the second feature extraction model according to the acquisition request, send the second feature extraction model to the terminal, and the terminal receives the second feature extraction model sent by the other electronic devices. The process of training the second feature extraction model by other electronic devices is similar to the process of training the second feature extraction model by the terminal, and is not repeated here.
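Putting steps (1) to (3) together, the unsupervised pretraining stage could be organized as in the sketch below. It reuses make_positive_pair, FeatureQueue, and non_parameterized_softmax_loss from the earlier sketches; the backbone choice (a ResNet-18 with a 128-dimensional output), the learning rate, and the other hyperparameters are assumptions for illustration only, and a practical implementation would process images in batches rather than one at a time.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

def pretrain(first_sample_images, epochs=10, tau=0.07):
    """first_sample_images: an iterable of unlabeled PIL images."""
    encoder = resnet18(num_classes=128)   # the first feature extraction model
    queue = FeatureQueue(feature_dim=128)
    opt = torch.optim.SGD(encoder.parameters(), lr=0.03, momentum=0.9)

    for _ in range(epochs):
        for image in first_sample_images:
            view_a, view_b = make_positive_pair(image)        # positive pair
            q = F.normalize(encoder(view_a.unsqueeze(0)), dim=1)[0]
            k_pos = F.normalize(encoder(view_b.unsqueeze(0)), dim=1)[0].detach()
            k_neg = queue.sample(1024)                        # second sample features
            loss = non_parameterized_softmax_loss(q, k_pos, k_neg, tau)
            opt.zero_grad()
            loss.backward()
            opt.step()                                        # one parameter update
            queue.enqueue(k_pos.unsqueeze(0))                 # refresh the feature library
    # once the loss has converged, the encoder is the second feature extraction model
    return encoder
```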
Step 202: the terminal acquires a plurality of second sample images based on the target image management function.
The plurality of second sample images are sample images labeled according to the target image management function. The target image management function is set as needed. For example, the target image management function includes at least one of image classification, target object detection, image segmentation, and image tag generation. Correspondingly, the terminal acquires different second sample images for different functions.
In response to the target image management function being image classification, the plurality of second sample images are sample images labeled with image categories. Correspondingly, the terminal acquires a plurality of sample images, determines the image category of each sample image, and labels the sample images based on the image category to obtain the plurality of second sample images.
In response to the target image management function being target object detection, the plurality of second sample images are sample images labeled with the target object. Correspondingly, the terminal acquires a plurality of sample images and determines whether a target object exists in each sample image; if the target object exists, it is labeled, and if not, no label is applied, so as to obtain the plurality of second sample images.
In response to the target image management function being image segmentation, the plurality of second sample images are sample images to which image segmentation results are labeled. Correspondingly, the terminal acquires a plurality of sample images, performs image segmentation on each sample image based on the image segmentation requirement to obtain an image segmentation result, and labels the sample images based on the image segmentation result to obtain a plurality of second sample images.
In response to the target image management function being to generate an image tag, the plurality of second sample images are sample images labeled with the image tag. Correspondingly, the terminal acquires a plurality of sample images, determines an image label of each sample image, and labels the sample images based on the image labels to obtain a plurality of second sample images.
It should be noted that the image labeling process is as follows: the terminal receives an annotation result input by a user and generates a plurality of second sample images based on the annotation result. Or the terminal carries out image processing on the sample image through other image processing models and determines the labeling result based on the image processing result. In the embodiments of the present application, this is not particularly limited.
Step 203: the terminal performs model fine-tuning on the second feature extraction model based on the plurality of second sample images to obtain an image management model, and the image management model is used for performing image management on images.
In this step, different image management models are trained for different target functions. Referring to fig. 5, after model training is performed on the first feature extraction model with the plurality of unlabeled first sample images, the second feature extraction model is obtained, and the second feature extraction model is then applied to downstream tasks. In this process, model fine-tuning is performed on the second feature extraction model with the plurality of second sample images so that it can execute the corresponding target image management task. Correspondingly, the terminal can perform model training on the second feature extraction model for different target image management tasks.
In some embodiments, the terminal performs model fine-tuning on the second feature extraction model through the second sample images labeled with image categories to obtain an image classification model. In some embodiments, the terminal performs model fine-tuning on the second feature extraction model through the second sample images labeled with the target object to obtain a target object detection model. In some embodiments, the terminal performs model fine-tuning on the second feature extraction model through the second sample images labeled with image segmentation results to obtain an image segmentation model. In some embodiments, the terminal performs model fine-tuning on the second feature extraction model through the second sample images labeled with image labels to obtain a label determination model.
The terminal can perform model fine-tuning on the second feature extraction model with different sets of second sample images to obtain a plurality of different types of image management models, so that the terminal can execute different image management tasks.
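As one concrete illustration of the fine-tuning step, the sketch below adapts the pretrained second feature extraction model to the image classification function by replacing its output head and training on the labeled second sample images. Whether the backbone is frozen or trained with a small learning rate is a design choice the application leaves open; the function name and learning rates here are assumptions.

```python
import torch
import torch.nn as nn

def fine_tune_classifier(encoder, labeled_loader, num_classes, epochs=5):
    """encoder: the pretrained second feature extraction model (a ResNet here).
    labeled_loader yields (image_batch, class_label_batch) built from the
    second sample images labeled with image categories."""
    encoder.fc = nn.Linear(encoder.fc.in_features, num_classes)  # task-specific head
    backbone_params = [p for n, p in encoder.named_parameters()
                       if not n.startswith("fc")]
    opt = torch.optim.SGD([
        {"params": backbone_params, "lr": 1e-4},          # gentle updates: fine-tuning
        {"params": encoder.fc.parameters(), "lr": 1e-2},  # fresh head learns faster
    ], momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in labeled_loader:
            loss = criterion(encoder(images), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder  # the image classification model
```

The other target image management functions (target object detection, image segmentation, label generation) would follow the same pattern with a different task head and loss.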
In addition, when the terminal stores a plurality of image management models corresponding to different target image management functions, it stores the model parameters of the second feature extraction model once, together with the fine-tuned model parameters of the image management models corresponding to the different functions. Correspondingly, during use, the terminal calls the model parameters of the second feature extraction model and the model parameters corresponding to the required management function. Thus, when multiple image management models are stored in the terminal, storage space is saved and the storage pressure of the terminal is reduced.
For example, in the task of managing the terminal's album application, image management models corresponding to different album management functions can be obtained by fine-tuning the second feature extraction model, so that a user can select different album division modes to organize the albums as needed. Correspondingly, referring to fig. 6, during album use, a target image to be processed is input, the model parameters of the second feature extraction model are called to initialize the second feature extraction model, the model parameters corresponding to the requested album management task are called, and the target image is processed based on the model parameters of the second feature extraction model and the model parameters corresponding to the album management task to obtain an album management result.
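A sketch of this parameter-sharing scheme follows: the parameters of the second feature extraction model are stored once, and each album management function keeps only its small set of task-specific parameters, loaded on demand. The file names and the backbone/head split are illustrative assumptions.

```python
import torch

def save_shared_models(encoder, task_heads):
    """Store the shared backbone once, plus one small head per function
    (e.g. task_heads = {"classify": ..., "detect": ..., "segment": ...})."""
    backbone_state = {k: v for k, v in encoder.state_dict().items()
                      if not k.startswith("fc")}
    torch.save(backbone_state, "second_feature_extractor.pt")
    for task_name, head in task_heads.items():
        torch.save(head.state_dict(), f"head_{task_name}.pt")

def load_for_task(encoder, head, task_name):
    """Initialize the shared second feature extraction model, then load the
    parameters corresponding to the requested album management task."""
    shared = torch.load("second_feature_extractor.pt")
    encoder.load_state_dict(shared, strict=False)  # task head not included
    head.load_state_dict(torch.load(f"head_{task_name}.pt"))
    return encoder, head
```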
In the embodiment of the application, when the image management model is trained, a general feature extraction model is first trained on unlabeled sample images and then fine-tuned with labeled sample images to obtain the image management model. In this way, training of the image management model can be achieved with only a small amount of labeled sample data, which reduces the number of labeled sample images required, lowers the time cost of labeling, and improves the efficiency of model training.
Referring to fig. 7, a block diagram of an image management apparatus according to an embodiment of the present application is shown. The image management device can be implemented as all or part of the processor 110 by software, hardware, or a combination of both. The device includes:
the model training module 701 is used for performing model training on a first feature extraction model based on a plurality of unlabeled first sample images to obtain a second feature extraction model;
an obtaining module 702, configured to obtain, based on a target image management function, a plurality of second sample images, where the plurality of second sample images are sample images labeled according to the target image management function;
a model fine-tuning module 703, configured to perform model fine-tuning on the second feature extraction model based on the plurality of second sample images to obtain an image management model, where the image management model is used to perform image management on an image.
In some embodiments, the model training module 701 comprises:
an acquisition unit configured to acquire a positive sample pair and a negative sample pair of each first sample image;
a determining unit, configured to determine a loss value of the first feature extraction model based on feature similarity between the positive sample pair and the negative sample pair;
and the model training unit is used for carrying out model training on the first feature extraction model based on the loss value to obtain the second feature extraction model.
In some embodiments, the determining unit is configured to input the positive sample pair and the negative sample pair into the first feature extraction model, respectively, to obtain a plurality of first sample features of the positive sample pair and a plurality of second sample features of the negative sample pair; input the plurality of first sample features and the plurality of second sample features into a classification network, and determine the feature similarity between the first sample features and the second sample features through the classification network; and determine a loss value of the first feature extraction model based on the feature similarity.
In some embodiments, the determining unit is configured to perform feature extraction on the plurality of first sample images and the plurality of third sample images to obtain a plurality of third sample features; and sampling the sample features in the third sample features to obtain the first sample features and the second sample features.
In some embodiments, the determining unit is configured to determine, from the plurality of third sample features, sample features of a first training sample set, where the first training sample set includes a plurality of first sample images and a plurality of third sample images used in a model parameter updating process; sampling sample features in the sample features of the first training sample group to obtain a plurality of first sample features and a plurality of second sample features;
the determining unit is used for determining sample features of a second training sample set from the plurality of third sample features, wherein the second training sample set comprises a plurality of first sample images and a plurality of third sample images used in one iterative training process; and sampling the sample features of the second training sample group to obtain the plurality of first sample features and the plurality of second sample features;
The determining unit is configured to determine sample features of a third training sample set from the plurality of third sample features, where the third training sample set includes a plurality of first sample images and a plurality of third sample images used in a plurality of model parameter updating processes that are closest to a current parameter updating process in the iterative training process; and sampling the sample features in the sample features of the third training sample group to obtain the plurality of first sample features and the plurality of second sample features.
In some embodiments, the obtaining unit is configured to, for each first sample image, perform sample data enhancement on the first sample image to obtain a plurality of third sample images corresponding to the first sample image; form the plurality of third sample images into positive sample pairs of the first sample image; and determine, among the plurality of first sample images and the plurality of third sample images corresponding to the plurality of first sample images, the sample images other than the positive sample pair of the first sample image as negative sample pairs of the first sample image.
In some embodiments, the target image management function includes at least one of image classification, target object detection, image segmentation, and generation of image tags;
the model fine-tuning module 703 is configured to perform model fine-tuning on the second feature extraction model through the second sample image labeled with the image category to obtain an image classification model;
the model fine-tuning module 703 is configured to perform model fine-tuning on the second feature extraction model through the second sample image labeled with the target object, so as to obtain a target object detection model;
the model fine-tuning module 703 is configured to perform model fine-tuning on the second feature extraction model through the second sample image labeled with the image segmentation result, so as to obtain an image segmentation model;
the model fine-tuning module 703 is configured to perform model fine-tuning on the second feature extraction model through the second sample image labeled with the image label, so as to obtain a label determination model.
In the embodiment of the application, when the image management model is trained, a general feature extraction model is trained through a non-labeled sample image, then the general feature extraction model is finely adjusted through a labeled sample image, so that the image management model can be obtained, and thus, the training of the image management model can be realized only by a small amount of labeled sample data, the number of labeled sample images required by the training of the image management model is reduced, the time cost of labeling is reduced, and the efficiency of model training is improved.
The embodiment of the present application also provides a computer-readable medium, which stores at least one program code, and the at least one program code is loaded and executed by a processor to implement the image management method shown in the above embodiments.
The embodiment of the present application further provides a computer program product, in which at least one program code is stored, and the at least one program code is loaded and executed by a processor to implement the image management method shown in the above embodiments.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of the present application can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions can be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media is any available media that can be accessed by a general purpose or special purpose computer.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. An image management method, characterized in that the method comprises:
performing model training on a first feature extraction model based on a plurality of unlabeled first sample images to obtain a second feature extraction model;
acquiring a plurality of second sample images based on a target image management function, wherein the plurality of second sample images are sample images labeled according to the target image management function;
and performing model fine-tuning on the second feature extraction model based on the plurality of second sample images to obtain an image management model, wherein the image management model is used for performing image management on images.
2. The method of claim 1, wherein model training the first feature extraction model based on the plurality of unlabeled first sample images to obtain a second feature extraction model comprises:
acquiring a positive sample pair and a negative sample pair of each first sample image;
determining a loss value of a first feature extraction model based on feature similarity between the positive sample pair and the negative sample pair;
and performing model training on the first feature extraction model based on the loss value to obtain the second feature extraction model.
3. The method of claim 2, wherein determining a loss value for a first feature extraction model based on feature similarities between the pair of positive samples and the pair of negative samples comprises:
inputting the positive sample pairs and the negative sample pairs into the first feature extraction model respectively to obtain a plurality of first sample features of the positive sample pairs and a plurality of second sample features of the negative sample pairs;
inputting the plurality of first sample features and the plurality of second sample features into a classification network, and determining feature similarity between the first sample features and the second sample features through the classification network;
based on the feature similarity, a loss value of the first feature extraction model is determined.
4. The method of claim 3, wherein the inputting the positive and negative sample pairs into the first feature extraction model, respectively, resulting in a plurality of first sample features of the positive sample pairs and a plurality of second sample features of the negative sample pairs, comprises:
performing feature extraction on the plurality of first sample images and the plurality of third sample images to obtain a plurality of third sample features;
and sampling the sample features in the third sample features to obtain the first sample features and the second sample features.
5. The method of claim 4, wherein the sampling of the sample features in the third plurality of sample features to obtain the first plurality of sample features and the second plurality of sample features comprises at least one of:
determining sample features of a first training sample set from the plurality of third sample features, the first training sample set comprising a plurality of first sample images and a plurality of third sample images used by a model parameter update process; sampling sample features in the sample features of the first training sample group to obtain a plurality of first sample features and a plurality of second sample features;
determining sample features of a second training sample set from the plurality of third sample features, the second training sample set comprising a plurality of first sample images and a plurality of third sample images used in an iterative training process; sampling sample features in the sample features of the second training sample group to obtain the plurality of first sample features and the plurality of second sample features;
determining sample characteristics of a third training sample set from the plurality of third sample characteristics, wherein the third training sample set comprises a plurality of first sample images and a plurality of third sample images which are used in a plurality of model parameter updating processes which are closest to the current parameter updating process in the iterative training process; and sampling sample features in the sample features of the third training sample group to obtain the plurality of first sample features and the plurality of second sample features.
6. The method of claim 2, wherein said obtaining a positive sample pair and a negative sample pair for each first sample image comprises:
for each first sample image, performing sample data enhancement on the first sample image to obtain a plurality of third sample images corresponding to the first sample image;
forming the plurality of third sample images into positive sample pairs of the first sample image;
and determining, from the plurality of third sample images corresponding to the plurality of first sample images, the positive sample pairs of the first sample images other than the first sample image as the negative sample pairs of the first sample image.
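In other words, claim 6 builds positive pairs from augmented views of the same image and treats the views of every other image as negatives. A sketch under assumed augmentations (random crop, flip, and color jitter via torchvision; the claim does not specify which enhancements are used):

    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(0.4, 0.4, 0.4),
    ])

    def build_pairs(first_sample_images, views_per_image=2):
        # Each first sample image yields several augmented "third sample
        # images"; two views of the same image form a positive pair.
        views = [[augment(img) for _ in range(views_per_image)]
                 for img in first_sample_images]
        positive_pairs = [(v[0], v[1]) for v in views]
        # Negatives for image i: the augmented views of every other image.
        negatives = [[w for j, vs in enumerate(views) if j != i for w in vs]
                     for i in range(len(views))]
        return positive_pairs, negatives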
7. The method of claim 1, wherein the target image management function comprises at least one of image classification, target object detection, image segmentation, and image tag generation;
wherein performing model fine-tuning on the second feature extraction model based on the plurality of second sample images to obtain the image management model comprises at least one of the following implementations:
performing model fine-tuning on the second feature extraction model through second sample images labeled with image categories to obtain an image classification model;
performing model fine-tuning on the second feature extraction model through second sample images labeled with target objects to obtain a target object detection model;
performing model fine-tuning on the second feature extraction model through second sample images labeled with image segmentation results to obtain an image segmentation model;
and performing model fine-tuning on the second feature extraction model through second sample images labeled with image tags to obtain a tag determination model.
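As a concrete illustration of the classification branch above, a hedged fine-tuning sketch: the pretrained second feature extraction model serves as the backbone, a linear head is attached, and both are trained on second sample images labeled with image categories. The backbone is assumed to map a batch of images to (N, feature_dim) features; the optimizer, learning rate, and epoch count are likewise assumptions rather than anything fixed by the claim.

    import torch
    import torch.nn as nn

    def fine_tune(backbone: nn.Module, feature_dim: int, num_classes: int,
                  loader, epochs: int = 5, lr: float = 1e-4):
        # Attach a task head to the pretrained feature extractor and
        # train end to end on the labeled second sample images.
        model = nn.Sequential(backbone, nn.Linear(feature_dim, num_classes))
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for images, labels in loader:
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()
        return model  # plays the role of the image classification model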
8. An image management apparatus, characterized in that the apparatus comprises:
the model training module is used for performing model training on the first feature extraction model based on a plurality of unlabeled first sample images to obtain a second feature extraction model;
the acquisition module is used for acquiring a plurality of second sample images based on a target image management function, wherein the second sample images are sample images labeled according to the target image management function;
and the model fine-tuning module is used for carrying out model fine-tuning on the second feature extraction model based on the plurality of second sample images to obtain an image management model, and the image management model is used for carrying out image management on the images.
9. A terminal, characterized in that the terminal comprises a processor and a memory; the memory stores at least one program code, and the program code is executed by the processor to implement the image management method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium stores at least one program code for execution by a processor to implement the image management method according to any one of claims 1 to 7.
CN202011334869.1A 2020-11-25 2020-11-25 Image management method, device, terminal and storage medium Withdrawn CN112329785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011334869.1A CN112329785A (en) 2020-11-25 2020-11-25 Image management method, device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011334869.1A CN112329785A (en) 2020-11-25 2020-11-25 Image management method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN112329785A true CN112329785A (en) 2021-02-05

Family

ID=74309424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011334869.1A Withdrawn CN112329785A (en) 2020-11-25 2020-11-25 Image management method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN112329785A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668675A (en) * 2021-03-22 2021-04-16 腾讯科技(深圳)有限公司 Image processing method and device, computer equipment and storage medium
CN113553954A (en) * 2021-07-23 2021-10-26 上海商汤智能科技有限公司 Method and apparatus for training behavior recognition model, device, medium, and program product
CN113762508A (en) * 2021-09-06 2021-12-07 京东鲲鹏(江苏)科技有限公司 Training method, device, equipment and medium for image classification network model
WO2023142551A1 (en) * 2022-01-28 2023-08-03 上海商汤智能科技有限公司 Model training and image recognition methods and apparatuses, device, storage medium and computer program product

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709478A (en) * 2017-02-22 2017-05-24 桂林电子科技大学 Pedestrian image feature classification method and system
CN110991349A (en) * 2019-12-05 2020-04-10 中国科学院重庆绿色智能技术研究院 Lightweight vehicle attribute identification method based on metric learning
CN111368934A (en) * 2020-03-17 2020-07-03 腾讯科技(深圳)有限公司 Image recognition model training method, image recognition method and related device
CN111401474A (en) * 2020-04-13 2020-07-10 Oppo广东移动通信有限公司 Training method, device and equipment of video classification model and storage medium

Similar Documents

Publication Publication Date Title
CN111368893B (en) Image recognition method, device, electronic equipment and storage medium
CN112329785A (en) Image management method, device, terminal and storage medium
US20210103779A1 (en) Mobile image search system
CN109189950B (en) Multimedia resource classification method and device, computer equipment and storage medium
CN111209970B (en) Video classification method, device, storage medium and server
CN108416003A (en) A kind of picture classification method and device, terminal, storage medium
US20220383053A1 (en) Ephemeral content management
CN111209423B (en) Image management method and device based on electronic album and storage medium
CN106233228A (en) Process the method for content and use the electronic equipment of the method
WO2019137185A1 (en) Image screening method and apparatus, storage medium and computer device
US11681409B2 (en) Systems and methods for augmented or mixed reality writing
CN112328823A (en) Training method and device for multi-label classification model, electronic equipment and storage medium
CN111209377B (en) Text processing method, device, equipment and medium based on deep learning
CN112241789A (en) Structured pruning method, device, medium and equipment for lightweight neural network
CN111709398A (en) Image recognition method, and training method and device of image recognition model
WO2023197648A1 (en) Screenshot processing method and apparatus, electronic device, and computer readable medium
CN112950640A (en) Video portrait segmentation method and device, electronic equipment and storage medium
US20210150243A1 (en) Efficient image sharing
US11232616B2 (en) Methods and systems for performing editing operations on media
CN112069335A (en) Image classification method and device, electronic equipment and storage medium
CN111191063A (en) Picture classification method and device, terminal and storage medium
EP4340374A1 (en) Picture quality adjustment method and apparatus, and device and medium
CN116452702B (en) Information chart rapid design method, device, computer equipment and storage medium
CN113760834B (en) File classification method, device, equipment and medium
CN112396112B (en) Clustering method, clustering device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210205