WO2022028147A1 - Image classification model training method, apparatus, computer device and storage medium

Image classification model training method, apparatus, computer device and storage medium

Info

Publication number
WO2022028147A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
images
classification
reference data
classification result
Prior art date
Application number
PCT/CN2021/102530
Other languages
English (en)
French (fr)
Inventor
卢东焕
赵俊杰
马锴
郑冶枫
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP21853462.6A (published as EP4113376A4)
Publication of WO2022028147A1
Priority to US17/964,739 (published as US20230035366A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/32Normalisation of the pattern dimensions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30101Blood vessel; Artery; Vein; Vascular

Definitions

  • the present application relates to the technical field of image processing, and in particular, to an image classification model training method, apparatus, computer equipment and storage medium.
  • Image classification technology based on artificial intelligence can divide an image set into different classes or clusters according to a certain criterion, for example the similarity between images, so that images in the same cluster are as similar as possible while images in different clusters are as dissimilar as possible.
  • In the related art, image features are usually extracted by a neural network, and a separate classification module is then applied to classify the images based on those features.
  • This image classification method is two-stage, that is, the image feature extraction process and the image classification process are independent of each other, and the computational complexity is high. There is therefore no effective solution for reducing the computational complexity so as to lower the resource consumption of the model and improve classification efficiency.
  • the embodiments of the present application provide an image classification model training method, apparatus, computer equipment and storage medium, which can train an image classification model with a simplified structure.
  • the embodiment of the present application provides an image classification model training method, which is applied to computer equipment, and the method includes:
  • performing image transformation on at least two first images respectively to obtain at least two second images corresponding to each first image; inputting the at least two first images and the corresponding second images into an image classification model, the image classification model outputting the classification results of the at least two first images and the classification results of the corresponding second images;
  • in response to the respective classification results not satisfying a reference condition, generating reference classification results of the at least two first images based on the classification results of the at least two second images corresponding to each first image, the reference classification result of a first image being used to characterize the probability that the first image and the corresponding at least two second images belong to each category;
  • determining a total error value based on an error value between the classification results of the at least two first images and the reference classification results of the at least two first images, and an error value between the classification results of the second images corresponding to the at least two first images and the reference classification results of the at least two first images;
  • and updating the parameters of the image classification model based on the total error value, the training being determined complete when the classification results of the at least two first images and the classification results of the corresponding second images obtained by the updated image classification model satisfy the reference condition.
  • the embodiment of the present application provides an image classification model training device, and the device includes:
  • an image acquisition module configured to perform image transformation on at least two first images respectively to obtain at least two second images corresponding to each first image
  • a classification module configured to input the at least two first images and the corresponding second images into an image classification model, and the image classification model outputs the classification results of the at least two first images and the classification results of the corresponding second images;
  • a result obtaining module configured to generate reference classification results of the at least two first images based on the classification results of the at least two second images corresponding to the respective first images in response to the respective classification results not satisfying the reference condition, the reference classification result of a first image being used to characterize the probability that the first image and the corresponding at least two second images belong to each category;
  • an error determination module configured to determine a total error value based on an error value between the classification results of the at least two first images and the reference classification results of the at least two first images, and an error value between the classification results of the second images corresponding to the at least two first images and the reference classification results of the at least two first images;
  • a parameter update module configured to update the parameters of the image classification model based on the total error value, the training being determined complete when the classification results of the at least two first images and the classification results of the corresponding second images output by the updated image classification model satisfy the reference condition.
  • An embodiment of the present application provides a computer device. The computer device includes one or more processors and one or more memories, the one or more memories storing at least one piece of program code, and the at least one piece of program code being loaded and executed by the one or more processors to implement the operations performed by the image classification model training method.
  • Embodiments of the present application provide a computer-readable storage medium, where at least one piece of program code is stored in the computer-readable storage medium, and the at least one piece of program code is loaded and executed by a processor to implement the operations performed by the image classification model training method.
  • An embodiment of the present application provides a computer program product, where the computer program product includes at least one piece of program code, and the at least one piece of program code is stored in a computer-readable storage medium.
  • the processor of the computer device reads the at least one piece of program code from the computer-readable storage medium, and the processor executes the at least one piece of program code, so that the computer device implements the operations performed by the image classification model training method.
  • In the technical solution provided by the embodiments of the present application, a reference classification result is constructed based on the classification results output by the image classification model. Since the reference classification result can indicate the probability that an image belongs to each category, the parameters of the image classification model are updated based on the total error value between the classification result of each image and the reference classification result, and a trained image classification model is obtained.
  • In this way, the image classification model can directly output accurate image classification results based on the input images, reducing the complexity of the image classification process of the image classification model.
  • FIG. 1 is a schematic diagram of an implementation environment of an image classification model training method provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a training method for an image classification model provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of an image classification model training method provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an image classification model provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a training method for an image classification model provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an apparatus for training an image classification model provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • In the embodiments of the present application, an image classification model constructed based on a neural network is trained so that it can realize end-to-end image classification, that is, image classification results are output directly from images without additionally applying a classification algorithm for image classification.
  • the image classification model trained by using the image classification model training method provided in the embodiment of the present application can be used to classify and organize the images stored in the electronic album, so as to facilitate the management of the images in the electronic album.
  • The trained image classification model can also automatically classify pictures in a recommendation system or an online gallery, so that pictures that the user may be interested in can be recommended according to the user's preference when a recommendation is due or when the user searches for pictures.
  • The image classification model trained by using the image classification model training method provided in the embodiment of the present application can also be used in the medical field, for example for auxiliary recognition of medical images: the trained model identifies imaging areas of interest, such as a target blood vessel area or a target organ area, from a medical image, thereby improving diagnosis efficiency.
  • FIG. 1 is a schematic diagram of an implementation environment of an image classification model training method provided by an embodiment of the present application.
  • The implementation environment includes a terminal 110 and an image classification platform 140.
  • the terminal 110 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle terminal, etc., but is not limited thereto.
  • the terminal 110 has an application program that supports image classification installed and running.
  • the application may be an application of image recognition, image retrieval, and the like.
  • The terminal 110 may be a user-side device or a development-side device, and a user account is logged in to an application running in the terminal 110.
  • the terminal 110 may generally refer to one of multiple terminals, and only the terminal 110 is used as an example in this embodiment of the present application.
  • the image classification platform 140 is used to provide background services for applications supporting image classification.
  • The image classification platform 140 may undertake the main image classification work while the terminal 110 undertakes the secondary work; or the image classification platform 140 may undertake the secondary work while the terminal 110 undertakes the main work; or either the image classification platform 140 or the terminal 110 can undertake the image classification work alone.
  • the image classification platform 140 includes an access server, an image classification server, and a database.
  • The access server is used to provide access services for the terminal 110.
  • the image classification server is used to provide background services related to image classification. There can be one or more image classification servers.
  • An image classification model may be set in the image classification server, and the image classification server provides support for the training and application process of the model.
  • The above server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
  • the above-mentioned terminal 110 and the image classification platform 140 may be directly or indirectly connected through wired or wireless communication, which is not limited in this embodiment of the present application.
  • The number of the above-mentioned terminals may be larger or smaller.
  • the above-mentioned terminal may be only one, or the above-mentioned terminal may be dozens or hundreds, or more.
  • the embodiments of the present application do not limit the number of terminals and device types.
  • the embodiment of the present application provides a training method for an image classification model.
  • In this method, data enhancement is first performed on the images used for model training, and the initial images and the data-enhanced images are input into the image classification model, which outputs the image classification results; the reference classification results are then constructed based on the image classification results. Since the reference classification results can be used to indicate the probability that an image belongs to each category, the total error value between the classification results of each image and the reference classification results is obtained.
  • The total error value is back-propagated to the image classification model, and the parameters of each operation layer in the image classification model are adjusted to obtain a trained image classification model. The image classification model can thus realize end-to-end image classification, that is, it can directly output accurate image classification results based on images, thereby reducing the complexity of image classification.
  • FIG. 2 is a flowchart of a training method for an image classification model provided by an embodiment of the present application.
  • This method can be applied to computer equipment, and the computer equipment can be the above-mentioned terminal or server.
  • In the following, the server is used as the execution body to introduce the training method of the image classification model. Referring to FIG. 2, this embodiment may include the following steps:
  • the server performs image transformation on the at least two first images, respectively, to obtain at least two second images corresponding to each first image.
  • The first image may be an image stored in the server, an image captured by the server from a video, or an image captured by a device with an image capture function, for example a camera, which sends the captured image to the server.
  • This embodiment of the present application does not limit which image is used.
  • the second image is obtained by performing data enhancement on the first image, that is, image transformation.
  • the image transformation method includes image cropping, image flipping, image color dithering, and image color channel reorganization, but is not limited thereto.
  • the server acquires at least two first images in response to a model training instruction sent by the terminal.
  • the terminal may be a terminal used by a developer, and the terminal sends a model training instruction to the server in response to a user operation.
  • the embodiment of the present application does not limit the triggering manner of the model training instruction.
  • After acquiring the at least two first images, the server performs image transformation on the at least two first images based on at least one image transformation method to obtain at least two second images corresponding to each first image.
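  • As an illustration, the image transformation step might be sketched as follows in Python; the library choice (PyTorch/torchvision), the transform parameters, and the helper names are assumptions, with M = 10 taken as the example value used later in this description.

```python
# Sketch of the data-enhancement step: produce M second images per first image.
# Library choice (PyTorch/torchvision), transform parameters, and helper names
# are illustrative assumptions, not prescribed by this description.
import torch
from torchvision import transforms

M = 10  # number of second images per first image (example value from the text)

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),           # image cropping
    transforms.RandomHorizontalFlip(),           # image flipping
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),  # image color dithering
    transforms.ToTensor(),
])

def channel_reorganization(img: torch.Tensor) -> torch.Tensor:
    """Image color channel reorganization: randomly permute the RGB channels."""
    return img[torch.randperm(3)]

def make_second_images(first_image):
    """first_image: a PIL image; returns M randomly transformed second images."""
    return [channel_reorganization(augment(first_image)) for _ in range(M)]
```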
  • the server inputs the at least two first images and the corresponding second images into the image classification model, and the image classification model outputs the classification results of the at least two first images and the classification results of the corresponding second images.
  • the image classification model is a model constructed based on a neural network.
  • For example, the neural network is a Visual Geometry Group (VGG) deep convolutional neural network, a Residual Network (ResNet), or the like; this embodiment of the present application does not limit the structure of the image classification model.
  • At least one operation layer in the image classification model performs convolution operations on each image to extract the image features of each image, and the image classification result corresponding to each image is predicted based on the image features.
  • The image classification result can be expressed in the form of a category probability vector, and the image classification result corresponding to an image is used to represent the probability that the image belongs to each category. It should be noted that the embodiments of the present application do not limit the process of performing image classification by the image classification model.
  • In response to each classification result not satisfying the reference condition, the server generates the reference classification results of the at least two first images based on the classification results of the at least two second images corresponding to each first image; the reference classification result of a first image is used to characterize the probability that the first image and the corresponding at least two second images belong to each category.
  • the reference condition may be set by a developer, and the reference condition may be set such that the mutual information between each image and the classification result is greater than a reference threshold, etc., which is not limited in this embodiment of the present application.
  • the mutual information can indicate the strength of the correlation between the two variables. The stronger the correlation, the greater the mutual information value.
  • The mutual information between an image and a classification result is used to represent the correlation between the image and the corresponding classification result.
  • In response to each image classification result not satisfying the reference condition, the server constructs a reference classification result based on the classification results of the at least two second images corresponding to each first image, and then executes the subsequent model parameter tuning process based on the reference classification result. Since the reference classification result is obtained from the second images, that is, from the classification results of the data-enhanced images, performing the subsequent training steps based on it gives the output of the image classification model data-enhancement invariance: at least two second images obtained by performing data enhancement on the same first image belong to the same category.
  • the server determines that the training of the image classification model is completed in response to each classification result satisfying the reference condition.
  • The server determines the total error value based on an error value between the classification results of the at least two first images and the reference classification results of the at least two first images, and an error value between the classification results of the second images corresponding to the at least two first images and the reference classification results of the at least two first images.
  • the total error value is used to characterize the accuracy of the output result of the image classification model, and the higher the accuracy, the smaller the total error value.
  • In a possible implementation manner, the server respectively obtains the error value between the classification result of each first image and the corresponding reference classification result, and the error value between the classification results of the corresponding second images and the reference classification result, and determines the total error value based on these two types of error values. It should be noted that the above description of the method for obtaining the total error value is only an exemplary description, and the embodiment of the present application does not limit which method is used to obtain the total error value.
  • The server updates the parameters of the image classification model based on the total error value. When the classification results of the at least two first images and the classification results of the corresponding second images output by the updated image classification model satisfy the above reference condition, it is determined that the training is completed.
  • In a possible implementation manner, after obtaining the total error value, the server back-propagates it to the image classification model and solves the parameters of each operation layer in the image classification model based on a gradient descent algorithm, until each classification result obtained by the image classification model satisfies the reference condition, at which point it is determined that the training of the image classification model is completed. It should be noted that this embodiment of the present application does not limit which method is used to update the parameters of the image classification model.
  • In the technical solution provided by the embodiments of the present application, a reference classification result is constructed based on the classification results output by the image classification model. Since the reference classification result can indicate the probability that an image belongs to each category, the parameters of the image classification model are updated based on the total error value between the classification result of each image and the reference classification result, and a trained image classification model is obtained.
  • In this way, the image classification model can directly output accurate image classification results based on the input images, reducing the complexity of the image classification process of the image classification model.
  • FIG. 3 is a flowchart of an image classification model training method provided by an embodiment of the present application. With reference to FIG. 3 , the training process of the above image classification model is described.
  • the server acquires at least two first images, performs image transformation on the at least two first images respectively, and obtains at least two second images corresponding to each first image.
  • the second image is obtained by performing image transformation on the first image, that is, the second image is an image after data enhancement.
  • the server obtains at least two first images in response to the model training instruction, and performs image transformation on the at least two first images based on at least one of image cropping, image flipping, image color dithering, and image color channel reorganization, respectively, to obtain At least two second images corresponding to each first image.
  • The image transformation method, that is, the data enhancement method, is not limited in this embodiment of the present application, and neither are the numbers of first images and second images.
  • For example, the batch size of model training can be set to 128: the server reads 128 first images in each round of model training, and after data enhancement is performed on any first image, M corresponding second images are obtained.
  • M is a positive integer, and the value of M may be set by the developer. For example, M may be set to 10, and the value of M is not limited in this embodiment of the present application.
  • In this embodiment of the present application, the first images and the second images are both represented as numeric matrices composed of pixel values; that is, in the following steps, model training is performed based on the numeric matrices representing the first images and the second images.
  • the server inputs the at least two first images and the corresponding second images into the image classification model, and the image classification model outputs the classification results of the at least two first images and the classification results of the corresponding second images.
  • The image classification model can cluster the first images and the second images, that is, divide the images into different clusters according to the different features they reflect, where the images in the same cluster belong to the same category.
  • the image classification model is a model constructed based on a convolutional neural network.
  • In this embodiment of the present application, the image classification model is described by taking a model constructed based on a VGG deep convolutional neural network as an example.
  • FIG. 4 is a schematic structural diagram of an image classification model provided by an embodiment of the present application.
  • As shown in FIG. 4, the image classification model includes five convolution units, namely convolution units 401, 402, 403, 404 and 405.
  • Each convolution unit includes at least one convolution layer, and each convolution unit is followed by a pooling layer; the image classification model further includes at least one fully connected layer 406 and a softmax (normalized exponential function) layer 407.
  • the image classification model may also include other units, such as an input unit, an output unit, and the like, which are not limited in this embodiment of the present application.
  • the image classification process is described by taking the image classification model shown in FIG. 4 as an example.
  • The server inputs the at least two first images and the at least two second images into the image classification model.
  • In a possible implementation manner, each convolution unit in the image classification model performs convolution operations on each image to extract its image features, and the image features extracted by each convolution unit are downsampled through a pooling layer, so as to reduce the dimensionality of the image features and the amount of data processing in subsequent operations.
  • The image features of each image are then mapped to vectors through at least one fully connected layer, and finally each element in the vector output by the last fully connected layer is mapped to the interval [0, 1] through the softmax layer.
  • the classification result corresponding to each image is a class probability vector, and an element in the class probability vector is used to represent the probability that the image belongs to a class.
  • In a possible implementation manner, the server may input first images and second images of any size into the image classification model, or adjust the first images and second images to a reference size before inputting them. For example, before inputting the first images and second images into the image classification model, the server scales each first image and each second image as required, so as to adjust each image to the reference size.
  • the reference size may be set by a developer, which is not limited in this embodiment of the present application.
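  • A minimal sketch of the model of FIG. 4 follows, assuming PyTorch; the channel widths, the number of categories K, and the 224×224 input size are illustrative choices, since only the overall structure (five convolution units each followed by pooling, fully connected layers, and a softmax) is fixed above.

```python
# Sketch of the VGG-style image classification model of FIG. 4.
# Layer widths and K (number of categories) are assumed for illustration.
import torch.nn as nn

K = 10  # assumed number of categories/clusters

def conv_unit(c_in, c_out):
    # one convolution unit (two convolution layers) followed by a pooling layer
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

model = nn.Sequential(
    conv_unit(3, 64), conv_unit(64, 128), conv_unit(128, 256),
    conv_unit(256, 512), conv_unit(512, 512),  # five convolution units
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, K),                        # fully connected layers
    nn.Softmax(dim=1),                         # class-probability vector in [0, 1]
)
```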
  • the server determines whether each classification result satisfies the reference condition.
  • the reference condition is used to measure whether the image classification model converges. In a possible implementation manner, it may be determined based on mutual information whether the classification result satisfies the reference condition, and whether to continue training the image classification model.
  • the reference condition is set by the developer, which is not limited in the embodiment of the present application.
  • In a possible implementation manner, the manner in which the server determines whether each classification result satisfies the reference condition includes any one of the following implementation manners.
  • the reference condition includes data restriction conditions on the first mutual information and the second mutual information.
  • The first mutual information is used to represent the correlation between each first image and the corresponding classification result, and the stronger the correlation, the larger the value of the first mutual information.
  • The second mutual information is used to represent the correlation between the classification result of each first image and the classification results of the corresponding second images, that is, the correlation between the classification result of the image before data enhancement and the classification result of the image after data enhancement; the stronger this correlation, the larger the value of the second mutual information.
  • In other words, the images before and after data enhancement should have the same classification result; that is, the image classification results should have data-enhancement invariance.
  • the server acquires the first mutual information between each first image and the classification result of each first image. For example, the server obtains the first sub-mutual information between each first image and the corresponding classification result respectively, and takes an average value of the sum of each first sub-mutual information as the first mutual information.
  • The server also obtains the second mutual information between the classification result of each first image and the classification results of the corresponding second images. For example, the server obtains the second sub-mutual information between the classification result of each first image and the classification result of each corresponding second image respectively, and takes the average value of the sum of the second sub-mutual information as the second mutual information.
  • If the first mutual information is greater than or equal to the first threshold and the second mutual information is greater than or equal to the second threshold, it is determined that the first mutual information and the second mutual information satisfy the reference condition, that is, each classification result satisfies the reference condition; otherwise, it is determined that the first mutual information and the second mutual information do not satisfy the reference condition, that is, each classification result does not satisfy the reference condition.
  • The first threshold and the second threshold may be set by developers, which are not limited in this embodiment of the present application. It should be noted that the above description of the method for obtaining the first mutual information and the second mutual information is only an exemplary description, and the embodiment of this application does not limit which method is used to obtain the first mutual information and the second mutual information.
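  • The estimator for the mutual information is not prescribed above. One common choice for the second mutual information between paired soft classification results is the empirical joint-distribution estimate used in mutual-information clustering; the following sketch assumes that estimator.

```python
# Sketch: estimate I(y, y') from paired class-probability vectors by forming
# their empirical joint distribution (an assumed estimator, not from the patent).
import torch

def mutual_information(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    """p1, p2: (B, K) classification results of paired first/second images."""
    joint = (p1.unsqueeze(2) * p2.unsqueeze(1)).mean(dim=0)  # (K, K) joint distribution
    joint = (joint + joint.t()) / 2                          # symmetrize
    pi = joint.sum(dim=1, keepdim=True)                      # marginal of y
    pj = joint.sum(dim=0, keepdim=True)                      # marginal of y'
    eps = 1e-10
    return (joint * (torch.log(joint + eps) - torch.log(pi + eps)
                     - torch.log(pj + eps))).sum()
```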
  • the reference condition includes a data restriction condition for the third mutual information.
  • the third mutual information is used to represent the accuracy of the output result of the image classification model, and the value of the third mutual information is positively correlated with the accuracy of the output result of the image classification model. For example, the sum of the first mutual information and the second mutual information is determined as the third mutual information.
  • In a possible implementation manner, the server determines the third mutual information based on the first mutual information and the second mutual information. If the third mutual information is greater than or equal to a third threshold, it is determined that the third mutual information satisfies the reference condition, that is, each classification result satisfies the reference condition; if the third mutual information is less than the third threshold, it is determined that the third mutual information does not satisfy the reference condition, that is, each classification result does not satisfy the reference condition.
  • the third threshold is set by the developer, which is not limited in this embodiment of the present application.
  • The method for determining the third mutual information can be expressed as the following formula (1):

    I = I(x, y) + I(y, y′)    (1)

  • where x represents the first image, y represents the classification result of the first image, y′ represents the classification result of the corresponding second image, I(x, y) represents the first mutual information, I(y, y′) represents the second mutual information, and I represents the third mutual information.
  • In a possible implementation manner, the reference condition includes a first restriction condition on the first mutual information and the second mutual information, and a second restriction condition on the number of times of model training.
  • For example, the reference condition may be set such that both the first mutual information and the second mutual information obtained in the current model training process satisfy the data restriction condition, and the number of model training rounds is greater than a count threshold.
  • The reference condition can also be set such that both the first mutual information and the second mutual information obtained in the current model training process satisfy the data restriction condition, and the number of training rounds in which both have satisfied the data restriction condition is greater than the count threshold.
  • The reference condition can also be set as follows: the first mutual information and the second mutual information obtained in the current model training process both satisfy the data restriction conditions, and the first mutual information and the second mutual information obtained across successive model training processes show a converging trend.
  • the reference condition may also be set to other content, which is not limited in this embodiment of the present application.
  • If each classification result satisfies the reference condition, the server executes the following step 304; if each classification result does not satisfy the reference condition, the server executes the following steps 305 to 309.
  • the server determines that the training of the image classification model is completed.
  • the server determines that the training of the image classification model is completed, and obtains each parameter in the trained image classification model.
  • In this embodiment of the present application, only one training process is used as an example for description, and the number of training rounds of the image classification model is not limited.
  • When the image classification model is trained multiple times, in a possible implementation manner, if each classification result satisfies the reference condition and the number of training rounds is greater than or equal to a training-count threshold, it is determined that the training of the image classification model is completed; if each classification result satisfies the reference condition but the number of training rounds is less than the threshold, the server continues to read the next batch of training data to train the image classification model.
  • the server takes an average of the classification results of at least two second images corresponding to each first image, respectively, to obtain first reference data corresponding to each first image.
  • In this embodiment of the present application, the server obtains the first reference data corresponding to each first image as the average value of the classification results of the data-enhanced images, that is, the classification results of the second images. The first reference data thus fuses the classification result features of the data-enhanced images, and the reference classification result determined based on the first reference data also fuses these features; after the parameters of the image classification model are updated based on the reference classification result, the output of the model has data-enhancement invariance, that is, at least two second images obtained by performing data enhancement on the same first image belong to the same category.
  • In a possible implementation manner, the first reference data can be determined by formula (2):

    q_i = (1/M) · Σ_{m=1}^{M} p_{i,m}    (2)

  • where i represents the serial number of the first image, q_i represents the first reference data corresponding to the i-th first image, M represents the total number of second images corresponding to the i-th first image, m represents the serial number of the second image, and p_{i,m} represents the classification result of the m-th second image corresponding to the i-th first image.
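  • Formula (2) reduces to a single mean over the second-image axis; a sketch assuming the classification results of the second images are stacked in a (B, M, K) tensor:

```python
# Sketch of formula (2): the first reference data q_i is the mean of the
# classification results p_{i,m} of the M second images of the i-th first image.
import torch

def first_reference_data(second_preds: torch.Tensor) -> torch.Tensor:
    """second_preds: (B, M, K) classification results of the second images;
    returns (B, K), one first-reference vector per first image."""
    return second_preds.mean(dim=1)
```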
  • The server obtains the second reference data corresponding to each first image based on the first reference data corresponding to each first image and the evaluation data corresponding to each first reference data.
  • the evaluation data of a first reference data is used to characterize the accuracy of the first reference data.
  • In a possible implementation manner, the evaluation data may be represented as a vector consisting of two elements, one element representing the probability that the first reference data is accurate and the other representing the probability that the first reference data is inaccurate. For example, if the evaluation data is (0, 1), the indicated probability that the first reference data is accurate is 1, that is, the first reference data is accurate; if the evaluation data is (0.3, 0.7), the indicated probability that the first reference data is inaccurate is 0.3, and the probability that it is accurate is 0.7. It should be noted that the evaluation data may also be expressed in other forms, which are not limited in the embodiments of the present application.
  • evaluation data corresponding to each first reference data is generated by an evaluator based on each first reference data.
  • the evaluator is used to determine the accuracy of the first reference data.
  • the evaluator is a deep neural network composed of at least one fully connected layer, and the number of fully connected layers in the evaluator can be set by a developer, which is not limited in this embodiment of the present application.
  • In a possible implementation manner, the evaluator may be trained based on each first reference data and the reference distribution information of each first reference data.
  • The reference distribution information of a first reference data is used to represent the reference value of each element in the first reference data, and is sampled from the prior distribution information corresponding to the first reference data; that is, the reference distribution information q̂_i of the i-th first image is a one-hot vector sampled from the prior distribution P(q̂), where each one-hot vector in P(q̂) is sampled with equal probability. The prior distribution information can be set by the developer, which is not limited in this embodiment of the present application.
  • In a possible implementation manner, the first reference data and the reference distribution information of the first reference data may be input into the evaluator respectively, a loss function is applied to determine the evaluation error value of the evaluator's output result, and the parameters of the evaluator are updated based on the evaluation error value.
  • The method for obtaining the evaluation error value can be expressed as the following formula (3):

    L_C = (1/B) · Σ_{i=1}^{B} [ C_w(q_i) − C_w(q̂_i) + λ · (‖∇_{q̃_i} C_w(q̃_i)‖_2 − 1)^2 ]    (3)

  • where B represents the number of first images, i represents the serial number of the first image, q_i represents the first reference data of the first image with serial number i, C_w(q_i) represents the output result of the evaluator when its input is q_i, C_w(q̂_i) represents the output result of the evaluator when its input is the reference distribution information q̂_i, the last term represents the gradient penalty term, which is used to make the evaluator C_w satisfy the Lipschitz constraint, λ represents the gradient penalty term coefficient, and q̃_i is a vector sampled on the line between q_i and q̂_i.
  • In each image classification model training process, the evaluator may be trained multiple times, and the evaluator obtained in the last round is taken as the trained evaluator; each first reference data q_i is then input into the trained evaluator C_w to obtain the evaluation data C_w(q_i) corresponding to each first reference data.
  • the number of times of training of the evaluator may be set by the developer. For example, in each image classification model training process, the number of times of training of the evaluator is set to 5 times, which is not limited in this embodiment of the present application.
  • the above description of the training method of the evaluator is only an exemplary description, and the embodiment of the present application does not limit the training method of the evaluator.
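  • The following sketch shows one evaluator update under formula (3), assuming PyTorch; the evaluator width, optimizer settings, and the penalty coefficient lam are illustrative, and the gradient penalty follows the usual recipe for enforcing the Lipschitz constraint.

```python
# Sketch of one evaluator update per formula (3). Hidden width, optimizer
# settings, and lam are assumptions; q_ref holds one-hot vectors sampled with
# equal probability from the prior distribution.
import torch
import torch.nn as nn

K = 10
evaluator = nn.Sequential(nn.Linear(K, 128), nn.ReLU(), nn.Linear(128, 1))
opt_c = torch.optim.Adam(evaluator.parameters(), lr=1e-4)

def evaluator_step(q: torch.Tensor, q_ref: torch.Tensor, lam: float = 10.0):
    """q: (B, K) first reference data; q_ref: (B, K) reference distribution info."""
    alpha = torch.rand(q.size(0), 1)
    q_mid = (alpha * q + (1 - alpha) * q_ref).requires_grad_(True)  # on the line between q_i and q̂_i
    grad = torch.autograd.grad(evaluator(q_mid).sum(), q_mid, create_graph=True)[0]
    penalty = ((grad.norm(2, dim=1) - 1) ** 2).mean()                # gradient penalty term
    loss = (evaluator(q) - evaluator(q_ref)).mean() + lam * penalty  # formula (3)
    opt_c.zero_grad(); loss.backward(); opt_c.step()
    return loss.item()
```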
  • As the evaluation error value gradually decreases, the Wasserstein distance between the probability distribution p(q) of the first reference data and the prior distribution information P(q̂) gradually decreases, so that the probability distribution p(q) of the first reference data gradually approaches the prior distribution information.
  • In a possible implementation manner, the server may take the average value of the evaluation data corresponding to each first reference data to obtain the average evaluation data, and then adjust each first reference data based on the gradient of the average evaluation data to obtain the second reference data corresponding to each first image.
  • The acquisition method of the above-mentioned second reference data can be expressed as the following formula (4) and formula (5):

    C̄ = (1/B) · Σ_{i=1}^{B} C_w(q_i)    (4)

    q′_i = Normalize(q_i + η · ∇_{q_i} C̄)    (5)

  • where B represents the number of first images, i represents the serial number of the first image, q_i represents the first reference data of the first image with serial number i, C_w(q_i) represents the evaluation data of the first reference data q_i, C̄ represents the average evaluation data, q′_i represents the second reference data, Normalize() represents normalization processing, the method of which is not limited in this embodiment of the present application, η is a hyperparameter used to control the gradient step size, whose value is set by the developer, for example 0.04, which is not limited in this embodiment of the present application, and ∇_{q_i} represents the gradient with respect to q_i.
  • the above description of the method for obtaining the second reference data is only an exemplary description, and the embodiment of the present application does not limit which method is used to obtain the second reference data.
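  • As one possible reading of formulas (4) and (5), the second reference data can be sketched as a single gradient step on the first reference data, reusing the evaluator from the previous sketch; the clamp before normalization is an added safeguard to keep each vector a valid probability distribution.

```python
# Sketch of formulas (4)-(5): move each q_i along the gradient of the average
# evaluation data, then normalize; eta = 0.04 follows the example in the text.
import torch

def second_reference_data(q: torch.Tensor, eta: float = 0.04) -> torch.Tensor:
    q = q.detach().requires_grad_(True)
    avg_eval = evaluator(q).mean()              # formula (4): average evaluation data
    grad = torch.autograd.grad(avg_eval, q)[0]  # gradient with respect to q_i
    q2 = torch.clamp(q.detach() + eta * grad, min=1e-10)
    return q2 / q2.sum(dim=1, keepdim=True)     # Normalize(), formula (5)
```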
  • In a possible implementation manner, the method of label sharpening can also be used to obtain the second reference data based on the first reference data, which can be expressed as the following formula (6):

    q′_i = Normalize(q_i^{1/T})    (6)

  • where q_i represents the first reference data of the first image with serial number i, q′_i represents the second reference data, T is a hyperparameter with value range (0, 1) whose value is set by the developer, and Normalize() represents normalization processing.
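  • Formula (6) in code, with T assumed to be 0.5 for illustration:

```python
# Sketch of formula (6), label sharpening: raise q_i to the power 1/T, then normalize.
import torch

def sharpen(q: torch.Tensor, T: float = 0.5) -> torch.Tensor:
    q_pow = q ** (1.0 / T)
    return q_pow / q_pow.sum(dim=1, keepdim=True)  # Normalize()
```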
  • In this embodiment of the present application, the reference distribution information of the first reference data is represented in the form of a one-hot vector.
  • Training the evaluator with this reference distribution information, and then training the image classification model based on the trained evaluator, makes the first reference data gradually approach the form of a one-hot vector, that is, makes the image classification result closer to one-hot form. This enhances the definiteness of the image classification result, so that each image classification result corresponds to an explicit category; in other words, the cluster category of each image output by the image classification model when performing the clustering task is confident.
  • the server generates the reference classification result corresponding to each first image based on the edge distribution information of the classification result of the second image, the reference edge distribution information, and the second reference data corresponding to each first image.
  • the edge distribution information of the classification result is used to represent the category distribution in the classification result;
  • The reference edge distribution information can be set by the developer, which is not limited in this embodiment of the present application.
  • To achieve class balance, that is, to make the probability of an image being assigned to each class equal, each element in the reference edge distribution information can be set to the same value; that is, the reference edge distribution information is a vector composed of identical values.
  • In a possible implementation manner, the server determines a weight vector based on the edge distribution information of the classification results of the second images and the reference edge distribution information; the elements at the same positions in the second reference data corresponding to each first image and in the weight vector are multiplied to obtain the adjusted second reference data; and the adjusted second reference data is normalized to generate the reference classification result.
  • The determination method of the above reference classification result can be expressed as the following formula (7):

    r_i = Normalize((p̂ / p̄) ⊙ q′_i)    (7)

  • where p̂ represents the reference edge distribution information, p̄ represents the edge distribution information of the classification results of the second images, p̂ / p̄ is the element-wise ratio serving as the weight vector, ⊙ represents element-wise multiplication, q′_i represents the second reference data of the i-th first image, and r_i represents the reference classification result.
  • In a possible implementation manner, the edge distribution information of the classification results of the second images is determined based on the classification results of each second image.
  • For example, the edge distribution information of the classification results of the second images can be determined based on the reference edge distribution information and the classification results of each second image, which can be expressed as the following formula (8):

    p̄ = μ · p̂ + (1 − μ) · (1/(B·M)) · Σ_{i=1}^{B} Σ_{m=1}^{M} p_{i,m}    (8)

  • where μ represents the momentum coefficient, the value of which can be set by the developer, which is not limited in this embodiment of the present application.
  • In a possible implementation manner, the edge distribution information of the classification results of the second images obtained in the previous model training process may be used to determine the edge distribution information applied in the current model training process, which can be expressed as the following formula (9):

    p̄_t = μ · p̄_{t−1} + (1 − μ) · (1/(B·M)) · Σ_{i=1}^{B} Σ_{m=1}^{M} p_{i,m}    (9)

  • where t represents the serial number of the current model training process, and μ represents the momentum coefficient, whose value can be set by the developer, for example 0.8, which is not limited in this embodiment of the present application.
  • Taking the k-th class as an example, when the number of images predicted to be of the k-th class is small, the k-th element of the edge distribution information p̄ will be less than the corresponding prior probability in the reference edge distribution information p̂, so the weight for the k-th class exceeds 1 and the probability r_{ik} that an image belongs to the k-th class increases; by minimizing the loss function value of the image classification model, more images will then be predicted as the k-th class.
  • Conversely, when the number of images predicted to be of the k-th class is large, the method will correspondingly reduce the number of images assigned to that class. In this way, class balance is incorporated into the clustering results.
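  • A sketch of the class-balance reweighting of formulas (7) to (9), assuming a uniform reference edge distribution and the momentum value 0.8 mentioned above; the variable names are illustrative.

```python
# Sketch of formulas (7)-(9): maintain a momentum estimate of the edge
# distribution of second-image classification results, then reweight each
# second-reference vector by (reference edge distribution / edge distribution).
import torch

K = 10
pi_ref = torch.full((K,), 1.0 / K)  # reference edge distribution (class balance)
marginal = pi_ref.clone()           # running edge distribution, formulas (8)-(9)

def reference_classification_result(q2: torch.Tensor, second_preds: torch.Tensor,
                                    mu: float = 0.8) -> torch.Tensor:
    """q2: (B, K) second reference data; second_preds: (B, M, K)."""
    global marginal
    batch_marginal = second_preds.reshape(-1, K).mean(dim=0)
    marginal = mu * marginal + (1 - mu) * batch_marginal  # momentum update
    weighted = q2 * (pi_ref / marginal)                   # weight vector, formula (7)
    return weighted / weighted.sum(dim=1, keepdim=True)   # Normalize()
```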
  • In the above process, in response to each classification result meeting the reference condition, it is determined that the training of the image classification model is completed; otherwise, the reference classification results of the at least two first images are generated based on the classification results of the at least two second images corresponding to each first image, where the reference classification result of one first image is used to characterize the probability that the one first image and the corresponding at least two second images belong to each category.
  • When the first reference data is determined, the image features of the data-enhanced images are fused, giving it data-enhancement invariance; the second reference data is close to a one-hot vector and has definiteness; and the reference classification result determined based on the second reference data and the reference edge distribution information has class balance. The determined reference classification result therefore combines data-enhancement invariance, definiteness, and class balance, and performing the subsequent model parameter adjustment steps based on it yields an image classification model with better performance.
  • The server determines the total error value based on an error value between the classification results of the at least two first images and the reference classification results of the at least two first images, and an error value between the classification results of the second images corresponding to the at least two first images and the reference classification results of the at least two first images.
  • In a possible implementation manner, the server obtains the error value between an image classification result and the reference classification result based on the KL (Kullback-Leibler) loss function. For example, for any first image, the server obtains the relative entropy between the reference classification result of that first image and the classification result of that first image as the first error value of that first image; it also obtains the sum of the relative entropies between the reference classification result of that first image and the classification results of the corresponding second images as the second error value of that first image. The sum of the at least two first error values and the at least two second error values is then averaged to obtain the total error value.
  • The method for obtaining the total error value can be expressed as the following formula (10):

$$\mathcal{L} = \frac{1}{B}\sum_{i=1}^{B}\left[\mathrm{KL}\!\left(\tilde{q}_i \,\middle\|\, p_\theta(y \mid x_i)\right) + \sum_{m=1}^{M}\mathrm{KL}\!\left(\tilde{q}_i \,\middle\|\, p_\theta\!\left(y \mid \hat{x}_i^{m}\right)\right)\right] \qquad (10)$$

  • where $\mathrm{KL}(a \,\|\, b)$ represents the relative entropy between $a$ and $b$, $p_\theta(y \mid x_i)$ is the model output for the first image $x_i$, and $p_\theta(y \mid \hat{x}_i^{m})$ is the model output for the corresponding second images.
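  • A minimal Python sketch of formula (10) follows; the tensor shapes and the helper name `total_error` are assumptions, and the small epsilon is added only for numerical stability (it is not part of the formula).

```python
import torch

def total_error(ref, probs_first, probs_second):
    """Formula (10): mean over the batch of
    KL(ref_i || p(y|x_i)) + sum_m KL(ref_i || p(y|x_hat_i^m)).

    ref:          (B, K) reference classification results.
    probs_first:  (B, K) model outputs for the first images.
    probs_second: (B, M, K) model outputs for the corresponding second images.
    """
    eps = 1e-8
    def kl(a, b):  # relative entropy KL(a || b), summed over classes
        return (a * ((a + eps).log() - (b + eps).log())).sum(dim=-1)
    first_err = kl(ref, probs_first)                            # (B,)
    second_err = kl(ref.unsqueeze(1), probs_second).sum(dim=1)  # (B,)
    return (first_err + second_err).mean()
```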
  • the server updates the parameters of the image classification model based on the total error value.
  • back propagation can be applied to update the parameters of the image classification model.
  • The server solves each parameter in the image classification model using gradient descent with the Adam (Adaptive moment estimation) algorithm, until each classification result obtained with the image classification model satisfies the reference condition, at which point it determines that training of the image classification model is complete.
  • In some embodiments, the initial learning rate of the image classification model is set to 0.0005,
  • and the two parameters in the Adam algorithm are set to 0.5 and 0.9. It should be noted that the method for updating the parameters of the image classification model is not limited in this embodiment of the present application.
  • After the server updates the parameters of the image classification model, if the number of training passes reaches the count threshold, the server takes the current image classification model as the trained image classification model; if not, the server may read the next batch of training data from the training data set, perform the above steps 301 to 309 again, and continue training until the trained image classification model is obtained.
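  • For illustration only, one way such a training step could be wired together is sketched below. This is a hedged sketch under stated assumptions: `augment` and `build_reference` stand in for the data-enhancement and reference-construction steps described elsewhere in this document, `total_error` is the formula-(10) routine sketched above, and none of these names come from the patent itself.

```python
import torch

def train(model, loader, build_reference, augment, total_error,
          count_threshold: int = 100):
    # Learning rate and betas follow the values stated above.
    opt = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.5, 0.9))
    for _ in range(count_threshold):
        for x in loader:                    # x: (B, C, H, W) first images
            x_aug = augment(x)              # (B*M, C, H, W) second images
            p_first = model(x)              # (B, K)
            p_second = model(x_aug)         # (B*M, K)
            ref = build_reference(p_second) # (B, K) reference results
            loss = total_error(
                ref, p_first,
                p_second.view(x.size(0), -1, p_first.size(1)))
            opt.zero_grad()
            loss.backward()                 # back propagation
            opt.step()
```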
  • A reference classification result is constructed from the classification results output by the image classification model whenever those results do not satisfy the reference condition. Since the reference classification result indicates the probability that an image belongs to each category, the parameters of the image classification model are updated based on the total error value between the classification result of each image and the reference classification result, and the trained image classification model is obtained.
  • The trained image classification model can directly output image classification results with higher accuracy from an input image, reducing the complexity of the model's image classification process.
  • FIG. 5 is a schematic diagram of a training method of an image classification model provided by an embodiment of the present application.
  • Taking one first image as an example, the server first performs data enhancement on the first image 501 to obtain at least two second images 502, and inputs the first image 501 and the at least two second images 502 into the image classification model 503 to obtain the classification result of each image; then, based on the classification results of the second images, it constructs the first reference data 504, i.e., performs the above step 305.
  • Based on the first reference data 504 and the evaluation data 505, it obtains the second reference data 506, i.e., performs the above step 306; then, based on the second reference data 506, the edge distribution information 507 of the classification results of the second images, and the reference edge distribution information 508, it obtains the reference classification result 509, i.e., performs the above step 307; finally, the KL loss function is applied to obtain the total error value between the classification results of the images and the reference classification result 509, and the parameters of the image classification model 503 are updated based on this total error value.
  • The image classification model is optimized by constructing a reference classification result that incorporates data-enhancement invariance, definiteness, and class balance, so that the output of the image classification model tends toward the reference classification result. When performing an image clustering task, the image classification model therefore directly outputs the cluster category of each image without an additional clustering process, which improves the clustering performance of the model.
  • Moreover, the training data used in the image classification model training method provided in the embodiment of the present application does not need to be labeled, which effectively saves labeling costs; the method can therefore be widely used for preliminary analysis of unknown data.
  • The above embodiment introduces a method for training an image classification model. The image classification model obtained with this training method can be integrated into various types of applications and combined with various application scenarios; for example, it can be applied to an electronic photo album application or a cloud photo album for image classification.
  • With an image classification model trained by the method provided in the embodiment of the present application, a small number of categories can be summarized from a large number of images. A representative image of each category can be obtained and used as the cover image of that category, so users can quickly grasp the information of that category of images through the representative images and, when they need to search for images, can search quickly by category, improving image search efficiency.
  • The image classification model can also be applied to image collection applications: it can be called to sort the images collected by the user into multiple categories, without manual image classification.
  • In the embodiment of the present application, the use of the image classification model in an image collection application is taken as an example for description.
  • In a possible implementation, applying the image classification model to image classification may include the following steps.
  • Step 1 The terminal sends an image classification instruction to the server in response to the image classification operation.
  • the terminal is a terminal used by a user, and a target application program for providing an image collection function, such as an electronic photo album, is installed and running on the terminal.
  • the server is the background server of the target application, and the server is equipped with a trained image classification model, and the image classification model is obtained by applying the above-mentioned image classification model training method.
  • an image classification control is displayed in the target application program running on the terminal, and the user selects at least two images from the collected images as the target images to be classified.
  • the user can select at least two images captured within a certain period of time as the target image, or at least two images captured at the same location as the target image, or randomly select At least two images are used as the target image, which is not limited in this embodiment of the present application.
  • After selecting the target images, the user triggers the image classification control; in response to the user's trigger operation on the control, the terminal obtains the image identifier of each target image, generates an image classification instruction, and sends the image classification instruction to the server.
  • an image identifier is used to uniquely indicate an image
  • the image classification instruction includes image identifiers of each target image. It should be noted that the above description of the method for generating the image classification instruction is only an exemplary description, and the embodiment of the present application does not limit which method is used to generate the image classification instruction.
  • Step 2 In response to the image classification instruction, the server invokes an image classification model to classify the target images indicated by the image classification instruction to obtain image classification results of each target image.
  • Each image collected by the user is stored synchronously in the server. After receiving the image classification instruction, the server obtains, based on the at least two image identifiers in the instruction, the at least two target images indicated by those identifiers, and inputs the at least two target images into the image classification model.
  • Taking an image classification model built on the VGG deep convolutional neural network as an example, the process of obtaining the image classification result of one target image is described below.
  • Feature extraction is performed on the target image by the multiple cascaded convolution units in the image classification model. For example, each convolution unit obtains the feature map output by the previous convolution unit, performs a convolution operation on the feature map through at least one convolution layer to obtain a new feature map, and inputs the new feature map into the next convolution unit.
  • each convolutional unit may be followed by a pooling layer to perform dimensionality reduction processing on the feature map output by the convolutional unit. That is, a new feature map obtained by a convolution unit is first input to the pooling layer, and the pooling layer performs dimensionality reduction processing on the new feature map, and then inputs it to the next convolution unit.
  • The server obtains the feature map output by the last convolution unit, maps it to a vector through at least one fully connected layer in the image classification model, and then maps each element of the vector into the interval [0, 1] through the softmax layer to obtain a class probability vector, i.e., the image classification result of the target image; each element of the class probability vector represents the probability that the target image belongs to one class.
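  • For illustration, a VGG-style forward pass of this kind can be sketched as below. The depth, channel widths, and class count are placeholder choices, not the patent's configuration.

```python
import torch
import torch.nn as nn

class SmallVGGClassifier(nn.Module):
    """Illustrative VGG-style classifier: cascaded convolution units, each
    followed by a pooling layer, then fully connected layers and softmax."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        def unit(c_in, c_out):  # one convolution unit plus its pooling layer
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2))
        self.features = nn.Sequential(unit(3, 64), unit(64, 128), unit(128, 256))
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(256), nn.ReLU(inplace=True),
            nn.Linear(256, num_classes))

    def forward(self, x):
        logits = self.classifier(self.features(x))
        return torch.softmax(logits, dim=-1)  # class probability vector
```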
  • Step 3 The server sends the image classification result to the terminal, and the terminal performs image display based on the image classification result.
  • Based on the image classification result, the terminal may group images belonging to the same category into an image set and display at least one image-set viewing entry on the image classification result viewing page. An entry may show the label of that category of images, such as people, scenery, or food, and may also show a representative image of the category; the user can tap each image-set viewing entry to view all the target images included in that image set. When the user needs to send certain images to friends, for example images taken while traveling, the images to send can be quickly located from the scenery image set; likewise, when the user wants to upload food photos to a social platform, the photos to share can be found in the food image set, improving search and sharing efficiency.
  • FIG. 6 is a schematic structural diagram of an image classification model training device provided by an embodiment of the present application.
  • The device includes: an image acquisition module 601, configured to perform image transformation on at least two first images respectively, to obtain at least two second images corresponding to each first image;
  • the classification module 602 is configured to input the at least two first images and the corresponding second images into an image classification model, and the image classification model outputs the at least two first images The classification result of the first image and the classification result of the corresponding second image;
  • the result acquisition module 603 is configured to, in response to each classification result not satisfying the reference condition, generate reference classification results of the at least two first images based on the classification results of the at least two second images corresponding to each first image;
  • the reference classification result of a first image is used to characterize the probability that the first image and its corresponding at least two second images belong to each category;
  • the error determination module 604 is configured to determine the total error value based on the error values between the classification results of the at least two first images and their reference classification results, and the error values between the classification results of the corresponding second images and those reference classification results;
  • the parameter updating module 605 is configured to update the parameters of the image classification model based on the total error value.
  • The result acquisition module 603 includes: a first acquisition sub-module, configured to average the classification results of the at least two second images corresponding to each first image, to obtain the first reference data corresponding to each first image;
  • a second acquisition sub-module, configured to obtain the second reference data corresponding to each first image from the first reference data corresponding to each first image and the evaluation data corresponding to each piece of first reference data, where the evaluation data is used to characterize the accuracy of the first reference data;
  • a third acquisition sub-module, configured to generate the reference classification result corresponding to each first image based on the edge distribution information of the classification results of the second images, the reference edge distribution information, and the second reference data corresponding to each first image.
  • The second acquisition sub-module is configured to: average the evaluation data corresponding to the respective pieces of first reference data, to obtain average evaluation data; and adjust each piece of first reference data based on the gradient of the average evaluation data, to obtain the second reference data corresponding to each first image.
  • The evaluation data corresponding to a piece of first reference data is generated by an evaluator based on that first reference data, the evaluator being used to determine the accuracy of the first reference data. The apparatus further includes: a training module, configured to train the evaluator based on each piece of first reference data and its reference distribution information, where the reference distribution information of a piece of first reference data is used to characterize the reference value of each element in the first reference data.
  • The third acquisition sub-module is configured to: determine a weight vector based on the edge distribution information of the classification results of the second images and the reference edge distribution information; multiply the second reference data element-wise by the elements at the same positions in the weight vector, to obtain adjusted second reference data; and normalize the adjusted second reference data to generate the reference classification result.
  • The error determination module 604 is configured to: for any first image, obtain the relative entropy between the reference classification result of that first image and its classification result as the first error value of that first image; obtain the sum of the relative entropies between the reference classification result of that first image and the classification results of its corresponding second images as the second error value of that first image; and average the sum of at least two first error values and at least two second error values, to obtain the total error value.
  • the apparatus further includes: a mutual information acquisition module, configured to acquire the first mutual information between the respective first images and the classification results of the respective first images; second mutual information between the classification result and the classification result of the corresponding second image; in response to the first mutual information and the second mutual information satisfying the reference condition, it is determined that the respective classification results satisfy the reference condition; in response to the The first mutual information and the second mutual information do not satisfy the reference condition, and it is determined that the respective classification results do not satisfy the reference condition.
  • the image acquisition module 601 is configured to: acquire the at least two first images; based on at least one of image cropping, image flipping, image color dithering, and image color channel reorganization, respectively for the at least two first images Perform image transformation on each of the first images to obtain at least two second images corresponding to each of the first images.
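  • One possible realization of these four transforms with torchvision is sketched below; `channel_shuffle` is a hypothetical helper for color channel reorganization, the crop size and jitter strengths are arbitrary assumptions, and the input is assumed to be a PIL image.

```python
import torch
from torchvision import transforms

def channel_shuffle(img: torch.Tensor) -> torch.Tensor:
    """Color channel reorganization: randomly permute the C dimension."""
    perm = torch.randperm(img.size(0))
    return img[perm]

augment = transforms.Compose([
    transforms.RandomResizedCrop(96),             # image cropping
    transforms.RandomHorizontalFlip(),            # image flipping
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),   # image color dithering
    transforms.ToTensor(),
    transforms.Lambda(channel_shuffle),           # channel reorganization
])

def make_second_images(first_image, M: int = 10):
    """Derive M second images from one first image."""
    return [augment(first_image) for _ in range(M)]
```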
  • With the device provided in this embodiment, the classification results that the image classification model outputs for the images are obtained, and when those classification results do not satisfy the reference condition, reference classification results are constructed from the classification results output by the model.
  • the reference classification result can indicate the probability that the image belongs to each category. Therefore, based on the total error value between the classification result of each image and the reference classification result, the parameters of the image classification model are updated, and the trained image classification model is obtained.
  • the image classification model can directly output image classification results with high accuracy based on the input image, which reduces the complexity of the image classification process of the image classification model.
  • When the image classification model training apparatus provided in the above embodiment trains the image classification model, the division into the above functional modules is used only as an example; in practical applications, the above functions can be allocated to different functional modules as needed, i.e., the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the image classification model training apparatus and the image classification model training method embodiments provided by the above embodiments belong to the same concept, and the implementation process can refer to the method embodiments.
  • FIG. 7 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • The terminal 700 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
  • Terminal 700 may also be called user equipment, portable terminal, laptop terminal, desktop terminal, and the like by other names.
  • the terminal 700 includes: one or more processors 701 and one or more memories 702 .
  • the processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • The processor 701 may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 701 may also include a main processor and a coprocessor.
  • The main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state.
  • In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen.
  • In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor, where the AI processor is used to process computing operations related to machine learning.
  • Memory 702 may include one or more computer-readable storage media, which may be non-transitory. Memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more disk storage devices, flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 702 is used to store at least one piece of program code, and the at least one piece of program code is used to be executed by the processor 701 to implement the methods provided by the method embodiments in this application. Image classification model training methods.
  • the terminal 700 may further include: a peripheral device interface 703 and at least one peripheral device.
  • the processor 701, the memory 702 and the peripheral device interface 703 may be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 703 through a bus, a signal line or a circuit board.
  • the peripheral device includes at least one of a radio frequency circuit 704 , a display screen 705 , a camera assembly 706 , an audio circuit 707 , a positioning assembly 708 and a power supply 709 .
  • the terminal 700 also includes one or more sensors 710 .
  • the one or more sensors 710 include, but are not limited to, an acceleration sensor 711 , a gyro sensor 712 , a pressure sensor 713 , a fingerprint sensor 714 , an optical sensor 715 and a proximity sensor 716 .
  • The structure shown in FIG. 7 does not constitute a limitation on the terminal 700, which may include more or fewer components than shown, combine some components, or adopt a different component arrangement.
  • FIG. 8 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • The server 800 may vary greatly in configuration or performance, and may include one or more processors (Central Processing Units, CPU) 801 and one or more memories 802, where at least one piece of program code is stored in the one or more memories 802 and is loaded and executed by the one or more processors 801 to implement the methods provided by the above method embodiments.
  • the server 800 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server 800 may also include other components for implementing device functions.
  • a computer-readable storage medium such as a memory including at least one piece of program code, is also provided, and the at least one piece of program code can be executed by a processor to complete the image classification model training method in the foregoing embodiment.
  • the computer-readable storage medium may be Read-Only Memory (ROM), Random Access Memory (RAM), Compact Disc Read-Only Memory (CD-ROM), Tape, floppy disk, and optical data storage devices, etc.
  • a computer program product comprising at least one piece of program code stored in a computer-readable storage medium.
  • the processor of the computer device reads the at least one piece of program code from the computer-readable storage medium, and the processor executes the at least one piece of program code, so that the computer device implements the operations performed by the image classification model training method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An image classification model training method and apparatus, a computer device, and a storage medium, belonging to the field of image processing technology. The classification results that an image classification model outputs for individual images are obtained; when the classification results output by the model do not satisfy a reference condition, reference classification results are constructed from those outputs. Since a reference classification result can indicate the probability that an image belongs to each category, the parameters of the image classification model are updated based on the total error value between the classification result of each image and the reference classification result, and a trained image classification model is obtained.

Description

Image classification model training method and apparatus, computer device, and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on and claims priority to Chinese patent application No. 202010781930.0 filed on August 6, 2020, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application relates to the field of image processing technology, and in particular to an image classification model training method and apparatus, a computer device, and a storage medium.
BACKGROUND
Image classification based on artificial intelligence may divide an image set into different classes or clusters according to a specific criterion, for example the similarity between images, so that the similarity of images within one cluster is as large as possible while the difference between images in different clusters is also as large as possible.
In current image classification methods, image features are usually first extracted by a neural network, and a classification module then classifies the images based on those features. Such methods are decoupled, i.e., the image feature extraction process and the image classification process are independent of each other, and the computational complexity is high. There is as yet no effective solution for reducing the computational complexity so as to cut the model's resource consumption and improve classification efficiency.
SUMMARY
Embodiments of this application provide an image classification model training method and apparatus, a computer device, and a storage medium, capable of training an image classification model with a simplified structure.
An embodiment of this application provides an image classification model training method, applied to a computer device, the method including:
performing image transformation on at least two first images respectively, to obtain at least two second images corresponding to each first image;
inputting the at least two first images and the corresponding second images into an image classification model, the image classification model outputting classification results of the at least two first images and classification results of the corresponding second images;
in response to the classification results not satisfying a reference condition, generating reference classification results of the at least two first images based on the classification results of the at least two second images corresponding to each first image, the reference classification result of a first image being used to characterize the probability that the first image and its corresponding at least two second images belong to each category;
determining a total error value based on error values between the classification results of the at least two first images and their reference classification results, and error values between the classification results of the second images corresponding to the at least two first images and those reference classification results;
updating parameters of the image classification model based on the total error value, and determining that training is complete when the classification results of the at least two first images and of the corresponding second images output by the updated image classification model satisfy the reference condition.
An embodiment of this application provides an image classification model training apparatus, the apparatus including:
an image acquisition module, configured to perform image transformation on at least two first images respectively, to obtain at least two second images corresponding to each first image;
a classification module, configured to input the at least two first images and the corresponding second images into an image classification model, the image classification model outputting classification results of the at least two first images and classification results of the corresponding second images;
a result acquisition module, configured to, in response to the classification results not satisfying a reference condition, generate reference classification results of the at least two first images based on the classification results of the at least two second images corresponding to each first image, the reference classification result of a first image being used to characterize the probability that the first image and its corresponding at least two second images belong to each category;
an error determination module, configured to determine a total error value based on error values between the classification results of the at least two first images and their reference classification results, and error values between the classification results of the corresponding second images and those reference classification results;
a parameter updating module, configured to update parameters of the image classification model based on the total error value, and to determine that training is complete when the classification results of the at least two first images and of the corresponding second images output by the updated image classification model satisfy the reference condition.
An embodiment of this application provides a computer device, the computer device including one or more processors and one or more memories, the one or more memories storing at least one piece of program code loaded and executed by the one or more processors to implement the operations performed by the image classification model training method.
An embodiment of this application provides a computer-readable storage medium storing at least one piece of program code loaded and executed by a processor to implement the operations performed by the image classification model training method.
An embodiment of this application provides a computer program product including at least one piece of program code stored in a computer-readable storage medium. A processor of a computer device reads the at least one piece of program code from the computer-readable storage medium and executes it, so that the computer device implements the operations performed by the image classification model training method.
With the technical solution provided by the embodiments of this application, the classification results that the image classification model outputs for the images are obtained; when they do not satisfy the reference condition, reference classification results are constructed from them. Since a reference classification result can indicate the probability that an image belongs to each category, the parameters of the image classification model are updated based on the total error value between the classification results of the images and the reference classification results, and a trained image classification model is obtained which can directly output accurate image classification results from an input image, reducing the complexity of the model's image classification process.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of an implementation environment of an image classification model training method provided by an embodiment of this application;
FIG. 2 is a flowchart of an image classification model training method provided by an embodiment of this application;
FIG. 3 is a flowchart of an image classification model training method provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of an image classification model provided by an embodiment of this application;
FIG. 5 is a schematic diagram of an image classification model training method provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of an image classification model training apparatus provided by an embodiment of this application;
FIG. 7 is a schematic structural diagram of a terminal provided by an embodiment of this application;
FIG. 8 is a schematic structural diagram of a server provided by an embodiment of this application.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
In this application, the terms "first", "second", and the like are used to distinguish identical or similar items whose roles and functions are basically the same. It should be understood that "first", "second", and "n-th" have no logical or temporal dependency on each other, and do not limit quantity or execution order.
In this application, an image classification model built on a neural network is trained so that the model can perform end-to-end image classification, i.e., output image classification results directly from images without applying an additional classification algorithm.
An image classification model trained with the method provided by the embodiments of this application can be used to classify and organize the images stored in an electronic photo album, facilitating the management of those images. A trained image classification model can also automatically classify pictures in a recommendation system or an online gallery, so that pictures a user may be interested in can be recommended according to the user's preference at the right moment or when the user searches for pictures, achieving accurate recommendation. The image classification model trained with this method can also be used in the medical field, for example for assisted recognition of medical images: a trained model can identify imaging regions of interest, such as target vessel regions and target organ regions, from medical images, improving diagnostic efficiency.
FIG. 1 is a schematic diagram of an implementation environment of an image classification model training method provided by an embodiment of this application. The implementation environment includes a terminal 110 and an image classification platform 140.
The terminal 110 may be, but is not limited to, a smart phone, a tablet computer, a laptop, a desktop computer, a smart speaker, a smart watch, or an in-vehicle terminal. The terminal 110 has installed and runs an application supporting image classification, for example an image recognition or image retrieval application. Exemplarily, the terminal 110 may be a user-side device or a developer-side device, and a user account is logged into the application running on it. The terminal 110 may generally refer to one of multiple terminals; this embodiment is illustrated with the terminal 110 only.
The image classification platform 140 is used to provide background services for applications supporting image classification. The platform 140 may undertake the primary image classification work while the terminal 110 undertakes the secondary work, or vice versa, or either of them may undertake the classification work alone. In some embodiments, the image classification platform 140 includes an access server, an image classification server, and a database. The access server provides access services for the terminal 110. The image classification server provides background services related to image classification; there may be one or more such servers. When there are multiple image classification servers, at least two of them provide different services, and/or at least two provide the same service, for example in a load-balancing manner, which is not limited in this embodiment. The image classification server may be provided with an image classification model, and the server supports the training and application of the model. The above server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
The terminal 110 and the image classification platform 140 may be directly or indirectly connected through wired or wireless communication, which is not limited in this embodiment of this application.
A person skilled in the art will appreciate that there may be more or fewer terminals, for example only one, or dozens or hundreds, or more. This embodiment does not limit the number or device type of the terminals.
An embodiment of this application provides an image classification model training method. In this method, the images used for model training are first data-enhanced; the initial images and the data-enhanced images are input into the image classification model together, and the model outputs image classification results, from which reference classification results are then constructed. Since a reference classification result can indicate the probability that an image belongs to each category, the total error value between the classification results of the images and the reference classification results is obtained and back-propagated to the image classification model, and the parameters of each operation layer in the model are adjusted, yielding a trained image classification model that performs end-to-end image classification, i.e., directly outputs accurate image classification results from images, thereby reducing the complexity of image classification.
FIG. 2 is a flowchart of an image classification model training method provided by an embodiment of this application. The method can be applied to a computer device, which may be the above terminal or server; in this embodiment, the training method is introduced with the server as the executing entity. Referring to FIG. 2, the embodiment may include the following steps:
201. The server performs image transformation on at least two first images respectively, to obtain at least two second images corresponding to each first image.
A first image may be an image stored in the server, an image captured by the server from a video, or an image collected by a device with an image acquisition function, for example an image a camera sends to the server in real time; this embodiment does not limit which kind of image is used. A second image is obtained from a first image by data enhancement, i.e., image transformation; the transformation modes include, but are not limited to, image cropping, image flipping, image color dithering, and image color channel reorganization.
In a possible implementation, the server obtains the at least two first images in response to a model training instruction sent by a terminal. The terminal may be one used by a developer and sends the model training instruction in response to a user operation; this embodiment does not limit how the instruction is triggered. After obtaining the at least two first images, the server performs image transformation on them based on at least one transformation mode, obtaining the at least two second images corresponding to each first image.
202. The server inputs the at least two first images and the corresponding second images into the image classification model, and the model outputs the classification results of the at least two first images and of the corresponding second images.
The image classification model is built on a neural network, for example a Visual Geometry Group (VGG) deep convolutional neural network or a Residual Network (ResNet); this embodiment does not limit the structure of the model.
In a possible implementation, after the first and second images are input into the image classification model, at least one operation layer of the model performs convolution operations on each image to extract its image features, and the image classification result of each image is predicted based on those features. The result may be expressed as a class probability vector, which characterizes the probability that the image belongs to each category. It should be noted that this embodiment does not limit the classification procedure of the model.
203. In response to the classification results not satisfying a reference condition, the server generates reference classification results of the at least two first images based on the classification results of the at least two second images corresponding to each first image; the reference classification result of a first image characterizes the probability that the first image and its corresponding at least two second images belong to each category.
The reference condition may be set by the developer, for example that the mutual information between the images and the classification results is greater than a reference threshold, which this embodiment does not limit. Mutual information expresses the strength of the association between two variables: the stronger the association, the larger the value. In this embodiment, the mutual information between the images and the classification results is used to express the association between an image and its corresponding classification result.
In this embodiment, in response to the classification results not satisfying the reference condition, the server constructs the reference classification results based on the classification results of the at least two second images corresponding to each first image, and then performs the subsequent model-parameter adjustment process based on them. Since the reference classification results are derived from the classification results of the second images, i.e., the data-enhanced images, performing the subsequent training steps based on them gives the model's output data-enhancement invariance: the at least two second images derived from the same first image all belong to the same category.
In some embodiments, in response to the classification results satisfying the reference condition, the server determines that training of the image classification model is complete.
204. The server determines a total error value based on the error values between the classification results of the at least two first images and their reference classification results, and the error values between the classification results of the corresponding second images and those reference classification results.
The total error value characterizes the accuracy of the model's outputs: the higher the accuracy, the smaller the value. In a possible implementation, the server separately obtains the error values between the images and the corresponding classification results, obtains the error values between the classification results of the first images and those of the second images, and derives the total error value from these two types of error values. The above description of obtaining the total error value is merely exemplary; this embodiment does not limit which method is used.
205. The server updates the parameters of the image classification model based on the total error value, and determines that training is complete when the classification results of the at least two first images and of the corresponding second images output by the updated model satisfy the reference condition.
In a possible implementation, after obtaining the total error value, the server back-propagates it to the image classification model and solves the parameters of each operation layer based on a gradient descent algorithm, until the classification results obtained with the model satisfy the reference condition and training is determined complete. This embodiment does not limit which method is used to update the parameters.
With the technical solution provided by this embodiment, the classification results output by the image classification model are obtained; when they do not satisfy the reference condition, reference classification results are constructed from them. Since a reference classification result can indicate the probability that an image belongs to each category, the model parameters are updated based on the total error value between the classification results and the reference classification results, and a trained model is obtained that can directly output accurate image classification results from an input image, reducing the complexity of the model's image classification process.
FIG. 3 is a flowchart of an image classification model training method provided by an embodiment of this application. The training process of the image classification model is described below with reference to FIG. 3.
301. The server obtains at least two first images and performs image transformation on them respectively, to obtain at least two second images corresponding to each first image.
A second image is obtained from a first image by image transformation, i.e., it is a data-enhanced image. For example, in response to a model training instruction, the server obtains at least two first images and transforms them based on at least one of image cropping, image flipping, image color dithering, and image color channel reorganization, obtaining the at least two second images corresponding to each first image. The above description of the image transformation, i.e., data enhancement, method is merely exemplary; this embodiment does not limit which method is used.
This embodiment does not limit the numbers of first and second images. For example, the batch size of model training may be set to 128: the server reads 128 first images per training pass, and data enhancement of any first image yields M corresponding second images, where M is a positive integer whose value can be set by the developer, for example M = 10; this embodiment does not limit the value of M.
It should be noted that in this embodiment both the first and second images are represented as numeric matrices composed of pixel values; that is, in the following steps, model training is performed based on the numeric matrices representing the first and second images.
302. The server inputs the at least two first images and the corresponding second images into the image classification model, and the model outputs the classification results of the at least two first images and of the corresponding second images.
The image classification model can cluster the first and second images, i.e., divide the images into different clusters according to the different features they reflect, the at least one image within one cluster belonging to the same category.
The image classification model is built on a convolutional neural network; in this embodiment, a model built on the VGG deep convolutional neural network is taken as an example. FIG. 4 is a schematic structural diagram of an image classification model provided by an embodiment of this application. As shown in FIG. 4, the model includes five convolution units, namely units 401, 402, 403, 404, and 405, each containing at least one convolution layer and followed by a pooling layer; the model further includes at least one fully connected layer 406 and a softmax (normalized exponential function) layer 407. The model may of course include other units, such as input and output units, which this embodiment does not limit. Taking the model in FIG. 4 as an example, in a possible implementation, the server inputs the at least two first images and at least two second images into the model; each convolution unit performs convolution operations on each image to extract its image features, and a pooling layer downsamples the features extracted by each convolution unit, reducing their dimensionality to cut the amount of data processed in subsequent operations. After feature extraction, at least one fully connected layer maps the features of each image to a vector, and the softmax layer maps each element of the vector output by the last fully connected layer into the interval [0, 1], obtaining the classification result of each image, i.e., a class probability vector in which one element represents the probability that the image belongs to one category.
In a possible implementation, the server may input first and second images of arbitrary size into the model, or may first adjust them to a reference size. For example, before inputting them, the server scales each first and second image as appropriate to adjust it to the reference size, which may be set by the developer; this embodiment does not limit it.
It should be noted that the above description of the image classification method is merely exemplary; this embodiment limits neither the classification method used nor the structure of the image classification model.
303. The server judges whether the classification results satisfy a reference condition.
The reference condition measures whether the image classification model has converged. In a possible implementation, whether the classification results satisfy the reference condition, and thus whether to continue training the model, can be judged based on mutual information. The reference condition is set by the developer and is not limited in this embodiment. In a possible implementation, the server determines whether the classification results satisfy the reference condition in any of the following ways.
Implementation 1: In a possible implementation, the reference condition includes data constraints on first mutual information and second mutual information. The first mutual information characterizes the association between each first image and its classification result: the stronger the association, the larger the value. The second mutual information characterizes the association between the classification result of each first image and those of the corresponding second images, i.e., between the classification results before and after data enhancement: the stronger the association, the larger the value. In other words, the classification results of an image before and after data enhancement should be the same, i.e., the classification results should have data-enhancement invariance. In a possible implementation, the server obtains the first mutual information between the first images and their classification results; for example, it obtains the first sub-mutual-information between each first image and its classification result, and averages the sum of these values as the first mutual information. The server obtains the second mutual information between the classification results of the first images and those of the corresponding second images; for example, it obtains the second sub-mutual-information between the classification result of each first image and that of each corresponding second image, and averages the sum of these values as the second mutual information. If the first mutual information is greater than or equal to a first threshold and the second mutual information is greater than or equal to a second threshold, it is determined that they satisfy the reference condition, i.e., the classification results satisfy the reference condition; otherwise, it is determined that they do not, i.e., the classification results do not satisfy the reference condition. The first and second thresholds may be set by the developer, which this embodiment does not limit. The above description of obtaining the first and second mutual information is merely exemplary; this embodiment does not limit the method used.
Implementation 2: In a possible implementation, the reference condition includes a data constraint on third mutual information. The third mutual information characterizes the accuracy of the model's outputs and is positively correlated with it; for example, the sum of the first and second mutual information is determined as the third mutual information. In a possible implementation, the server determines the third mutual information based on the first and second mutual information; if the third mutual information is greater than or equal to a third threshold, it satisfies the reference condition, i.e., the classification results satisfy the reference condition; if it is smaller than the threshold, it does not, i.e., the classification results do not satisfy the reference condition. The third threshold is set by the developer, which this embodiment does not limit. In a possible implementation, the third mutual information can be determined as in the following formula (1):

$$I = I(x, y) + I(y, \hat{y}) \qquad (1)$$

where $x$ denotes a first image, $y$ the classification result of the first image, $\hat{y}$ the classification result of the second image, $I(x, y)$ the first mutual information, $I(y, \hat{y})$ the second mutual information, and $I$ the third mutual information.
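The patent does not fix an estimator for these mutual-information terms. One common batch estimator for the second term $I(y, \hat{y})$, shown below purely as an assumption, forms the joint distribution of paired soft assignments over a batch (an IIC-style estimator):

```python
import torch

def mutual_information(p: torch.Tensor, p_hat: torch.Tensor) -> torch.Tensor:
    """Batch estimate of I(y, y_hat) for paired soft assignments.

    p, p_hat: (N, K) class-probability vectors for an image and its
    data-enhanced counterpart. This joint-matrix estimator is an
    illustrative assumption, not the patent's prescribed method.
    """
    joint = p.t() @ p_hat / p.size(0)      # (K, K) joint distribution
    joint = (joint + joint.t()) / 2        # symmetrize
    pi = joint.sum(dim=1, keepdim=True)    # marginal of y
    pj = joint.sum(dim=0, keepdim=True)    # marginal of y_hat
    eps = 1e-8
    return (joint * (torch.log(joint + eps)
                     - torch.log(pi + eps)
                     - torch.log(pj + eps))).sum()
```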
Implementation 3: In a possible implementation, the reference condition includes a first constraint on the first and second mutual information and a second constraint on the number of model training passes. For example, the reference condition may be set so that the first and second mutual information obtained in the current pass both satisfy the data constraints and the number of training passes is greater than a count threshold. It may also be set so that both satisfy the data constraints and the number of passes in which both have satisfied them is greater than a count threshold. It may further be set so that both satisfy the data constraints and the first and second mutual information obtained across the passes show a convergence trend. The reference condition may of course be set otherwise, which this embodiment does not limit. In a possible implementation, if the first and second mutual information satisfy the first constraint and the number of training passes satisfies the second constraint, the classification results are determined to satisfy the reference condition; otherwise, they are determined not to.
It should be noted that the above description of judging whether the classification results satisfy the reference condition is merely exemplary; this embodiment does not limit the method used to make this judgment in the current training pass.
In this embodiment, if the classification results satisfy the reference condition, the server performs step 304 below; if they do not, the server performs steps 305 to 309 below.
304. In response to the classification results satisfying the reference condition, the server determines that training of the image classification model is complete.
In a possible implementation, if the classification results satisfy the reference condition, i.e., the image classification model has converged, the server determines that training is complete and obtains the parameters of the trained model.
It should be noted that this embodiment describes only a single training pass as an example and does not limit the number of training passes. For example, when the model is trained multiple times, in a possible implementation, if the classification results satisfy the reference condition and the number of passes is greater than or equal to a training count threshold, training is determined complete; if the results satisfy the condition but the pass count is below the threshold, the next batch of training data is read to continue training the model.
305. In response to the classification results not satisfying the reference condition, the server averages the classification results of the at least two second images corresponding to each first image, obtaining the first reference data corresponding to each first image.
In a possible implementation, the server obtains the first reference data of each first image from the average of the classification results of the data-enhanced images, i.e., of the second images. The first reference data fuses the classification-result features of the data-enhanced images, and the reference result determined based on it likewise fuses those features; after the model parameters are updated based on the reference classification result, the model's output acquires data-enhancement invariance, i.e., the at least two second images derived from the same first image all belong to the same category.
In a possible implementation, the first reference data can be determined by formula (2):

$$q_i = \frac{1}{M}\sum_{m=1}^{M} p_\theta\!\left(y \mid \hat{x}_i^{m}\right) \qquad (2)$$

where $i$ is the index of a first image; $q_i$ is the first reference data of the $i$-th first image; $M$ is the total number of second images corresponding to the $i$-th first image and $m$ the index of a second image; $\hat{x}_i^{m}$ denotes the $m$-th second image corresponding to the $i$-th first image; and $p_\theta\!\left(y \mid \hat{x}_i^{m}\right)$ denotes the classification result of $\hat{x}_i^{m}$. It should be noted that the above description of obtaining the first reference data is merely exemplary; this embodiment does not limit which method is used.
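In code, formula (2) is a single mean over the augmented predictions; a minimal sketch with assumed tensor shapes:

```python
import torch

def first_reference_data(probs_second: torch.Tensor) -> torch.Tensor:
    """Formula (2): q_i is the mean of the M classification results of the
    second images derived from the i-th first image.

    probs_second: (B, M, K) softmax outputs; returns (B, K).
    """
    return probs_second.mean(dim=1)
```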
306. The server obtains the second reference data corresponding to each first image from the first reference data corresponding to each first image and the evaluation data corresponding to each piece of first reference data.
The evaluation data of a piece of first reference data characterizes its accuracy. In a possible implementation, the evaluation data may be expressed as a vector of two elements, one representing the probability that the first reference data is accurate and the other the probability that it is not. For example, evaluation data (0, 1) indicates that the first reference data is accurate with probability 1, i.e., that it is accurate; evaluation data (0.3, 0.7) indicates that the first reference data is inaccurate with probability 0.3 and accurate with probability 0.7. The evaluation data may also take other forms, which this embodiment does not limit.
In a possible implementation, the evaluation data corresponding to each piece of first reference data is generated by an evaluator based on that first reference data; the evaluator is used to determine the accuracy of the first reference data. In a possible implementation, the evaluator is a deep neural network composed of at least one fully connected layer, the number of fully connected layers being set by the developer; this embodiment does not limit it.
In this embodiment, the evaluator can be trained based on each piece of first reference data and its reference distribution information. The reference distribution information of a piece of first reference data characterizes the reference value of each of its elements and can be sampled from the corresponding prior distribution information: the reference distribution information $\bar{q}_i$ of the $i$-th first image is a one-hot vector sampled from the prior distribution information $\bar{p}(q)$, which may be set by the developer, this embodiment not limiting it, with each one-hot vector in $\bar{p}(q)$ sampled with equal probability. In a possible implementation, the first reference data and its reference distribution information are input into the evaluator separately, a loss function is applied to determine the evaluation error value of the evaluator's outputs, and the parameters of each fully connected layer in the evaluator are updated based on that error value. In a possible implementation, the evaluation error value can be obtained as in the following formula (3):

$$\mathcal{L}_C = \frac{1}{B}\sum_{i=1}^{B}\left[ C_w(q_i) - C_w(\bar{q}_i) + \lambda\left(\left\|\nabla_{\check{q}_i} C_w(\check{q}_i)\right\|_2 - 1\right)^{2} \right] \qquad (3)$$

where $\mathcal{L}_C$ denotes the evaluation error value; $B$ is the number of first images and $i$ the index of a first image; $q_i$ is the first reference data of the $i$-th first image, $C_w(q_i)$ the evaluator's output for input $q_i$, and $C_w(\bar{q}_i)$ its output for input $\bar{q}_i$; the last term is the gradient penalty, used to make the evaluator $C_w$ satisfy the Lipschitz constraint, $\lambda$ is the gradient-penalty coefficient, and $\check{q}_i$ is a vector sampled on the line segment between $q_i$ and $\bar{q}_i$. In a possible implementation, the evaluator may be trained several rounds during each training pass of the image classification model, the evaluator obtained in the last round being taken as the trained evaluator; each piece of first reference data $q_i$ is then input into the trained evaluator $C_w$ to obtain the corresponding evaluation data $C_w(q_i)$. The number of evaluator training rounds may be set by the developer, for example five rounds per model training pass, which this embodiment does not limit. It should be noted that the above description of the evaluator training method is merely exemplary and does not limit it. In this embodiment, as the evaluation error value $\mathcal{L}_C$ decreases during evaluator training, the Wasserstein distance between the probability distribution $p(q)$ of the first reference data and the prior distribution information $\bar{p}(q)$ decreases, i.e., $p(q)$ gradually approaches $\bar{p}(q)$.
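The evaluator training described above matches the Wasserstein-critic-with-gradient-penalty pattern. The sketch below is an illustrative reconstruction under that reading; the critic width (128 hidden units) and $\lambda = 10$ are assumptions not fixed by the patent.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Evaluator C_w: a small fully connected network scoring a batch of
    first reference data (shape (B, K))."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_classes, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, q):
        return self.net(q).squeeze(-1)

def critic_loss(critic, q, q_bar, lam: float = 10.0):
    """Formula (3): Wasserstein critic loss with gradient penalty.

    q:     (B, K) first reference data.
    q_bar: (B, K) one-hot vectors sampled from the prior distribution.
    """
    t = torch.rand(q.size(0), 1)
    q_check = (t * q + (1 - t) * q_bar).requires_grad_(True)  # on the segment
    scores = critic(q_check)
    grads, = torch.autograd.grad(scores.sum(), q_check, create_graph=True)
    penalty = ((grads.norm(2, dim=1) - 1) ** 2).mean()
    return (critic(q) - critic(q_bar)).mean() + lam * penalty
```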
In a possible implementation, after obtaining the evaluation data corresponding to each piece of first reference data, the server may average them to obtain average evaluation data, and then adjust each piece of first reference data based on the gradient of the average evaluation data, obtaining the second reference data corresponding to each first image. This method of obtaining the second reference data can be expressed as the following formulas (4) and (5):

$$\bar{C} = \frac{1}{B}\sum_{i=1}^{B} C_w(q_i) \qquad (4)$$

$$q'_i = \mathrm{Normalize}\!\left(q_i + \alpha\,\nabla_{q_i}\,\bar{C}\right) \qquad (5)$$

where $B$ is the number of first images, $i$ the index of a first image, and $q_i$ the first reference data of the $i$-th first image; $C_w(q_i)$ is the evaluation data of $q_i$ and $\bar{C}$ the average evaluation data; $q'_i$ is the second reference data of the $i$-th first image; $\mathrm{Normalize}(\cdot)$ denotes normalization, whose method this embodiment does not limit; $\alpha$ is a hyperparameter controlling the gradient magnitude, set by the developer, for example 0.04, which this embodiment does not limit; and $\nabla_{q_i}\,\bar{C}$ denotes the gradient of $\bar{C}$ with respect to $q_i$.
It should be noted that the above description of obtaining the second reference data is merely exemplary; this embodiment does not limit which method is used. For example, label sharpening may instead be used to derive the second reference data from the first reference data, which can be expressed as the following formula (6):

$$q'_i = \mathrm{Normalize}\!\left(q_i^{\,1/T}\right) \qquad (6)$$

where $q_i$ is the first reference data of the $i$-th first image, $q'_i$ its second reference data, $T$ a hyperparameter with value range $(0, 1)$ set by the developer, and $\mathrm{Normalize}(\cdot)$ denotes normalization.
In this embodiment, since the reference distribution information of the first reference data takes the form of one-hot vectors, applying that reference distribution information to train the evaluator during model training, and then training the image classification model with the trained evaluator, drives the first reference data toward one-hot form. That is, it drives the image classification results toward one-hot form, strengthening their definiteness so that each classification result corresponds to one definite category; in other words, when the model performs a clustering task, the cluster category it outputs for each image is certain.
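Both routes to the second reference data can be sketched as follows. $\mathrm{Normalize}(\cdot)$ is not specified by the patent; clamping to non-negative values and renormalizing to a probability vector is an assumed choice.

```python
import torch

def second_reference_data(critic, q, alpha: float = 0.04):
    """Formulas (4)-(5): move each q_i along the gradient of the average
    critic score, then renormalize. `critic` is a trained evaluator C_w."""
    q = q.detach().requires_grad_(True)
    avg_score = critic(q).mean()            # formula (4)
    grad, = torch.autograd.grad(avg_score, q)
    q2 = (q + alpha * grad).clamp_min(0)    # assumed Normalize(): clamp ...
    return q2 / q2.sum(dim=1, keepdim=True) # ... and renormalize

def label_sharpen(q, T: float = 0.5):
    """Formula (6): the label-sharpening alternative, T in (0, 1)."""
    q2 = q ** (1.0 / T)
    return q2 / q2.sum(dim=1, keepdim=True)
```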
307. The server generates the reference classification result corresponding to each first image based on the edge distribution information of the classification results of the second images, reference edge distribution information, and the second reference data corresponding to each first image.
The edge distribution information of the classification results characterizes the category distribution within them; the reference edge distribution information may be set by the developer, which this embodiment does not limit. In this embodiment, to ensure class balance in the classification results, i.e., that each image is assigned to every category with equal probability, each element of the reference edge distribution information may be set to the same value, i.e., the reference edge distribution information is a vector composed of identical values.
In a possible implementation, the server determines a weight vector based on the edge distribution information of the classification results of the second images and the reference edge distribution information; multiplies the second reference data corresponding to each first image element-wise by the elements at the same positions in the weight vector, obtaining adjusted second reference data; and normalizes the adjusted second reference data to generate the reference classification result. This method of determining the reference classification result can be expressed as the following formula (7):

$$\tilde{q}_i = \mathrm{Normalize}\!\left(q'_i \odot \frac{\tilde{p}(y)}{p_\theta(y)}\right) \qquad (7)$$

where $\tilde{q}_i$ is the reference classification result of the $i$-th first image; $q'_i$ is its second reference data; $p_\theta(y)$ denotes the edge distribution information of the classification results output by the image classification model; $\tilde{p}(y)$ denotes the reference edge distribution information; the element-wise ratio of the two edge distributions yields the weight vector; and $\mathrm{Normalize}(\cdot)$ denotes normalization.
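A minimal sketch of formula (7) follows. The direction of the weight ratio, reference marginal over current marginal, follows the class-balance behavior described after formula (9); the epsilon guard is an added assumption.

```python
import torch

def reference_classification_result(q2, marginal, ref_marginal):
    """Formula (7): weight the second reference data by the ratio of the
    reference edge distribution to the current edge distribution, then
    renormalize.

    q2:           (B, K) second reference data.
    marginal:     (K,)  edge distribution of the model's outputs.
    ref_marginal: (K,)  reference edge distribution (uniform for balance).
    """
    w = ref_marginal / (marginal + 1e-8)        # weight vector
    out = q2 * w                                # element-wise, same positions
    return out / out.sum(dim=1, keepdim=True)   # Normalize()
```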
In a possible implementation, the edge distribution information of the classification results of the second images is determined based on those classification results. In the first training pass of the model, it can be determined from the reference edge distribution information and the classification results of the second images, as in the following formula (8):

$$p_\theta(y) = \gamma\,\tilde{p}(y) + (1-\gamma)\,\frac{1}{BM}\sum_{i=1}^{B}\sum_{m=1}^{M} p_\theta\!\left(y \mid \hat{x}_i^{m}\right) \qquad (8)$$

where $p_\theta(y)$ denotes the edge distribution information of the classification results of the second images obtained in the current training pass and $\tilde{p}(y)$ the reference edge distribution information; $i$ is the index of a first image and $m$ the index of a second image; $B$ is the number of first images and $M$ the number of second images per first image; $\hat{x}_i^{m}$ denotes the $m$-th second image of the $i$-th first image; and $\gamma$ denotes the momentum coefficient, whose value may be set by the developer and is not limited in this embodiment.
In this embodiment, in training passes other than the first, the edge distribution information applied in the current pass can be determined from that obtained in the previous pass, as in the following formula (9):

$$p_\theta^{(t)}(y) = \gamma\,p_\theta^{(t-1)}(y) + (1-\gamma)\,\frac{1}{BM}\sum_{i=1}^{B}\sum_{m=1}^{M} p_\theta\!\left(y \mid \hat{x}_i^{m}\right) \qquad (9)$$

where $p_\theta^{(t)}(y)$ denotes the edge distribution information of the classification results of the second images obtained in the current pass and $p_\theta^{(t-1)}(y)$ that obtained in the previous pass; $B$ is the number of first images and $M$ the number of second images per first image; $i$ and $m$ index the first and second images; $\hat{x}_i^{m}$ denotes the $m$-th second image of the $i$-th first image; and $\gamma$ denotes the momentum coefficient, set by the developer, for example 0.8, which this embodiment does not limit. In this embodiment, when few images are predicted to be of the $k$-th class, the edge distribution information of the $k$-th class will be smaller than the prior probability, i.e., the reference edge distribution information, so the probability $q_{ik}$ that an image belongs to the $k$-th class increases; by minimizing the loss function value of the image classification model, more images are then predicted as the $k$-th class. When many images are of the $k$-th class, the method correspondingly reduces them. In this way, class balance is built into the clustering results.
It should be noted that the above steps 304 to 307 are the steps of, in response to the classification results satisfying the reference condition, determining that training of the image classification model is complete, and otherwise generating the reference classification results of the at least two first images based on the classification results of the at least two second images corresponding to each first image, the reference classification result of one first image characterizing the probability that this first image and its corresponding at least two second images belong to each category. In the above process of obtaining the reference classification results, the first reference data, when determined, fuses the image features of the data-enhanced images and has data-enhancement invariance; the second reference data is close to a one-hot vector and has definiteness; and the reference classification result, determined based on the first reference data, the second reference data, and the reference edge distribution information, further has class balance. The determined reference classification result thus combines data-enhancement invariance, definiteness, and class balance, and performing the subsequent model-parameter adjustment steps based on it yields an image classification model with better performance.
308. The server determines the total error value based on the error values between the classification results of the at least two first images and their reference classification results, and the error values between the classification results of the corresponding second images and those reference classification results.
In a possible implementation, the server obtains the error value between an image classification result and the reference classification result based on the KL loss function. For example, for any first image, the server obtains the relative entropy between the reference classification result of that first image and its classification result as the first error value of that first image; it obtains the sum of the relative entropies between the reference classification result of that first image and the classification results of its corresponding second images as the second error value of that first image; the sum of the at least two first error values and at least two second error values is averaged to obtain the total error value. In a possible implementation, the total error value can be obtained as in the following formula (10):

$$\mathcal{L} = \frac{1}{B}\sum_{i=1}^{B}\left[\mathrm{KL}\!\left(\tilde{q}_i \,\middle\|\, p_\theta(y \mid x_i)\right) + \sum_{m=1}^{M}\mathrm{KL}\!\left(\tilde{q}_i \,\middle\|\, p_\theta\!\left(y \mid \hat{x}_i^{m}\right)\right)\right] \qquad (10)$$

where $\mathrm{KL}(a \,\|\, b)$ denotes the relative entropy between $a$ and $b$; $p_\theta(y \mid x = x_i)$ denotes the model output for input $x_i$ and $p_\theta\!\left(y \mid \hat{x}_i^{m}\right)$ that for input $\hat{x}_i^{m}$; and $\mathcal{L}$ denotes the total error value. It should be noted that the above description of obtaining the total error value is merely exemplary; this embodiment does not limit which method is used.
309. The server updates the parameters of the image classification model based on the total error value.
In a possible implementation, back propagation may be applied to update the parameters of the image classification model. For example, the server solves the parameters of the image classification model with gradient descent based on the Adam (Adaptive moment estimation) algorithm, until the classification results obtained with the model satisfy the reference condition and training is determined complete. In some embodiments, the initial learning rate of the image classification model is set to 0.0005, and the parameters in the Adam algorithm are set to 0.5 and 0.9. It should be noted that this embodiment does not limit the method for updating the model parameters.
In a possible implementation, after the parameter update is complete, if the number of training passes has reached the count threshold, the server takes the model as the trained image classification model; if not, the server may read the next batch of training data from the training data set, perform the above steps 301 to 309 again, and train the model again until the trained image classification model is obtained.
With the technical solution provided by this embodiment, the classification results output by the image classification model are obtained; when they do not satisfy the reference condition, reference classification results are constructed from them. Since a reference classification result can indicate the probability that an image belongs to each category, the model parameters are updated based on the total error value between the classification results and the reference classification results, and a trained model is obtained that can directly output accurate image classification results from an input image, reducing the complexity of the model's image classification process.
FIG. 5 is a schematic diagram of an image classification model training method provided by an embodiment of this application; the above training process is described with reference to FIG. 5. Taking one first image as an example: first, the server performs data enhancement on the first image 501 to obtain at least two second images 502, inputs the first image 501 and the at least two second images 502 into the image classification model 503, and obtains the classification result of each image; then, based on the classification results of the second images, it constructs the first reference data 504, i.e., performs step 305 above; based on the first reference data 504 and the evaluation data 505, it obtains the second reference data 506, i.e., performs step 306; then, based on the second reference data 506, the edge distribution information 507 of the classification results of the second images, and the reference edge distribution information 508, it obtains the reference classification result 509, i.e., performs step 307; finally, the KL loss function is applied to obtain the total error value between the classification results of the images and the reference classification result 509, and the parameters of the image classification model 503 are updated based on it. In this embodiment, the model is optimized by constructing a reference classification result that combines data-enhancement invariance, definiteness, and class balance, so that the model's output tends toward the reference classification result; that is, when performing an image clustering task the model directly outputs each image's cluster category without an additional clustering process, improving the model's clustering performance. Moreover, the training data used by the training method provided in this embodiment requires no labels, effectively saving labeling costs, so the method can be widely used for preliminary analysis of unknown data.
It should be noted that this embodiment is described with the training of an image classification model only as an example; the technical solution provided by this embodiment can also be applied to the training of other models, for example video classification, text recognition, or speech classification models, i.e., models built on neural networks such as the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), or Bidirectional Encoder Representations from Transformers (BERT), which this embodiment does not limit.
The above embodiments introduce the method of training the image classification model. A model trained with this method can be integrated into many types of applications and combined with many application scenarios, for example picture classification and organization in an electronic photo album application or a cloud photo album. With a model trained by the method of the embodiments of this application, a small number of categories can be summarized from a large number of images, for example the images in an album can be grouped into categories such as scenery, people, and food; a representative image of each category can be obtained and used as the cover image of that category, so users can quickly grasp the information of that category of images through the representative images and can search quickly by category when they need to find an image, improving search efficiency. The model can also be applied to image collection applications: it can be called to sort the images a user collects into multiple categories without manual classification. In the embodiments of this application, the application of the model in an image collection application is taken as an example; in a possible implementation, applying the model to image classification may include the following steps.
Step 1: In response to an image classification operation, the terminal sends an image classification instruction to the server.
The terminal is one used by a user, on which a target application providing an image collection function, for example an electronic photo album, is installed and running. The server is the background server of the target application and carries a trained image classification model obtained with the above training method.
In a possible implementation, an image classification control is displayed in the target application running on the terminal, and the user selects at least two of the collected images as the target images to be classified. Taking an electronic photo album application as an example, the user may select at least two images taken within a certain period as the target images, or at least two images taken at the same location, or at least two images at random, which this embodiment does not limit. After selecting the target images, the user triggers the image classification control; in response to the trigger operation, the terminal obtains the image identifier of each target image, generates an image classification instruction, and sends it to the server. An image identifier uniquely indicates an image, and the instruction includes the identifiers of the target images. The above description of generating the image classification instruction is merely exemplary; this embodiment does not limit the method used.
Step 2: In response to the image classification instruction, the server calls the image classification model to classify the target images indicated by the instruction, obtaining the image classification result of each target image.
In a possible implementation, the images collected by the user are stored synchronously in the server; after receiving the instruction, the server obtains, based on the at least two image identifiers in it, the at least two target images indicated by those identifiers and inputs them into the image classification model.
In this embodiment, taking a model built on the VGG deep convolutional neural network as an example, the process of obtaining the image classification result of one target image is described. In a possible implementation, after the server inputs the target image into the model, feature extraction is performed by the multiple cascaded convolution units in the model: each convolution unit obtains the feature map output by the previous unit, performs a convolution operation on it through at least one convolution layer to obtain a new feature map, and inputs the new feature map into the next unit. In a possible implementation, each convolution unit may be followed by a pooling layer that reduces the dimensionality of its output feature map; that is, the new feature map obtained by a convolution unit is first input into the pooling layer and, after dimensionality reduction, into the next convolution unit. The server obtains the feature map output by the last convolution unit, maps it to a vector through at least one fully connected layer, and maps each element of the vector into the interval [0, 1] through the softmax layer, obtaining the class probability vector, i.e., the image classification result of the target image, whose elements represent the probabilities that the target image belongs to the respective categories.
Step 3: The server sends the image classification results to the terminal, and the terminal displays images based on them.
In a possible implementation, based on the image classification results, the terminal may group images belonging to the same category into an image set and display at least one image-set viewing entry on the image classification result viewing page. An entry may show the label of that category of images, for example people, scenery, or food, and may also show a representative image of the category; the user can tap each entry to view all the target images included in that image set. When the user needs to send certain images to friends, for example images taken while traveling, the images to send can be quickly located in the scenery image set; when the user wants to upload food photos to a social platform, the photos to share can be found in the food image set, improving search and sharing efficiency. The above description of the image display method is merely exemplary; this embodiment does not limit the method used.
All of the above technical solutions can be combined arbitrarily to form embodiments of this application.
FIG. 6 is a schematic structural diagram of an image classification model training apparatus provided by an embodiment of this application. Referring to FIG. 6, the apparatus includes: an image acquisition module 601, configured to perform image transformation on at least two first images respectively, to obtain at least two second images corresponding to each first image; a classification module 602, configured to input the at least two first images and the corresponding second images into an image classification model, the model outputting the classification results of the at least two first images and of the corresponding second images; a result acquisition module 603, configured to, in response to the classification results not satisfying a reference condition, generate reference classification results of the at least two first images based on the classification results of the at least two second images corresponding to each first image, the reference classification result of a first image characterizing the probability that the first image and its corresponding at least two second images belong to each category; an error determination module 604, configured to determine a total error value based on the error values between the classification results of the at least two first images and their reference classification results, and the error values between the classification results of the corresponding second images and those reference classification results; and a parameter updating module 605, configured to update the parameters of the image classification model based on the total error value.
In a possible implementation, the result acquisition module 603 includes: a first acquisition sub-module, configured to average the classification results of the at least two second images corresponding to each first image, obtaining the first reference data corresponding to each first image; a second acquisition sub-module, configured to obtain the second reference data corresponding to each first image from the first reference data corresponding to each first image and the evaluation data corresponding to each piece of first reference data, the evaluation data characterizing the accuracy of the first reference data; and a third acquisition sub-module, configured to generate the reference classification result corresponding to each first image based on the edge distribution information of the classification results of the second images, the reference edge distribution information, and the second reference data corresponding to each first image.
In a possible implementation, the second acquisition sub-module is configured to: average the evaluation data corresponding to the respective pieces of first reference data, obtaining average evaluation data; and adjust each piece of first reference data based on the gradient of the average evaluation data, obtaining the second reference data corresponding to each first image.
In a possible implementation, the evaluation data corresponding to each piece of first reference data is generated by an evaluator based on that first reference data, the evaluator being used to determine its accuracy; the apparatus further includes: a training module, configured to train the evaluator based on each piece of first reference data and its reference distribution information, the reference distribution information of a piece of first reference data characterizing the reference value of each element in it.
In a possible implementation, the third acquisition sub-module is configured to: determine a weight vector based on the edge distribution information of the classification results of the second images and the reference edge distribution information; multiply the second reference data corresponding to each first image element-wise by the elements at the same positions in the weight vector, obtaining adjusted second reference data; and normalize the adjusted second reference data to generate the reference classification result.
In a possible implementation, the error determination module 604 is configured to: for any first image, obtain the relative entropy between its reference classification result and its classification result as the first error value of that first image; obtain the sum of the relative entropies between its reference classification result and the classification results of its corresponding second images as the second error value of that first image; and average the sum of at least two first error values and at least two second error values, obtaining the total error value.
In a possible implementation, the apparatus further includes: a mutual information acquisition module, configured to obtain the first mutual information between the first images and their classification results; obtain the second mutual information between the classification results of the first images and those of the corresponding second images; in response to the first and second mutual information satisfying the reference condition, determine that the classification results satisfy the reference condition; and in response to the first and second mutual information not satisfying the reference condition, determine that the classification results do not satisfy it.
In a possible implementation, the image acquisition module 601 is configured to: obtain the at least two first images; and perform image transformation on them based on at least one of image cropping, image flipping, image color dithering, and image color channel reorganization, obtaining the at least two second images corresponding to each first image.
With the apparatus provided by the embodiments of this application, the classification results output by the image classification model are obtained; when they do not satisfy the reference condition, reference classification results are constructed from them. Since a reference classification result can indicate the probability that an image belongs to each category, the model parameters are updated based on the total error value between the classification results and the reference classification results, and a trained model is obtained that can directly output accurate classification results from an input image, reducing the complexity of the model's classification process.
It should be noted that when the image classification model training apparatus provided by the above embodiment trains the image classification model, the division into the above functional modules is used only as an example; in practical applications, the above functions can be allocated to different functional modules as needed, i.e., the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus provided by the above embodiment and the method embodiments of the image classification model training method belong to the same concept; for the implementation process, refer to the method embodiments.
The computer device provided by the above technical solutions can be implemented as a terminal or a server. For example, FIG. 7 is a schematic structural diagram of a terminal provided by an embodiment of this application. The terminal 700 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer. The terminal 700 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or other names.
Generally, the terminal 700 includes one or more processors 701 and one or more memories 702.
The processor 701 may include one or more processing cores, for example a 4-core or 8-core processor, and may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 702 may include one or more computer-readable storage media, which may be non-transitory, and may also include high-speed random access memory as well as non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 702 stores at least one piece of program code to be executed by the processor 701 to implement the image classification model training method provided by the method embodiments of this application.
In some embodiments, the terminal 700 may further include a peripheral device interface 703 and at least one peripheral device. The processor 701, the memory 702, and the peripheral device interface 703 may be connected by buses or signal lines, and each peripheral device may be connected to the interface 703 by a bus, signal line, or circuit board. In some embodiments, the peripheral devices include at least one of a radio frequency circuit 704, a display screen 705, a camera assembly 706, an audio circuit 707, a positioning assembly 708, and a power supply 709.
In some embodiments, the terminal 700 further includes one or more sensors 710, including but not limited to an acceleration sensor 711, a gyroscope sensor 712, a pressure sensor 713, a fingerprint sensor 714, an optical sensor 715, and a proximity sensor 716.
A person skilled in the art will understand that the structure shown in FIG. 7 does not constitute a limitation on the terminal 700, which may include more or fewer components than shown, combine some components, or adopt a different component arrangement.
FIG. 8 is a schematic structural diagram of a server provided by an embodiment of this application. The server 800 may vary greatly in configuration or performance and may include one or more processors (Central Processing Units, CPU) 801 and one or more memories 802, where the one or more memories 802 store at least one piece of program code loaded and executed by the one or more processors 801 to implement the methods provided by the above method embodiments. The server 800 may of course also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, as well as other components for implementing device functions.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including at least one piece of program code executable by a processor to complete the image classification model training method of the above embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product is also provided, including at least one piece of program code stored in a computer-readable storage medium. A processor of a computer device reads the at least one piece of program code from the computer-readable storage medium and executes it, so that the computer device implements the operations performed by the image classification model training method.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be completed by hardware, or by hardware instructed by at least one piece of program code, which may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc.
The above are merely optional embodiments of this application and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its protection scope.

Claims (15)

  1. An image classification model training method, applied to a computer device, the method comprising:
    performing image transformation on at least two first images respectively, to obtain at least two second images corresponding to each first image;
    inputting the at least two first images and the corresponding second images into an image classification model, the image classification model outputting classification results of the at least two first images and classification results of the corresponding second images;
    in response to the classification results not satisfying a reference condition, generating reference classification results of the at least two first images based on the classification results of the at least two second images corresponding to each first image, the reference classification result of a first image being used to characterize the probability that the first image and its corresponding at least two second images belong to each category;
    determining a total error value based on error values between the classification results of the at least two first images and the reference classification results of the at least two first images, and error values between the classification results of the second images corresponding to the at least two first images and the reference classification results of the at least two first images;
    updating parameters of the image classification model based on the total error value, and determining that training is complete when the classification results of the at least two first images and of the corresponding second images output by the updated image classification model satisfy the reference condition.
  2. The method according to claim 1, wherein generating the reference classification results of the at least two first images based on the classification results of the at least two second images corresponding to each first image comprises:
    averaging the classification results of the at least two second images corresponding to each first image respectively, to obtain first reference data corresponding to each first image;
    determining second reference data corresponding to each first image based on the first reference data corresponding to each first image and evaluation data corresponding to each piece of first reference data, the evaluation data being used to characterize the accuracy of the first reference data;
    generating the reference classification result corresponding to each first image based on edge distribution information of the classification results of the second images, reference edge distribution information, and the second reference data corresponding to each first image.
  3. The method according to claim 2, wherein determining the second reference data corresponding to each first image based on the first reference data corresponding to each first image and the evaluation data corresponding to each piece of first reference data comprises:
    averaging the evaluation data corresponding to the respective pieces of first reference data, to obtain average evaluation data;
    adjusting each piece of first reference data based on the gradient of the average evaluation data, to obtain the second reference data corresponding to each first image.
  4. The method according to claim 3, wherein the evaluation data corresponding to each piece of first reference data is generated by an evaluator based on that first reference data, the evaluator being used to determine the accuracy of the first reference data;
    before averaging the evaluation data corresponding to the respective pieces of first reference data to obtain the average evaluation data, the method further comprises:
    training the evaluator based on each piece of first reference data and reference distribution information of each piece of first reference data, the reference distribution information of a piece of first reference data being used to characterize the reference value of each element in the first reference data.
  5. The method according to claim 2, wherein generating the reference classification result corresponding to each first image based on the edge distribution information of the classification results of the second images, the reference edge distribution information, and the second reference data corresponding to each first image comprises:
    determining a weight vector based on the edge distribution information of the classification results of the second images and the reference edge distribution information;
    multiplying the second reference data corresponding to each first image by the elements at the same positions in the weight vector, to obtain adjusted second reference data;
    normalizing the adjusted second reference data to generate the reference classification result.
  6. The method according to claim 1, wherein determining the total error value comprises:
    for any first image, obtaining the relative entropy between the reference classification result of that first image and the classification result of that first image as the first error value corresponding to that first image;
    for any first image, obtaining the sum of the relative entropies between the reference classification result of that first image and the classification results of the second images corresponding to that first image as the second error value of that first image;
    averaging the sum of at least two first error values and at least two second error values, to obtain the total error value.
  7. The method according to claim 1, wherein after inputting the at least two first images and the corresponding second images into the image classification model and the image classification model outputting the classification results of the at least two first images and of the corresponding second images, the method further comprises:
    obtaining first mutual information between each first image and the classification result of each first image;
    obtaining second mutual information between the classification result of each first image and the classification result of the corresponding second image;
    in response to the first mutual information and the second mutual information satisfying the reference condition, determining that the classification results satisfy the reference condition;
    in response to the first mutual information and the second mutual information not satisfying the reference condition, determining that the classification results do not satisfy the reference condition.
  8. The method according to claim 1, wherein performing image transformation on the at least two first images respectively to obtain the at least two second images corresponding to each first image comprises:
    performing image transformation on the at least two first images respectively based on at least one of image cropping, image flipping, image color dithering, and image color channel reorganization, to obtain the at least two second images corresponding to each first image.
  9. An image classification model training apparatus, the apparatus comprising:
    an image acquisition module, configured to perform image transformation on at least two first images respectively, to obtain at least two second images corresponding to each first image;
    a classification module, configured to input the at least two first images and the corresponding second images into an image classification model, the image classification model outputting classification results of the at least two first images and classification results of the corresponding second images;
    a result acquisition module, configured to, in response to the classification results not satisfying a reference condition, generate reference classification results of the at least two first images based on the classification results of the at least two second images corresponding to each first image, the reference classification result of a first image being used to characterize the probability that the first image and its corresponding at least two second images belong to each category;
    an error determination module, configured to determine a total error value based on error values between the classification results of the at least two first images and the reference classification results of the at least two first images, and error values between the classification results of the second images corresponding to the at least two first images and the reference classification results of the at least two first images;
    a parameter updating module, configured to update parameters of the image classification model based on the total error value, and to determine that training is complete when the classification results of the at least two first images and of the corresponding second images output by the updated image classification model satisfy the reference condition.
  10. The apparatus according to claim 9, wherein the result acquisition module comprises:
    a first acquisition sub-module, configured to average the classification results of the at least two second images corresponding to each first image respectively, to obtain first reference data corresponding to each first image;
    a second acquisition sub-module, configured to obtain second reference data corresponding to each first image from the first reference data corresponding to each first image and evaluation data corresponding to each piece of first reference data, the evaluation data being used to characterize the accuracy of the first reference data;
    a third acquisition sub-module, configured to generate the reference classification result corresponding to each first image based on edge distribution information of the classification results of the second images, reference edge distribution information, and the second reference data corresponding to each first image.
  11. The apparatus according to claim 10, wherein the second acquisition sub-module is configured to:
    average the evaluation data corresponding to the respective pieces of first reference data, to obtain average evaluation data;
    adjust each piece of first reference data based on the gradient of the average evaluation data, to obtain the second reference data corresponding to each first image.
  12. The apparatus according to claim 11, wherein the evaluation data corresponding to each piece of first reference data is generated by an evaluator based on that first reference data, the evaluator being used to determine the accuracy of the first reference data;
    the apparatus further comprises:
    a training module, configured to train the evaluator based on each piece of first reference data and reference distribution information of each piece of first reference data, the reference distribution information of a piece of first reference data being used to characterize the reference value of each element in the first reference data.
  13. The apparatus according to claim 10, wherein the third acquisition sub-module is configured to:
    determine a weight vector based on the edge distribution information of the classification results of the second images and the reference edge distribution information;
    multiply the second reference data corresponding to each first image by the elements at the same positions in the weight vector, to obtain adjusted second reference data;
    normalize the adjusted second reference data to generate the reference classification result.
  14. A computer device, comprising one or more processors and one or more memories, the one or more memories storing at least one piece of program code, the at least one piece of program code being loaded and executed by the one or more processors to implement the image classification model training method according to any one of claims 1 to 8.
  15. A computer-readable storage medium storing at least one piece of program code, the at least one piece of program code being loaded and executed by a processor to implement the image classification model training method according to any one of claims 1 to 8.
PCT/CN2021/102530 2020-08-06 2021-06-25 图像分类模型训练方法、装置、计算机设备及存储介质 WO2022028147A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21853462.6A EP4113376A4 (en) 2020-08-06 2021-06-25 IMAGE CLASSIFICATION MODEL TRAINING METHOD AND APPARATUS, COMPUTER DEVICE AND STORAGE MEDIA
US17/964,739 US20230035366A1 (en) 2020-08-06 2022-10-12 Image classification model training method and apparatus, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010781930.0 2020-08-06
CN202010781930.0A CN111738365B (zh) 2020-08-06 2020-08-06 图像分类模型训练方法、装置、计算机设备及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/964,739 Continuation US20230035366A1 (en) 2020-08-06 2022-10-12 Image classification model training method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022028147A1 true WO2022028147A1 (zh) 2022-02-10

Family

ID=72658179

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102530 WO2022028147A1 (zh) 2020-08-06 2021-06-25 图像分类模型训练方法、装置、计算机设备及存储介质

Country Status (4)

Country Link
US (1) US20230035366A1 (zh)
EP (1) EP4113376A4 (zh)
CN (1) CN111738365B (zh)
WO (1) WO2022028147A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738365B (zh) * 2020-08-06 2020-12-18 腾讯科技(深圳)有限公司 图像分类模型训练方法、装置、计算机设备及存储介质
CN114996590A (zh) * 2022-08-04 2022-09-02 上海钐昆网络科技有限公司 一种分类方法、装置、设备及存储介质
CN115035353B (zh) * 2022-08-11 2022-12-23 粤港澳大湾区数字经济研究院(福田) 图像分类方法、图像分类模型、智能终端及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180089543A1 (en) * 2016-09-23 2018-03-29 International Business Machines Corporation Image classification utilizing semantic relationships in a classification hierarchy
CN108197666A (zh) * 2018-01-30 2018-06-22 咪咕文化科技有限公司 一种图像分类模型的处理方法、装置及存储介质
CN110738263A (zh) * 2019-10-17 2020-01-31 腾讯科技(深圳)有限公司 一种图像识别模型训练的方法、图像识别的方法及装置
CN111738365A (zh) * 2020-08-06 2020-10-02 腾讯科技(深圳)有限公司 图像分类模型训练方法、装置、计算机设备及存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8478052B1 (en) * 2009-07-17 2013-07-02 Google Inc. Image classification
CN110399929B (zh) * 2017-11-01 2023-04-28 腾讯科技(深圳)有限公司 眼底图像分类方法、装置以及计算机可读存储介质
CN109741379A (zh) * 2018-12-19 2019-05-10 上海商汤智能科技有限公司 图像处理方法、装置、电子设备及计算机可读存储介质
CN111461155A (zh) * 2019-01-18 2020-07-28 富士通株式会社 训练分类模型的装置和方法
CN111353542B (zh) * 2020-03-03 2023-09-19 腾讯科技(深圳)有限公司 图像分类模型的训练方法、装置、计算机设备和存储介质
CN111046980B (zh) * 2020-03-16 2020-06-30 腾讯科技(深圳)有限公司 一种图像检测方法、装置、设备及计算机可读存储介质


Also Published As

Publication number Publication date
US20230035366A1 (en) 2023-02-02
EP4113376A4 (en) 2023-08-16
CN111738365A (zh) 2020-10-02
EP4113376A1 (en) 2023-01-04
CN111738365B (zh) 2020-12-18

Similar Documents

Publication Publication Date Title
WO2022028147A1 (zh) 图像分类模型训练方法、装置、计算机设备及存储介质
TWI773189B (zh) 基於人工智慧的物體檢測方法、裝置、設備及儲存媒體
CN111813532B (zh) 一种基于多任务机器学习模型的图像管理方法及装置
CN110738235B (zh) 肺结核判定方法、装置、计算机设备及存储介质
CN110705489B (zh) 目标识别网络的训练方法、装置、计算机设备和存储介质
WO2023024413A1 (zh) 信息的匹配方法、装置、计算机设备及可读存储介质
CN113902944A (zh) 模型的训练及场景识别方法、装置、设备及介质
CN113128526B (zh) 图像识别方法、装置、电子设备和计算机可读存储介质
CN110135428B (zh) 图像分割处理方法和装置
CN113140012B (zh) 图像处理方法、装置、介质及电子设备
CN116204709A (zh) 一种数据处理方法及相关装置
CN113780148A (zh) 交通标志图像识别模型训练方法和交通标志图像识别方法
CN115878839A (zh) 一种视频推荐方法、装置、计算机设备和计算机程序产品
CN111414966A (zh) 分类方法、装置、电子设备及计算机存储介质
JP2015097036A (ja) 推薦画像提示装置及びプログラム
CN115083020B (zh) 信息生成方法、装置、电子设备和计算机可读介质
CN116978080A (zh) 信息识别方法、装置和计算机可读存储介质
CN111046307B (zh) 用于输出信息的方法和装置
CN117726918A (zh) 模型训练方法、装置、电子设备及计算机存储介质
CN115511826A (zh) 一种图像质量评价方法、装置、设备及存储介质
CN117725289A (zh) 内容搜索方法、装置、电子设备和存储介质
CN116956204A (zh) 多任务模型的网络结构确定方法、数据预测方法及装置
CN117726917A (zh) 模型训练方法、装置、电子设备及计算机存储介质
CN115098644A (zh) 图像与文本匹配方法、装置、电子设备及存储介质
CN116977987A (zh) 图像识别方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21853462

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021853462

Country of ref document: EP

Effective date: 20220929

NENP Non-entry into the national phase

Ref country code: DE