CN115049882A - Model training method, image multi-label classification method and device and electronic equipment - Google Patents


Info

Publication number
CN115049882A
CN115049882A (application CN202210817804.5A)
Authority
CN
China
Prior art keywords: image, label classification, sample images, model, abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210817804.5A
Other languages
Chinese (zh)
Inventor
崔东林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210817804.5A priority Critical patent/CN115049882A/en
Publication of CN115049882A publication Critical patent/CN115049882A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The disclosure provides a model training method, an image multi-label classification method and apparatus, and an electronic device, and relates to the technical field of artificial intelligence, in particular to the fields of computer vision and deep learning. The specific implementation scheme is as follows: a sample image set is obtained, wherein the sample image set comprises a plurality of abnormal sample images and a plurality of normal sample images, each having multi-label classification values; and a deep learning model is trained using the sample image set and the multi-label classification values to obtain a multi-label classification model, wherein the multi-label classification model is used to determine a multi-label classification result of an image to be classified.

Description

Model training method, image multi-label classification method and device and electronic equipment
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and more particularly to the fields of computer vision and deep learning; in particular, it relates to a model training method, an image multi-label classification method and apparatus, and an electronic device.
Background
Image classification may refer to a process of assigning category labels to an image from a plurality of predetermined category labels. Image classification includes single-label classification, in which each image receives exactly one category label, and multi-label classification, in which each image may receive multiple category labels.
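The difference can be made concrete with a small sketch (the category names here are hypothetical, chosen to match the abnormal-image categories discussed later in this disclosure):

```python
# Illustrative only: hypothetical category names.
CATEGORIES = ["solid_color", "black_edge", "lace", "screen_reversed"]

# Single-label classification: each image receives exactly one category.
single_label = "black_edge"

# Multi-label classification: each image receives one value per category.
def to_multi_label_vector(labels, categories=CATEGORIES):
    """Encode a set of category labels as a 0/1 indicator vector."""
    return [1 if c in labels else 0 for c in categories]

vector = to_multi_label_vector({"black_edge", "lace"})  # [0, 1, 1, 0]
```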
With the development of artificial intelligence technology, the artificial intelligence technology has been widely used in various fields. For example, the image may be multi-labeled classified using artificial intelligence techniques.
Disclosure of Invention
The disclosure provides a model training method, an image multi-label classification method and apparatus, and an electronic device.
According to an aspect of the present disclosure, there is provided a training method of a deep learning model, including: acquiring a sample image set, wherein the sample image set comprises a plurality of abnormal sample images and a plurality of normal sample images, and the abnormal sample images and the normal sample images respectively have multi-label classification values; and training a deep learning model by using the sample image set and the multi-label classification value to obtain a multi-label classification model, wherein the multi-label classification model is used for determining a multi-label classification result of the image to be classified.
According to another aspect of the present disclosure, there is provided an image multi-label classification method, including: acquiring an image to be classified; and inputting the image to be classified into a multi-label classification model to obtain a multi-label classification result of the image to be classified, wherein the multi-label classification model is trained using the training method of the deep learning model according to the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a training apparatus for a deep learning model, including: a first acquisition module configured to acquire a sample image set, wherein the sample image set comprises a plurality of abnormal sample images and a plurality of normal sample images, the abnormal sample images and the normal sample images each having multi-label classification values; and a first training module configured to train a deep learning model using the sample image set and the multi-label classification values to obtain a multi-label classification model, wherein the multi-label classification model is used for determining a multi-label classification result of an image to be classified.
According to another aspect of the present disclosure, there is provided an image multi-label classification apparatus, including: a second acquisition module configured to acquire an image to be classified; and a classification module configured to input the image to be classified into a multi-label classification model to obtain a multi-label classification result of the image to be classified, wherein the multi-label classification model is trained using the training apparatus for the deep learning model according to the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method as described in the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 schematically illustrates an exemplary system architecture to which the training method of a deep learning model, the image multi-label classification method, and the corresponding apparatuses may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a method of training a deep learning model according to an embodiment of the disclosure;
FIG. 3A schematically illustrates an example schematic of a solid color image according to an embodiment of this disclosure;
FIG. 3B schematically illustrates an example schematic of a black-edge image according to an embodiment of the disclosure;
FIG. 3C schematically illustrates an example schematic diagram of a lace image, according to an embodiment of the disclosure;
FIG. 3D schematically illustrates an example schematic of a screen inversion image, in accordance with an embodiment of the disclosure;
FIG. 3E schematically illustrates an example schematic diagram of a truncated image of an object, according to an embodiment of the disclosure;
FIG. 3F schematically illustrates an example schematic of an object disturbed image according to an embodiment of the disclosure;
FIG. 4 schematically illustrates an example schematic diagram of a process of acquiring a sample image set, in accordance with an embodiment of the disclosure;
FIG. 5 schematically illustrates an example schematic of a training process for a deep learning model according to an embodiment of this disclosure;
FIG. 6 schematically shows a flow chart of a method of image multi-label classification according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a training apparatus for deep learning models, in accordance with an embodiment of the present disclosure;
fig. 8 schematically shows a block diagram of an image multi-label classification apparatus according to an embodiment of the present disclosure; and
fig. 9 schematically illustrates a block diagram of an electronic device adapted to implement a training method of a deep learning model and an image multi-label classification method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
A user may obtain a target resource from a network by searching. The target resource may be a multimedia resource, which may include at least one of: video resources, audio resources, and text resources. Because the number of multimedia resources is large, searching for them is difficult. To effectively serve different users and improve their search experience, resources of higher quality need to be provided. For video resources, for example, providing users with high-quality cover images improves the experience and reduces the difficulty of searching.
However, because the volume of video resources is large, abnormal images inevitably appear among the cover images; such abnormal images degrade the user experience and increase the difficulty of searching. It is therefore necessary to identify abnormal images.
In the related art, abnormal-image recognition covers only a few categories of abnormal image. Solid-color images are identified with traditional pixel-based methods, whose precision is low. Black-edge images and lace images are mostly identified with detection-based methods, which are costly.
Therefore, embodiments of the present disclosure provide a training method for a deep learning model. A deep learning model is trained using a sample image set that includes normal sample images and abnormal sample images of multiple categories, together with the multi-label classification values of the sample images in the set, to obtain a multi-label classification model that can determine a multi-label classification result for an image to be classified. Abnormal images of multiple categories are thus identified with a single model, which reduces model deployment cost and saves computing resources.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
In the technical scheme of the disclosure, before the personal information of the user is acquired or collected, the authorization or the consent of the user is acquired.
Fig. 1 schematically illustrates an exemplary system architecture of a training method, an image multi-label classification method and an apparatus to which a deep learning model may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the training method of the deep learning model, the image multi-label classification method, and the apparatus may be applied may include a terminal device, but the terminal device may implement the training method of the deep learning model, the image multi-label classification method, and the apparatus provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a knowledge reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be any type of server providing various services. For example, the server 105 may be a cloud server (also called a cloud computing server or cloud host), a host product in a cloud computing service system that remedies the high management difficulty and weak service extensibility of conventional physical hosts and VPS (Virtual Private Server) services. The server 105 may also be a server of a distributed system or a server that incorporates a blockchain.
It should be noted that the training method of the deep learning model provided by the embodiment of the present disclosure may be generally performed by the server 105. Accordingly, the training device for the deep learning model provided by the embodiment of the present disclosure may be generally disposed in the server 105. The training method of the deep learning model provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the training device for the deep learning model provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
Alternatively, the training method of the deep learning model provided by the embodiment of the present disclosure may also be generally performed by the terminal device 101, 102, or 103. Correspondingly, the training device for the deep learning model provided by the embodiment of the disclosure can also be arranged in the terminal device 101, 102, or 103.
It should be noted that the image multi-label classification method provided by the embodiment of the present disclosure may generally be executed by the terminal device 101, 102, or 103. Correspondingly, the image multi-label classification apparatus provided by the embodiment of the disclosure may also be arranged in the terminal device 101, 102, or 103.
Alternatively, the image multi-label classification method provided by the embodiments of the present disclosure may also be generally performed by the server 105. Accordingly, the image multi-label classification apparatus provided by the embodiment of the present disclosure may be generally disposed in the server 105. The image multi-label classification method provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the image multi-label classification apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely used as representations of the operations for description, and should not be construed as representing the execution order of the respective operations. The method need not be performed in the exact order shown, unless explicitly stated.
Fig. 2 schematically shows a flow chart of a training method of a deep learning model according to an embodiment of the present disclosure.
As shown in FIG. 2, the method 200 includes operations S210-S220.
In operation S210, a sample image set is acquired. The sample image set includes a plurality of abnormal sample images and a plurality of normal sample images. The abnormal sample image and the normal sample image each have a multi-label classification value.
In operation S220, a deep learning model is trained using the sample image set and the multi-label classification value, so as to obtain a multi-label classification model. The multi-label classification model is used for determining a multi-label classification result of the image to be classified.
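Operation S220 leaves the training objective unspecified; for multi-label targets, a common choice (an assumption here, not something the disclosure prescribes) is an independent per-class binary cross-entropy, sketched below in plain Python with hypothetical logits and 0/1 classification values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def multi_label_bce(logits, targets, eps=1e-12):
    """Mean per-class binary cross-entropy between model logits and the
    multi-label classification values (one 0/1 target per class)."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(logits)

# The loss is small when confident logits agree with the targets,
# and large when they disagree.
loss_good = multi_label_bce([4.0, -4.0, 3.0], [1, 0, 1])
loss_bad = multi_label_bce([-4.0, 4.0, -3.0], [1, 0, 1])
```

Deep learning frameworks provide fused versions of this same computation for a sigmoid-per-class output head.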
According to an embodiment of the present disclosure, the sample image set may include a plurality of abnormal sample images and a plurality of normal sample images. Each abnormal sample image of the plurality of abnormal sample images may have a respective multi-label classification value. Each of the plurality of normal sample images may have a respective multi-label classification value.
According to embodiments of the present disclosure, the multi-label classification values characterize, for each sample image, a label classification value for each of a plurality of predetermined classes; each label classification value is a score indicating how strongly the sample image belongs to the corresponding predetermined class.
According to an embodiment of the present disclosure, the predetermined categories of the plurality of abnormal sample images may include at least two of: a solid-color image, a black-edge image, a lace image, a screen-reversed image, an object-truncated image, and an object-disturbed image. In this case, the multi-label classification values of an abnormal sample image may include at least two of: a solid-color image label classification value, a black-edge image label classification value, a lace image label classification value, a screen-reversed image label classification value, an object-truncated image label classification value, and an object-disturbed image label classification value.
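At inference time, per-class label classification values of this kind can be read off by applying a sigmoid to each class logit and thresholding, so a single image may carry several abnormal labels at once. The category order and the 0.5 threshold below are illustrative assumptions:

```python
import math

# Assumed ordering of the six abnormal categories named in the disclosure.
ABNORMAL_CATEGORIES = ["solid_color", "black_edge", "lace",
                       "screen_reversed", "object_truncated", "object_disturbed"]

def scores_to_labels(logits, threshold=0.5):
    """Map per-class logits to scores via a sigmoid and keep every
    category whose score reaches the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return {c: p for c, p in zip(ABNORMAL_CATEGORIES, probs) if p >= threshold}

# One image can exceed the threshold for several abnormal categories at once.
labels = scores_to_labels([2.0, -1.5, 0.8, -3.0, 1.2, -0.5])
```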
According to the embodiment of the disclosure, the plurality of abnormal sample images and the plurality of normal sample images may be collected in real time, read from a database in which they were stored in advance, or received from other terminal devices. The embodiments of the present disclosure do not limit the manner of obtaining the sample image set.
According to the embodiment of the disclosure, before training the deep learning model by using the sample image set and the multi-label classification value, a plurality of abnormal sample images and a plurality of normal sample images in the sample image set can be respectively subjected to preprocessing operation. The preprocessing operations may include image denoising, image enhancement, data augmentation, and the like.
According to the embodiment of the disclosure, before the deep learning model is trained using the sample image set and the multi-label classification values, feature extraction may further be performed on the plurality of abnormal sample images and the plurality of normal sample images in the sample image set, respectively, to obtain feature data for each abnormal sample image and each normal sample image. Feature extraction may be performed based on interest-point detection or on dense extraction. Interest-point detection may use at least one of the Harris corner detector, the LoG (Laplacian of Gaussian) operator, the DoG (Difference of Gaussian) operator, and the like. Dense extraction may obtain at least one of Scale-Invariant Feature Transform (SIFT), Histogram of Oriented Gradients (HOG), Local Binary Pattern (LBP), and other features.
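As one concrete illustration of the LBP feature mentioned above, a minimal variant can be computed with plain NumPy. The eight-neighbour offsets and bit-packing order below are one common convention, not an implementation prescribed by the disclosure:

```python
import numpy as np

def lbp_codes(gray):
    """Minimal Local Binary Pattern: compare each interior pixel with its
    eight neighbours and pack the comparisons into an 8-bit code."""
    g = gray.astype(int)
    c = g[1:-1, 1:-1]  # interior pixels (the code is undefined on the border)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the interior pixels.
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(int) << bit
    return code

# On a perfectly flat patch every neighbour ties with the centre pixel,
# so every interior pixel receives the all-ones code 255.
flat = lbp_codes(np.full((5, 5), 7))
```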
According to the embodiment of the disclosure, a deep learning model may be trained using the sample image set and the multi-label classification values to obtain a multi-label classification model for determining a multi-label classification result of an image to be classified. The model structure of the deep learning model may be configured according to actual business requirements and is not limited herein. For example, the deep learning model may include a feature extraction module and a classification module. The feature extraction module may include at least one of: convolutional neural network models and Transformer-based encoders. The classification module may include one of: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Gradient Boosting (GB) models, among others.
According to the embodiment of the disclosure, the deep learning model is trained by using the sample image set comprising the normal sample images and the abnormal sample images of various categories and the multi-label classification values of the sample images in the sample image set, so that the multi-label classification model capable of being used for determining the multi-label classification result of the image to be classified is obtained, the abnormal images of various categories are identified by using a single model, the deployment cost of the model is reduced, and the calculation resources are saved.
The above is only an exemplary embodiment, and the disclosure is not limited thereto; other training methods for deep learning models known in the art may also be used, as long as the deep learning model can be trained.
The training method of the deep learning model according to the embodiment of the disclosure is further described below with reference to fig. 3A, 3B, 3C, 3D, 3E, 3F, 4 and 5.
According to an embodiment of the present disclosure, the categories of the plurality of abnormal sample images include at least two of: a solid-color image, a black-edge image, a lace image, a screen-reversed image, an object-truncated image, and an object-disturbed image.
Fig. 3A schematically illustrates an example schematic of a solid color image according to an embodiment of the disclosure.
As shown in fig. 3A, in the case where a certain region in the sample image has a single color, it may be determined that the sample image belongs to a solid-color image. For example, if the RGB value of a certain region in the sample image is (191, 191, 191), it may be determined that the category of the sample image includes the solid-color image. The embodiments of the present disclosure do not limit the specific shape, color, content, or the like of the solid-color image.
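A simple pixel-level check of this kind (the traditional approach the related art refers to) can be sketched as follows; the region shape and tolerance are assumptions for illustration:

```python
import numpy as np

def is_solid_color(region, tol=0):
    """True if every pixel in the region equals the first pixel,
    within tol per channel. region: H x W x C uint8 array."""
    first = region.reshape(-1, region.shape[-1])[0].astype(int)
    return bool(np.all(np.abs(region.astype(int) - first) <= tol))

# A hypothetical 4 x 4 gray patch with RGB (191, 191, 191), as in the
# example above, is flagged as solid color.
patch = np.full((4, 4, 3), 191, dtype=np.uint8)
```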
Fig. 3B schematically illustrates an example schematic diagram of a black-edge image according to an embodiment of the disclosure.
As shown in fig. 3B, in the case where the outer side of the sample image is wrapped by a black frame line, it may be determined that the category of the sample image includes the black-edge image. The embodiments of the present disclosure do not limit the specific shape, color, content, or the like of the black-edge image.
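A rule-of-thumb sketch of such a black-border test, with an assumed border width and brightness tolerance (neither specified by the disclosure):

```python
import numpy as np

def has_black_border(img, border=2, max_val=16):
    """True if all four borders of width `border` are near-black
    (every channel at most max_val). img: H x W x C uint8 array."""
    strips = (img[:border], img[-border:], img[:, :border], img[:, -border:])
    return all(int(s.max()) <= max_val for s in strips)

# A hypothetical image whose bright content is wrapped by a black frame line.
framed = np.zeros((32, 32, 3), dtype=np.uint8)
framed[4:-4, 4:-4] = 200
unframed = np.full((32, 32, 3), 200, dtype=np.uint8)
```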
Fig. 3C schematically illustrates an example schematic diagram of a lace image according to an embodiment of the disclosure.
As shown in fig. 3C, in the case where the outer contour of the sample image is affected, it may be determined that the category of the sample image includes a lace image. The embodiments of the present disclosure do not limit the specific shape, color, content, or the like of the lace image.
Fig. 3D schematically illustrates an example schematic of a screen inversion image according to an embodiment of the disclosure.
As shown in fig. 3D, in a case where the target object in the sample image is rotated, i.e., not in the right position, it may be determined that the category of the sample image includes the screen-reversed image. The embodiments of the present disclosure do not limit the specific shape, color, content, or the like of the screen-inverted image.
Fig. 3E schematically illustrates an example schematic diagram of an object truncated image, according to an embodiment of the disclosure.
As shown in fig. 3E, in the case where the target object in the sample image is not complete, it may be determined that the category of the sample image includes an object truncated image. The embodiment of the present disclosure does not limit the specific shape, color, content, or the like of the truncated image of the object.
Fig. 3F schematically illustrates an example schematic diagram of an object disturbed image according to an embodiment of the disclosure.
As shown in fig. 3F, in the case where the target object in the sample image is occluded by other objects, it may be determined that the category of the sample image includes the object-disturbed image. The embodiments of the present disclosure do not limit the specific shape, color, or content of the disturbed image of the object.
According to an embodiment of the present disclosure, operation S210 may include the following operations.
A plurality of first abnormal sample images and a plurality of normal sample images are acquired. A plurality of target images are determined from the plurality of normal sample images. Second abnormal sample images corresponding to the target images are generated from the plurality of target images; each second abnormal sample image has a predetermined multi-label classification value. A sample image set is then obtained from the plurality of abnormal sample images and the plurality of normal sample images, where the plurality of abnormal sample images includes the plurality of first abnormal sample images and the plurality of second abnormal sample images.
According to an embodiment of the present disclosure, the plurality of first abnormal sample images includes at least one of: at least one real abnormal sample image and at least one simulated abnormal sample image. A real abnormal sample image is acquired from real images based on a predetermined search object. A simulated abnormal sample image is generated in one of the following ways: generated based on predetermined image parameters, or generated by processing predetermined random noise data with a generative adversarial network model.
According to an embodiment of the present disclosure, the real abnormal sample images may include black-edge images, lace images, object-truncated images, object-disturbed images, and the like, acquired from real images based on a predetermined search object. The predetermined search object may include, for example, at least one of: humans, animals, and articles. For example, a search engine may be used to collect and download solid-color images of different colors based on keywords.
According to the embodiment of the disclosure, a simulated abnormal sample image may be generated artificially by a program based on predetermined image parameters, so as to expand the coverage of abnormal sample images. For example, the predetermined image parameter may be set to RGB (0, 0, 0), in which case an all-black simulated abnormal sample image is obtained.
According to the embodiment of the present disclosure, a simulated abnormal sample image may be obtained by inputting predetermined random noise data into a generative adversarial network model. The generative adversarial network model may include a deep convolutional generative adversarial network, a generative adversarial network based on the Wasserstein (earth mover's) distance, or a conditional generative adversarial network, among others. A generative adversarial network model includes a generator and a discriminator, each of which may be a neural network model. The generator is used to generate simulated abnormal sample images; by continuously training the generator, it learns the data distribution of the first abnormal sample images, so that it can eventually generate samples consistent with that distribution and fool the discriminator as far as possible. The discriminator is used to distinguish simulated abnormal sample images from real abnormal sample images.
According to an embodiment of the present disclosure, the convergence condition of the generative adversarial network model may include the generator converging, both the generator and the discriminator converging, or the iteration reaching a termination condition. The termination condition may include the number of iterations reaching a predetermined number of iterations.
According to an embodiment of the present disclosure, the proportion of real abnormal sample images to simulated abnormal sample images may be adjusted according to the actual situation, and is not limited by the embodiments of the present disclosure.
According to an embodiment of the present disclosure, a plurality of target images may be determined from the plurality of normal sample images, and each target image may be processed to obtain a second abnormal sample image corresponding to that target image. The second abnormal sample images may include screen-inverted images and object-truncated images. For example, each of the plurality of target images may be rotated by a predetermined angle to obtain a screen-inverted image. The predetermined angle may include, for example, 90 degrees, 180 degrees, or 270 degrees. Alternatively, each of the plurality of target images may be cropped to obtain an object-truncated image.
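The rotation and cropping just described can be sketched in pure Python on a rows-of-pixels image (the function names and toy image are illustrative; a real pipeline would typically use an image library):

```python
def rotate_90_clockwise(img):
    # Rotating an H x W image 90 degrees clockwise yields a W x H image;
    # applying it 1-3 times covers the 90/180/270-degree cases and
    # produces a "screen-inverted" negative sample.
    return [list(row) for row in zip(*img[::-1])]

def crop_top_left(img, keep_rows, keep_cols):
    # Keeping only a corner of the frame discards part of the object,
    # simulating an "object-truncated" negative sample.
    return [row[:keep_cols] for row in img[:keep_rows]]

target = [[1, 2, 3],
          [4, 5, 6]]                          # a toy 2x3 "normal" image
inverted = rotate_90_clockwise(target)        # becomes 3x2
truncated = crop_top_left(target, 1, 2)       # keeps a 1x2 corner
```

Both transforms are deterministic and cheap, which is why generating second abnormal samples from existing normal samples scales well compared with collecting real abnormal images.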
According to embodiments of the present disclosure, the screen-inverted images and object-truncated images may include two types of negative examples: first, images whose content is unrelated to the normal sample images; second, images derived from the content of the normal sample images. For example, an object-truncated image may be obtained by cropping a target image, and a screen-inverted image may be obtained by randomly rotating a target image.
According to an embodiment of the present disclosure, the normal sample images, solid-color images, black-border images, lace images, screen-inverted images, object-truncated images, and object-disturbed images may have different labels. For example, the label of a solid-color image may be denoted A, that of a black-border image B, that of a lace image C, that of a screen-inverted image D, that of an object-truncated image E, that of an object-disturbed image F, and that of a normal sample image T.
For example, in a case where the category labels corresponding to the multi-label classification model include the solid-color image and the screen-inverted image, the multi-label classification model may determine that a given abnormal sample image is both a gray solid-color image and a screen-inverted image rotated 90 degrees clockwise.
According to an embodiment of the present disclosure, since second abnormal sample images having predetermined multi-label classification values are generated from the plurality of target images among the normal sample images, the multi-label classification model obtained by training on the sample image set and the multi-label classification values can accurately distinguish images of the predetermined categories, improving the prediction accuracy of the multi-label classification model.
According to an embodiment of the present disclosure, operation S220 may include the following operations.
Each abnormal sample image of the plurality of abnormal sample images is input into the deep learning model to obtain a multi-label classification result for that abnormal sample image. Each normal sample image of the plurality of normal sample images is input into the deep learning model to obtain a multi-label classification result for that normal sample image. A loss function value is obtained from the multi-label classification results and multi-label classification values of the plurality of abnormal sample images, together with those of the plurality of normal sample images. The model parameters of the deep learning model are adjusted according to the loss function value until a predetermined end condition is satisfied, so as to obtain the multi-label classification model.
According to an embodiment of the present disclosure, the loss function value may be computed, based on a multi-label classification loss function, from the respective multi-label classification values and multi-label classification results of the plurality of abnormal sample images and of the plurality of normal sample images. The model parameters of the deep learning model may then be adjusted according to the loss function value until the predetermined condition is satisfied.
For example, the model parameters of the deep learning model may be adjusted according to a back-propagation algorithm or a stochastic gradient descent algorithm until the predetermined condition is satisfied. The deep learning model obtained when the predetermined condition is satisfied is determined as the multi-label classification model. The predetermined condition may include at least one of: the loss function value converging, and the number of training rounds reaching a maximum number of training rounds.
According to embodiments of the present disclosure, the multi-label classification loss function may include at least one of: a cross-entropy loss function, a hinge loss function, or an exponential loss function.
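As a hedged sketch of the cross-entropy variant combined with a gradient-descent update (the single-layer linear model, learning rate, and toy data are illustrative placeholders, not the disclosure's actual architecture), per-label sigmoid outputs can be scored with binary cross-entropy and the parameters adjusted accordingly:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_multilabel(probs, targets, eps=1e-12):
    # Mean binary cross-entropy over labels: each label is an independent
    # yes/no decision, which is what makes the task multi-label rather
    # than single-label classification.
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(probs, targets)) / len(targets)

def sgd_step(weights, features, targets, lr=0.1):
    # One stochastic-gradient-descent step for a linear model with
    # per-label sigmoid outputs; for BCE, d(loss)/d(logit) = p - t.
    probs = [sigmoid(sum(w * x for w, x in zip(ws, features))) for ws in weights]
    for ws, p, t in zip(weights, probs, targets):
        for j, x in enumerate(features):
            ws[j] -= lr * (p - t) * x / len(targets)
    return bce_multilabel(probs, targets)   # loss before this update

# Toy example: 2 labels, 3 features, weights initialized to zero.
w = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss_before = sgd_step(w, [1.0, 0.5, -0.5], [1, 0])
loss_after = sgd_step(w, [1.0, 0.5, -0.5], [1, 0])
```

With zero weights every label probability starts at 0.5, so the initial loss is ln 2 ≈ 0.693, and one update already lowers it, mirroring the adjust-until-convergence loop described above.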
According to the embodiment of the disclosure, the model parameters of the deep learning model are adjusted according to the loss function values obtained according to the multi-label classification results and the multi-label classification values of the abnormal sample images and the multi-label classification results and the multi-label classification values of the normal sample images, so that the image multi-label classification accuracy of the multi-label classification model is improved.
According to an embodiment of the present disclosure, the training method 200 of the deep learning model may further include repeatedly performing the following operations until the performance test result of the multi-label classification model satisfies the predetermined performance condition.
The model performance of the multi-label classification model is tested using a verification image set to obtain a performance test result. When it is determined that the performance test result does not satisfy the predetermined performance condition, the model hyper-parameters corresponding to the multi-label classification model are adjusted. Based on the adjusted model hyper-parameters, the deep learning model is retrained using the sample image set and the multi-label classification values to obtain a new multi-label classification model.
According to an embodiment of the present disclosure, a plurality of images may be acquired to obtain an image set, and the images in the image set may be divided into a training image set, a test image set, and a verification image set according to a predetermined proportion. The training image set and the test image set may be used for training the deep learning model, and the verification image set may be used for validating the multi-label classification model, so as to prevent the model from overfitting during training. The proportion among the numbers of sample images in the training image set, the test image set, and the verification image set may be configured according to business requirements and is not limited herein. For example, the proportion may be 8:1:1.
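A minimal proportional split can be sketched as follows (the function name and 8:1:1 default are illustrative; in practice the list would usually be shuffled first):

```python
def split_dataset(images, ratios=(8, 1, 1)):
    # Partition an image list into training / test / verification subsets
    # proportionally to the predetermined ratio, e.g. 8:1:1.
    total = sum(ratios)
    n = len(images)
    n_train = n * ratios[0] // total
    n_test = n * ratios[1] // total
    train = images[:n_train]
    test = images[n_train:n_train + n_test]
    val = images[n_train + n_test:]      # remainder goes to verification
    return train, test, val

train, test, val = split_dataset(list(range(100)), (8, 1, 1))
```

Giving the remainder to the verification set keeps every image assigned to exactly one subset even when the ratio does not divide the set size evenly.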
According to the embodiment of the disclosure, preprocessing operation can be performed on the images in the training image set and the test image set to remove redundant information in the images and enhance the detectability of related information. The pre-processing operation may include at least one of graying, geometric transformation, data enhancement, and the like. The graying may include at least one of a component method, a maximum method, an average method, and a weighted average method. The geometric transformation may include at least one of translation, transposition, mirroring, rotation, scaling, and the like. Data enhancement can improve the visual effect of an image to meet different needs for the image. For example, the data enhancement operation may be performed by performing at least one of random cropping, flipping, color space transformation, and the like on the images in the training image set and the test image set. Furthermore, the resolution of the data-enhanced image may also be scaled to a predetermined resolution.
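The weighted-average graying method mentioned above can be sketched as follows (the ITU-R BT.601 weights 0.299/0.587/0.114 are the conventional choice; the disclosure does not fix specific weights):

```python
def gray_weighted_average(img):
    # Weighted-average graying: Gray = 0.299 R + 0.587 G + 0.114 B,
    # applied per pixel to a rows-of-RGB-tuples image.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in img]

rgb = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
gray = gray_weighted_average(rgb)
```

The weights sum to 1.0, so a white pixel maps to 255 and the gray range stays within 0-255 without clipping; the component, maximum, and average methods differ only in how the three channels are combined.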
According to an embodiment of the present disclosure, each image in the set of verification images may include a label classification value for each of a plurality of predetermined categories, which may be used to characterize a value of each sample image for each predetermined category.
According to embodiments of the present disclosure, model performance may be characterized by a model performance evaluation value. The model performance evaluation value may include at least one of: precision rate, recall rate, accuracy rate, error rate, and F-function value. The predetermined performance condition may refer to the performance evaluation value being greater than or equal to a predetermined performance evaluation threshold. The predetermined performance evaluation threshold may be configured according to actual business demands and is not limited herein. The model hyper-parameters may include at least one of: the learning rate, the number of layers of the deep learning model, and the like.
According to an embodiment of the present disclosure, the model performance of the multi-label classification model is tested using the verification image set to obtain a performance test result. For example, the verification image set may be processed using the multi-label classification model to obtain multi-label classification results, and the performance test result may be determined from those results. It is then determined whether the performance test result satisfies the predetermined performance condition. When it does, the training operation of the multi-label classification model ends. When it does not, the model hyper-parameters corresponding to the multi-label classification model may be adjusted, and the deep learning model may be retrained using the sample image set and the multi-label classification values based on the adjusted hyper-parameters to obtain a new multi-label classification model. These operations are repeated until the performance test result satisfies the predetermined performance condition.
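The test-adjust-retrain loop can be sketched as follows (the `train`, `evaluate`, and `adjust` callables and the toy "model" are illustrative stand-ins for the disclosure's actual training, verification, and hyper-parameter search):

```python
def train_until_performant(train, evaluate, adjust, hyper, threshold, max_rounds=10):
    # Repeat: train a model with the current hyper-parameters, test it on
    # the verification set, and adjust the hyper-parameters until the
    # performance evaluation value reaches the predetermined threshold.
    for _ in range(max_rounds):
        model = train(hyper)
        score = evaluate(model)
        if score >= threshold:
            return model, score
        hyper = adjust(hyper)
    return model, score   # best effort if the round budget is exhausted

# Toy stand-ins: the "model" is just the learning rate, quality peaks at 0.5,
# and each adjustment halves the learning rate.
model, score = train_until_performant(
    train=lambda h: h["lr"],
    evaluate=lambda lr: 1.0 - abs(lr - 0.5),
    adjust=lambda h: {"lr": h["lr"] / 2},
    hyper={"lr": 4.0},
    threshold=0.9,
)
```

Capping the number of rounds (`max_rounds`) keeps the loop from running forever when no hyper-parameter setting can reach the predetermined performance condition.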
According to an embodiment of the present disclosure, the training method 200 of the deep learning model may further include the following operations.
The multi-label classification thresholds of the plurality of categories are determined according to a performance test result satisfying the predetermined performance condition, so that multi-label classification can be performed using the multi-label classification thresholds.
According to an embodiment of the present disclosure, the multi-label classification threshold of each of the plurality of categories may be determined according to the model performance evaluation value of the deep learning model. The model performance evaluation value may include at least one of: precision rate, recall rate, accuracy rate, error rate, and F-function value. For example, the recall rate and precision rate of each category at candidate thresholds may be examined to determine that category's multi-label classification threshold. For example, the multi-label classification threshold of the solid-color image labeled A may be 0.8, and that of the black-border image labeled B may be 0.7. The embodiments of the present disclosure are not limited thereto.
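Per-category threshold selection from precision and recall can be sketched as follows (sweeping a small candidate grid and keeping the threshold with the best F1 value; the grid and toy verification data are illustrative):

```python
def best_threshold(scores, labels, candidates=(0.3, 0.5, 0.7, 0.8, 0.9)):
    # For one category, pick the threshold maximizing the F1 value,
    # the harmonic mean of precision and recall on the verification set.
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        preds = [s >= t for s in scores]
        tp = sum(p and l for p, l in zip(preds, labels))
        fp = sum(p and not l for p, l in zip(preds, labels))
        fn = sum((not p) and l for p, l in zip(preds, labels))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Scores for one category (e.g. "solid-color image") on a toy verification set.
t, f1 = best_threshold([0.95, 0.85, 0.6, 0.4, 0.2], [1, 1, 0, 0, 0])
```

Running this once per category yields a separate threshold per label, which is what allows categories with different score distributions (e.g. 0.8 for A but 0.7 for B) to be thresholded independently.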
According to the embodiment of the disclosure, the model performance of the multi-label classification model is tested by using the verification image set until the performance test result of the multi-label classification model meets the preset performance condition, and the multi-label classification threshold of each of multiple categories is determined according to the performance test result meeting the preset performance condition, so that the accuracy of the multi-label classification result obtained by processing the image to be classified through the multi-label classification model is improved.
FIG. 4 schematically illustrates an example schematic diagram of a process of acquiring a sample image set, in accordance with an embodiment of the disclosure.
As shown in fig. 4, a plurality of first abnormal sample images 401 and a plurality of normal sample images 402 may be acquired. The plurality of first abnormal sample images 401 may include at least one real abnormal sample image 401_1 and/or at least one simulated abnormal sample image 401_2.
The real abnormal sample image 401_1 may be acquired from a real image 401_11 based on a predetermined search object. The simulated abnormal sample image 401_2 may be generated based on predetermined image parameters 401_21. Alternatively, predetermined random noise data 401_22 may be input into a generative adversarial network model 401_23, which processes the random noise data to generate the simulated abnormal sample image 401_2.
A plurality of target images 403 may be determined from the plurality of normal sample images 402. From the plurality of target images 403, a second abnormal sample image 404 having a predetermined multi-label classification value corresponding to the target image 403 is generated.
A sample image set 405 may be obtained from the plurality of first abnormal sample images 401, the plurality of second abnormal sample images 404, and the plurality of normal sample images 402.
Fig. 5 schematically illustrates an example schematic of a training process of a deep learning model according to an embodiment of the disclosure.
As shown in fig. 5, each abnormal sample image 501 in the plurality of abnormal sample images may be input to the deep learning model 505, and a multi-label classification result 506 of the abnormal sample image of each abnormal sample image may be obtained.
Each normal sample image 502 of the plurality of normal sample images may be input to the deep learning model 505, respectively, to obtain a multi-label classification result 507 of the respective normal sample image of each normal sample image.
The multi-label classification value 503 of the abnormal sample image, the multi-label classification result 506 of the abnormal sample image, the multi-label classification value 504 of the normal sample image, and the multi-label classification result 507 of the normal sample image may be input to the multi-label classification loss function 508, and a loss function value 509 may be output. The model parameters of the deep learning model 505 may be adjusted according to the loss function values 509 until a predetermined condition is satisfied.
Fig. 6 schematically shows a flow chart of an image multi-label classification method according to an embodiment of the present disclosure.
As shown in fig. 6, the image multi-label classification method 600 includes operations S610 to S620.
In operation S610, an image to be classified is acquired.
In operation S620, the image to be classified is input into the multi-label classification model, so as to obtain a multi-label classification result of the image to be classified.
According to the embodiment of the disclosure, the multi-label classification model is obtained by training according to the training method of the deep learning model in the embodiment of the disclosure.
According to an embodiment of the present disclosure, the image to be classified may be acquired in real time, read from a database in which it was pre-stored, or received from another terminal device. The embodiments of the present disclosure do not limit the manner in which the image to be classified is acquired.
According to an embodiment of the present disclosure, after the image to be classified is obtained, it may be scaled to a predetermined resolution and normalized to obtain a processed image. The normalization process may include coordinate centering, x-shearing normalization, scaling normalization, and rotation normalization. The predetermined resolution may be, for example, 224 x 224; the embodiments of the present disclosure do not limit its specific value.
According to the embodiment of the disclosure, after the processing image is obtained, the processing image can be input into the multi-label classification model obtained by the deep learning model training method, and a multi-label classification result of the image to be classified is obtained. The multi-label classification result can be used for representing classification probability values of the image to be classified belonging to different predetermined classes.
According to an embodiment of the present disclosure, for a certain predetermined category, in the case that the multi-label classification result corresponding to the predetermined category is greater than the multi-label classification threshold, it may be determined that the image to be classified belongs to the predetermined category.
For example, suppose the multi-label classification threshold of the solid-color image corresponding to label A is 0.8 and its multi-label classification result is 0.85, while the multi-label classification threshold of the black-border image corresponding to label B is 0.7 and its multi-label classification result is 0.75. In this case, it may be determined that the image to be classified belongs to both the solid-color image category and the black-border image category.
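Applying per-category thresholds at inference can be sketched as follows (the labels and thresholds mirror the A/B examples above; the dict layout and function name are illustrative):

```python
def classify_multilabel(probs, thresholds):
    # Keep every category whose classification probability value exceeds
    # its per-category multi-label classification threshold.
    return sorted(label for label, p in probs.items() if p > thresholds[label])

probs = {"A": 0.85, "B": 0.75, "T": 0.10}    # solid color, black border, normal
thresholds = {"A": 0.8, "B": 0.7, "T": 0.5}
labels = classify_multilabel(probs, thresholds)
```

Because each category is compared against its own threshold, the image can legitimately receive several labels at once, which is the defining behavior of multi-label (as opposed to single-label) classification.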
According to an embodiment of the present disclosure, a multi-label classification result of an image to be classified includes classification probability values corresponding to respective classes.
The above is only an exemplary embodiment, but is not limited thereto, and other image multi-label classification methods known in the art may be included as long as the image multi-label classification can be achieved.
Fig. 7 schematically shows a block diagram of a training apparatus for a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 7, the training apparatus 700 for deep learning model may include a first obtaining module 710 and a first training module 720.
A first acquisition module 710 for acquiring a sample image set. The sample image set includes a plurality of abnormal sample images and a plurality of normal sample images. The abnormal sample image and the normal sample image each have a multi-label classification value.
The first training module 720 is configured to train the deep learning model with the sample image set and the multi-label classification value to obtain a multi-label classification model. The multi-label classification model is used for determining a multi-label classification result of the image to be classified.
According to an embodiment of the present disclosure, the first obtaining module 710 may include an obtaining unit, a determining unit, a generating unit, and a first obtaining unit.
An acquisition unit configured to acquire a plurality of first abnormal sample images and a plurality of normal sample images.
A determining unit for determining a plurality of target images from the plurality of normal sample images.
And a generating unit configured to generate a second abnormal sample image corresponding to the target image from the plurality of target images. The second abnormal sample image has a predetermined multi-label classification value.
The first obtaining unit is used for obtaining a sample image set according to the abnormal sample images and the normal sample images. The plurality of abnormal sample images includes a plurality of first abnormal sample images and a plurality of second abnormal sample images.
According to an embodiment of the present disclosure, the plurality of first abnormal sample images may include at least one of: at least one real abnormal sample image and at least one simulated abnormal sample image. The real abnormal sample image may be acquired from a real image based on a predetermined search object. The simulated abnormal sample image may be generated in one of the following ways: based on predetermined image parameters, or by processing predetermined random noise data with a generative adversarial network model.
According to an embodiment of the present disclosure, the first training module 720 may include a second obtaining unit, a third obtaining unit, a fourth obtaining unit, and an adjusting unit.
And the second obtaining unit is used for inputting the abnormal sample images in the plurality of abnormal sample images into the deep learning model to obtain the multi-label classification result of the abnormal sample images.
And the third obtaining unit is used for inputting the normal sample images in the plurality of normal sample images into the deep learning model to obtain the multi-label classification result of the normal sample images.
A fourth obtaining unit, configured to obtain a loss function value according to a multi-label classification result of each of the plurality of abnormal sample images, a multi-label classification value of each of the plurality of abnormal sample images, a multi-label classification result of each of the plurality of normal sample images, and a multi-label classification value of each of the plurality of normal sample images.
And the adjusting unit is used for adjusting the model parameters of the deep learning model according to the loss function value until a preset ending condition is met, so that the multi-label classification model is obtained.
According to an embodiment of the present disclosure, the categories of the plurality of abnormal sample images may include at least two of: a solid image, a black-edged image, a lace image, a screen-reversed image, an object-truncated image, and an object-disturbed image.
According to an embodiment of the present disclosure, the training apparatus 700 for deep learning model may further include a testing module, an adjusting module, and a second training module, configured to repeatedly perform the following operations until the performance test result of the multi-label classification model meets a predetermined performance condition.
And the test module is used for testing the model performance of the multi-label classification model by using the verification image set to obtain a performance test result.
And the adjusting module is used for adjusting the model hyper-parameters corresponding to the multi-label classification model under the condition that the performance test result is determined not to meet the preset performance condition.
And the second training module is used for retraining the deep learning model by utilizing the sample image set and the multi-label classification value based on the adjusted model hyper-parameter to obtain a new multi-label classification model.
According to an embodiment of the present disclosure, the training apparatus 700 for deep learning model may further include a determination module.
And the determining module is used for determining the multi-label classification threshold of each of the multiple categories according to the performance test result meeting the preset performance condition so as to perform multi-label classification by using the multi-label classification threshold.
Fig. 8 schematically shows a block diagram of an image multi-label classification apparatus according to an embodiment of the present disclosure.
As shown in fig. 8, the image multi-label classification apparatus 800 may include a second acquisition module 810 and a classification module 820.
And a second obtaining module 810, configured to obtain an image to be classified.
The classification module 820 is configured to input the image to be classified into the multi-label classification model, so as to obtain a multi-label classification result of the image to be classified.
According to an embodiment of the present disclosure, the multi-label classification model may be obtained by training according to a training device of the deep learning model described in the embodiment of the present disclosure.
According to an embodiment of the present disclosure, the multi-label classification result of the image to be classified may include classification probability values corresponding to the respective plurality of classes.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method as described in the present disclosure.
According to an embodiment of the disclosure, a computer program product comprising a computer program which, when executed by a processor, implements a method as described in the disclosure.
Fig. 9 schematically illustrates a block diagram of an electronic device adapted to implement a training method of a deep learning model and an image multi-label classification method according to an embodiment of the present disclosure.
Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the electronic device 900 includes a computing unit 901 that can perform various appropriate actions and processes in accordance with a computer program stored in a Read-Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic device 900 can be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the electronic device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the electronic device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 901 performs the respective methods and processes described above, such as the training method of the deep learning model and the image multi-label classification method. For example, in some embodiments, the training method of the deep learning model and the image multi-label classification method may be implemented as computer software programs that are tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method of the deep learning model and the image multi-label classification method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the training method of the deep learning model and the image multi-label classification method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A training method for a deep learning model, comprising:
obtaining a sample image set, wherein the sample image set comprises a plurality of abnormal sample images and a plurality of normal sample images, and the abnormal sample images and the normal sample images respectively have multi-label classification values; and
training a deep learning model by using the sample image set and the multi-label classification values to obtain a multi-label classification model, wherein the multi-label classification model is used for determining a multi-label classification result of an image to be classified.
2. The method of claim 1, wherein said acquiring a sample image set comprises:
acquiring a plurality of first abnormal sample images and a plurality of normal sample images;
determining a plurality of target images from the plurality of normal sample images;
generating, according to the plurality of target images, second abnormal sample images each corresponding to one of the target images, wherein each second abnormal sample image has a predetermined multi-label classification value; and
obtaining the sample image set according to the plurality of abnormal sample images and the plurality of normal sample images, wherein the plurality of abnormal sample images comprise the plurality of first abnormal sample images and the plurality of second abnormal sample images.
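Claims 1-2 leave the generation step open-ended. As a purely hypothetical sketch of how a black-edged second abnormal sample might be derived from a normal target image (the nested-list image format, function name, and label layout are all assumptions, not part of the disclosure):

```python
def add_black_edges(image, border=2):
    """Create a 'black-edged' abnormal sample by zeroing a border
    of the given width around a normal target image (a 2-D list of
    pixel values). The original image is left unmodified."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # copy each row
    for y in range(h):
        for x in range(w):
            if y < border or y >= h - border or x < border or x >= w - border:
                out[y][x] = 0
    return out

# Predetermined multi-label classification value for the generated
# sample, e.g. one slot per abnormality category (layout is invented):
# [solid, black-edged, lace, screen-reversed]
PRESET_LABEL_BLACK_EDGED = [0, 1, 0, 0]

normal = [[128] * 8 for _ in range(8)]        # a toy "normal" image
abnormal = add_black_edges(normal, border=2)  # its black-edged variant
```

The generated pair (abnormal image, preset label) can then join the first abnormal samples in the sample image set of claim 2.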
3. The method of claim 2, wherein the plurality of first abnormal sample images include at least one of: at least one real abnormal sample image and at least one simulated abnormal sample image;
the real abnormal sample image is acquired from a real image based on a predetermined search object; and
the simulated abnormal sample image is generated in one of the following ways: generated based on predetermined image parameters, or generated by using a generative adversarial network model to process predetermined random noise data.
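One of the two simulation modes in claim 3, generation from predetermined image parameters, can be pictured with a toy solid-image generator. This is a hypothetical sketch; the disclosure does not specify any of these names, parameters, or formats:

```python
def make_solid_image(height, width, color):
    """Simulate a 'solid image' abnormal sample directly from
    predetermined image parameters: a size and a single fill color."""
    return [[color for _ in range(width)] for _ in range(height)]

# Predetermined parameters for one simulated abnormal sample (invented):
solid = make_solid_image(4, 4, 255)
```

The second mode, feeding predetermined random noise through a generative adversarial network, would replace this parameter-driven constructor with a trained generator network.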
4. The method according to any one of claims 1 to 3, wherein training a deep learning model using the sample image set and the multi-label classification values to obtain a multi-label classification model comprises:
inputting abnormal sample images in the plurality of abnormal sample images into the deep learning model to obtain a multi-label classification result of the abnormal sample images;
inputting normal sample images in the plurality of normal sample images into the deep learning model to obtain multi-label classification results of the normal sample images;
obtaining a loss function value according to the respective multi-label classification results of the plurality of abnormal sample images, the respective multi-label classification values of the plurality of abnormal sample images, the respective multi-label classification results of the plurality of normal sample images and the respective multi-label classification values of the plurality of normal sample images;
adjusting model parameters of the deep learning model according to the loss function value until a predetermined end condition is met, so as to obtain the multi-label classification model.
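Claim 4 does not fix a particular loss function. A common choice for multi-label targets is per-label binary cross-entropy averaged over all samples and labels; the pure-Python sketch below illustrates how a single loss function value could be obtained from the multi-label classification results (probabilities) and classification values (0/1 targets). The function and its interface are assumptions, not the patent's implementation:

```python
import math

def multilabel_bce(predictions, targets, eps=1e-7):
    """Binary cross-entropy averaged over all samples and labels.
    predictions: per-sample lists of per-label probabilities in (0, 1);
    targets: matching lists of 0/1 multi-label classification values."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(predictions, targets):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp for numerical safety
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count

# Classification results for one abnormal and one normal sample,
# two label categories each (values invented for illustration):
loss = multilabel_bce([[0.9, 0.2], [0.1, 0.1]], [[1, 0], [0, 0]])
```

An optimizer would then adjust the model parameters in the direction that decreases this value until the end condition (e.g. convergence or an epoch limit) is met.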
5. The method of any of claims 1-4, wherein the categories of the plurality of abnormal sample images include at least two of: a solid image, a black-edged image, a lace image, a screen-reversed image, an object-truncated image, and an object-disturbed image.
6. The method of any of claims 1-5, further comprising repeatedly performing the following operations until the performance test results of the multi-label classification model satisfy a predetermined performance condition:
testing the model performance of the multi-label classification model by using a verification image set to obtain a performance test result;
adjusting a model hyper-parameter corresponding to the multi-label classification model under the condition that the performance test result is determined not to meet the preset performance condition; and
retraining the deep learning model by using the sample image set and the multi-label classification values based on the adjusted model hyper-parameter, so as to obtain a new multi-label classification model.
7. The method of claim 6, further comprising:
determining a multi-label classification threshold for each of a plurality of categories according to a performance test result satisfying the predetermined performance condition, so as to perform multi-label classification by using the multi-label classification thresholds.
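Claim 7 does not state how a per-category threshold is derived from a performance test result. One conventional approach, shown here as an assumed illustration only, sweeps candidate thresholds over validation-set predictions for one category and keeps the threshold that maximizes F1:

```python
def best_threshold(probs, labels, candidates=None):
    """Pick the threshold for one category that maximizes F1 on a
    validation set (probs: predicted probabilities, labels: 0/1)."""
    if candidates is None:
        candidates = [i / 20 for i in range(1, 20)]  # 0.05 .. 0.95
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

# Toy validation results for a single category (values invented):
t = best_threshold([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

Running this independently per category yields the per-category thresholds that claim 7's determining step refers to.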
8. An image multi-label classification method comprises the following steps:
acquiring an image to be classified; and
inputting the image to be classified into a multi-label classification model to obtain a multi-label classification result of the image to be classified, wherein the multi-label classification model is trained by the method according to any one of claims 1-7.
9. The method of claim 8, wherein the multi-label classification result of the image to be classified comprises classification probability values corresponding to each of a plurality of classes.
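Claim 9's per-class probability values are typically produced by applying an independent sigmoid to each raw class output of the model, after which per-class thresholds (as in claim 7) yield the final label set. The following sketch is illustrative only; the logit inputs and threshold values are invented:

```python
import math

def sigmoid(x):
    """Squash a raw model output into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def classify(logits, thresholds):
    """Map raw per-class outputs to a multi-label result: one
    classification probability value per class, plus the 0/1 label
    assignment obtained by comparing against per-class thresholds."""
    probs = [sigmoid(z) for z in logits]
    labels = [int(p >= t) for p, t in zip(probs, thresholds)]
    return probs, labels

# Hypothetical raw outputs for three classes and their thresholds:
probs, labels = classify([2.0, -1.0, 0.1], [0.5, 0.5, 0.6])
```

Unlike softmax single-label classification, each sigmoid is independent, so any subset of classes (including none) may be assigned to one image.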
10. A training apparatus for deep learning models, comprising:
the image processing device comprises a first obtaining module, a second obtaining module and a third obtaining module, wherein the first obtaining module is used for obtaining a sample image set, the sample image set comprises a plurality of abnormal sample images and a plurality of normal sample images, and the abnormal sample images and the normal sample images respectively have multi-label classification values; and
a first training module configured to train a deep learning model by using the sample image set and the multi-label classification values to obtain a multi-label classification model, wherein the multi-label classification model is used for determining a multi-label classification result of an image to be classified.
11. The apparatus of claim 10, wherein the first obtaining module comprises:
an acquisition unit configured to acquire a plurality of first abnormal sample images and a plurality of normal sample images;
a determination unit configured to determine a plurality of target images from the plurality of normal sample images;
a generating unit configured to generate, according to the plurality of target images, second abnormal sample images each corresponding to one of the target images, wherein each second abnormal sample image has a predetermined multi-label classification value; and
a first obtaining unit, configured to obtain the sample image set according to the plurality of abnormal sample images and the plurality of normal sample images, where the plurality of abnormal sample images include the plurality of first abnormal sample images and the plurality of second abnormal sample images.
12. The apparatus of claim 11, wherein the plurality of first abnormal sample images include at least one of: at least one real abnormal sample image and at least one simulated abnormal sample image;
the real abnormal sample image is acquired from a real image based on a predetermined search object; and
the simulated abnormal sample image is generated in one of the following ways: generated based on predetermined image parameters, or generated by using a generative adversarial network model to process predetermined random noise data.
13. The apparatus of any of claims 10-12, wherein the first training module comprises:
a second obtaining unit, configured to input an abnormal sample image in the multiple abnormal sample images into the deep learning model, and obtain a multi-label classification result of the abnormal sample image;
a third obtaining unit, configured to input a normal sample image in the plurality of normal sample images into the deep learning model, so as to obtain a multi-label classification result of the normal sample image;
a fourth obtaining unit, configured to obtain a loss function value according to a multi-label classification result of each of the plurality of abnormal sample images, a multi-label classification value of each of the plurality of abnormal sample images, a multi-label classification result of each of the plurality of normal sample images, and a multi-label classification value of each of the plurality of normal sample images;
an adjusting unit configured to adjust model parameters of the deep learning model according to the loss function value until a predetermined end condition is met, so as to obtain the multi-label classification model.
14. The apparatus of any of claims 10-13, wherein the categories of the plurality of abnormal sample images include at least two of: a solid image, a black-edged image, a lace image, a screen-reversed image, an object-truncated image, and an object-disturbed image.
15. The apparatus according to any one of claims 10 to 14, further comprising a testing module, an adjusting module and a second training module, configured to repeatedly perform the following operations until the performance test result of the multi-label classification model satisfies a predetermined performance condition:
the test module is used for testing the model performance of the multi-label classification model by using a verification image set to obtain a performance test result;
the adjusting module is used for adjusting the model hyper-parameters corresponding to the multi-label classification model under the condition that the performance test result is determined not to meet the preset performance condition; and
the second training module is used for retraining the deep learning model by using the sample image set and the multi-label classification values based on the adjusted model hyper-parameter, so as to obtain a new multi-label classification model.
16. The apparatus of claim 15, further comprising:
a determining module configured to determine a multi-label classification threshold for each of a plurality of categories according to a performance test result satisfying the predetermined performance condition, so as to perform multi-label classification by using the multi-label classification thresholds.
17. An image multi-label classification apparatus comprising:
the second acquisition module is used for acquiring the image to be classified; and
a classification module, configured to input the image to be classified into a multi-label classification model, so as to obtain a multi-label classification result of the image to be classified, where the multi-label classification model is trained by using the apparatus according to any one of claims 10 to 16.
18. The apparatus of claim 17, wherein the multi-label classification result of the image to be classified comprises classification probability values corresponding to each of a plurality of classes.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7 or 8-9.
20. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7 or 8-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-7 or 8-9.
CN202210817804.5A 2022-07-12 2022-07-12 Model training method, image multi-label classification method and device and electronic equipment Pending CN115049882A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210817804.5A CN115049882A (en) 2022-07-12 2022-07-12 Model training method, image multi-label classification method and device and electronic equipment


Publications (1)

Publication Number Publication Date
CN115049882A true CN115049882A (en) 2022-09-13

Family

ID=83164847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210817804.5A Pending CN115049882A (en) 2022-07-12 2022-07-12 Model training method, image multi-label classification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115049882A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination