CN115482415A - Model training method, image classification method and device

Info

Publication number
CN115482415A
Authority
CN
China
Prior art keywords
sample
image
main
images
classification
Prior art date
Legal status
Pending
Application number
CN202211151337.3A
Other languages
Chinese (zh)
Inventor
冯伟
张政
吕晶晶
庞新强
王维珍
李耀宇
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202211151337.3A
Publication of CN115482415A
Priority to PCT/CN2023/120364 (WO2024061311A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/32 Indexing scheme for image data processing or generation, in general involving image mosaicing

Abstract

Embodiments of the disclosure provide a model training method, an image classification method, and corresponding apparatuses. The model training method comprises: first, obtaining a training sample set, where the training sample set comprises sample images corresponding to sample items and sample classification results of the sample images, each sample image being composed of a sample head image and a sample main image corresponding to the sample item; then, constructing an initial model comprising a residual neural network and a classification network; and finally, using a machine learning method, taking the sample image as the input of the residual neural network to obtain a first feature vector corresponding to the sample head image and a second feature vector corresponding to the sample main image, taking the first feature vector and the second feature vector output by the residual neural network as the input of the classification network, taking the sample classification result of the sample image as the expected output, and training the initial model to obtain an image classification model. Features of the stitched sample image can thus be extracted to form feature vectors and predicted classification results, so that whether a main image is qualified can be judged.

Description

Model training method, image classification method and device
Technical Field
Embodiments of the disclosure relate to the field of computer technology and Internet technology, in particular to the fields of image processing and artificial intelligence, and more particularly to a model training method, an image classification method, and corresponding apparatuses.
Background
On an e-commerce platform, product images are mainly presented through a head image designed by the merchant and cannot be presented in a targeted manner according to user preferences. To improve the content diversity of product images, a product-image material mining technique is required to screen out high-quality material among the main images of a product while filtering out low-quality material. Existing material mining methods for product images generally judge, through saliency detection, whether the area proportion of the salient region in an image is moderate; judge, through text detection and recognition, whether the image contains time-sensitive information; compute, with a pre-trained model, whether the feature similarity between the head image and a main image exceeds a threshold; evaluate each main image of the product against all of these conditions; and take the main images that satisfy all conditions simultaneously as high-quality material.
However, such methods contain a large number of hyper-parameters that need tuning and struggle with the highly variable content of product images. The features extracted by the model have weak representational power, and they ignore the association between the head image and the main images, so that high-quality main images whose product appearance differs greatly from the head image are easily misclassified.
Disclosure of Invention
Embodiments of the disclosure provide a model training method, an image classification method, corresponding apparatuses, an electronic device, and a computer-readable medium.
In a first aspect, an embodiment of the present disclosure provides a model training method, including: obtaining a training sample set, where the training sample set includes sample images corresponding to sample items and sample classification results of the sample images, each sample image being composed of a sample head image and a sample main image corresponding to the sample item; constructing an initial model including a residual neural network and a classification network; and, using a machine learning method, taking the sample image as the input of the residual neural network to obtain a first feature vector corresponding to the sample head image and a second feature vector corresponding to the sample main image, taking the first feature vector and the second feature vector output by the residual neural network as the input of the classification network, taking the sample classification result of the sample image as the expected output, and training the initial model to obtain an image classification model.
In some embodiments, obtaining the training sample set includes: for a plurality of sample items, obtaining a sample head image and a plurality of sample main images corresponding to each sample item; classifying the plurality of sample main images of each sample item to obtain sample classification results of the plurality of sample main images; stitching the sample head image of each sample item with each corresponding sample main image to obtain a plurality of sample images each composed of a sample head image and a sample main image; and obtaining the training sample set based on the plurality of sample images corresponding to the plurality of sample items and the sample classification result of the sample main image in each sample image.
In some embodiments, stitching the sample head image of each sample item with each corresponding sample main image to obtain a plurality of sample images each composed of a sample head image and a sample main image includes: for each sample item, stitching the sample head image with each sample main image to obtain a plurality of stitched images; performing image processing on the sample head images in the plurality of stitched images to obtain a plurality of processed stitched images; and scaling the plurality of processed stitched images to obtain the plurality of sample images each composed of a sample head image and a sample main image.
In some embodiments, performing image processing on the sample head images in the plurality of stitched images to obtain the plurality of processed stitched images includes: randomly flipping the sample head images in the plurality of stitched images to obtain a plurality of first stitched images including first sample head images; and performing pixel processing on the first sample head images in the plurality of first stitched images to obtain the plurality of processed stitched images.
In some embodiments, classifying the plurality of sample main images of each sample item to obtain the sample classification results of the plurality of sample main images includes: classifying the plurality of sample main images of each sample item, and determining the qualified sample main images and the unqualified sample main images among them; classifying the unqualified sample main images by category to obtain the sample classification results corresponding to the unqualified sample main images; and obtaining the sample classification results of the plurality of sample main images based on the sample classification results of the qualified sample main images and the sample classification results corresponding to the unqualified sample main images.
In some embodiments, the residual neural network includes convolutional layers and pooling layers; and, using a machine learning method, taking the sample image as the input of the residual neural network to obtain the first feature vector corresponding to the sample head image and the second feature vector corresponding to the sample main image, taking the first feature vector and the second feature vector output by the residual neural network as the input of the classification network, taking the sample classification result of the sample image as the expected output, and training the initial model to obtain the image classification model includes: using a machine learning method, taking the sample image as the input of the residual neural network and extracting features from the sample image through the convolutional layers to obtain image features corresponding to the sample image, where the image features include a first feature of the sample head image and a second feature of the sample main image; pooling the first feature and the second feature through the pooling layers to obtain the first feature vector corresponding to the sample head image and the second feature vector corresponding to the sample main image; concatenating the first feature vector and the second feature vector to obtain an image feature vector corresponding to the sample image; taking the image feature vector as the input of the classification network, performing classification prediction on the image feature vector through the classification network, and outputting a predicted classification result corresponding to the sample image; and training the initial model based on the predicted classification result and the sample classification result of the sample image to obtain the image classification model.
In a second aspect, an embodiment of the present disclosure provides an image classification method, including: obtaining a target head image and a plurality of target main images corresponding to a target item; obtaining a plurality of target images corresponding to the target item based on the target head image and the plurality of target main images; and inputting the plurality of target images into an image classification model to obtain classification results corresponding to the plurality of target main images, where the image classification model is trained by the model training method described above.
In some embodiments, the classification result includes a confidence that the main image is qualified; and the method further includes: judging, for the classification result corresponding to each target main image, whether the confidence of being a qualified main image is greater than a preset threshold; in response to determining that the confidence is greater than the preset threshold, determining the target main image to be a qualified target main image; and in response to determining that the confidence is not greater than the preset threshold, determining the target main image to be an unqualified target main image.
In a third aspect, an embodiment of the present disclosure provides a model training apparatus, including: an acquisition module configured to acquire a training sample set, where the training sample set includes sample images corresponding to sample items and sample classification results of the sample images, each sample image being composed of a sample head image and a sample main image corresponding to the sample item; a construction module configured to construct an initial model including a residual neural network and a classification network; and a training module configured to, using a machine learning method, take the sample image as the input of the residual neural network to obtain a first feature vector corresponding to the sample head image and a second feature vector corresponding to the sample main image, take the first feature vector and the second feature vector output by the residual neural network as the input of the classification network, take the sample classification result of the sample image as the expected output, and train the initial model to obtain an image classification model.
In some embodiments, the acquisition module is further configured to: for a plurality of sample items, obtain a sample head image and a plurality of sample main images corresponding to each sample item; classify the plurality of sample main images of each sample item to obtain sample classification results of the plurality of sample main images; stitch the sample head image of each sample item with each corresponding sample main image to obtain a plurality of sample images each composed of a sample head image and a sample main image; and obtain the training sample set based on the plurality of sample images corresponding to the plurality of sample items and the sample classification result of the sample main image in each sample image.
In some embodiments, the acquisition module is further configured to: for each sample item, stitch the sample head image with each sample main image to obtain a plurality of stitched images; perform image processing on the sample head images in the plurality of stitched images to obtain a plurality of processed stitched images; and scale the plurality of processed stitched images to obtain the plurality of sample images each composed of a sample head image and a sample main image.
In some embodiments, the acquisition module is further configured to: randomly flip the sample head images in the plurality of stitched images to obtain a plurality of first stitched images including first sample head images; and perform pixel processing on the first sample head images in the plurality of first stitched images to obtain the plurality of processed stitched images.
In some embodiments, the acquisition module is further configured to: classify the plurality of sample main images of each sample item to determine the qualified sample main images and the unqualified sample main images among them; classify the unqualified sample main images by category to obtain the sample classification results corresponding to the unqualified sample main images; and obtain the sample classification results of the plurality of sample main images based on the sample classification results of the qualified sample main images and the sample classification results corresponding to the unqualified sample main images.
In some embodiments, the residual neural network includes convolutional layers and pooling layers; and the training module is further configured to: using a machine learning method, take the sample image as the input of the residual neural network and extract features from the sample image through the convolutional layers to obtain image features corresponding to the sample image, where the image features include a first feature of the sample head image and a second feature of the sample main image; pool the first feature and the second feature through the pooling layers to obtain the first feature vector corresponding to the sample head image and the second feature vector corresponding to the sample main image; concatenate the first feature vector and the second feature vector to obtain an image feature vector corresponding to the sample image; take the image feature vector as the input of the classification network, perform classification prediction on the image feature vector through the classification network, and output a predicted classification result corresponding to the sample image; and train the initial model based on the predicted classification result and the sample classification result of the sample image to obtain the image classification model.
In a fourth aspect, an embodiment of the present disclosure provides an image classification apparatus, including: an acquisition module configured to acquire a target head image and a plurality of target main images corresponding to a target item, and to acquire a plurality of target images corresponding to the target item based on the target head image and the plurality of target main images; and a classification module configured to input the plurality of target images into an image classification model to obtain classification results corresponding to the plurality of target main images, where the image classification model is trained by the model training method described above.
In some embodiments, the classification result includes a confidence that the main image is qualified; and the apparatus further includes: a judging module configured to judge, for the classification result corresponding to each target main image, whether the confidence of being a qualified main image is greater than a preset threshold; and a determination module configured to determine the target main image to be a qualified target main image in response to determining that the confidence is greater than the preset threshold, and to determine the target main image to be an unqualified target main image in response to determining that the confidence is not greater than the preset threshold.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device having one or more programs stored thereon, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the model training method or the image classification method described in any embodiment of the first or second aspect.
In a sixth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, where the computer program, when executed by a processor, implements the model training method or the image classification method described in any embodiment of the first or second aspect.
In the model training method provided by the embodiments of the disclosure, the execution subject first obtains a training sample set, where the training sample set includes sample images corresponding to sample items and sample classification results of the sample images, each sample image being composed of a sample head image and a sample main image corresponding to the sample item; then constructs an initial model including a residual neural network and a classification network; and finally, using a machine learning method, takes the sample image as the input of the residual neural network to obtain a first feature vector corresponding to the sample head image and a second feature vector corresponding to the sample main image, takes the first feature vector and the second feature vector output by the residual neural network as the input of the classification network, takes the sample classification result of the sample image as the expected output, and trains the initial model to obtain an image classification model. Features of the stitched sample image can thus be extracted to form feature vectors and predicted classification results, so that whether a main image is qualified can be judged.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a model training method according to the present disclosure;
FIG. 3 is a flow diagram for one embodiment of obtaining a training sample set, according to the present disclosure;
FIG. 4 is a flow diagram for one embodiment of acquiring multiple sample images according to the present disclosure;
FIG. 5 is a flow diagram for one embodiment of training the initial model, according to the present disclosure;
FIG. 6 is a flow diagram for one embodiment of an image classification method according to the present disclosure;
FIG. 7 is a flow diagram of another embodiment of an image classification method according to the present disclosure;
FIG. 8 is a schematic block diagram of one embodiment of a model training apparatus according to the present disclosure;
FIG. 9 is a schematic structural diagram of one embodiment of an image classification apparatus according to the present disclosure;
FIG. 10 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant disclosure and are not limiting of the disclosure. It should be noted that, for the convenience of description, only the parts relevant to the related disclosure are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the model training method, the image classification method, the model training apparatus, and the image classification apparatus of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 104, 105, 106, a network 107, and servers 101, 102, 103. The network 107 serves as a medium for providing communication links between the terminal devices 104, 105, 106 and the servers 101, 102, 103. The network 107 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may use the terminal devices 104, 105, 106 to interact, via the network 107, with the servers 101, 102, 103 belonging to the same server cluster, so as to receive or send information and the like. Various applications may be installed on the terminal devices 104, 105, 106, such as item presentation applications, data analysis applications, search applications, and so forth.
The terminal devices 104, 105, 106 may be hardware or software. When a terminal device is hardware, it may be any of various electronic devices having a display screen and supporting communication with the server, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like. When a terminal device is software, it may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module, which is not specifically limited herein.
The servers 101, 102, 103 may be servers providing various services, for example background servers that receive requests sent by terminal devices with which communication connections have been established; a background server can receive and analyze a request sent by a terminal device and generate a processing result.
The servers 101, 102, 103 may obtain a training sample set, where the training sample set includes sample images corresponding to sample items and sample classification results of the sample images, each sample image being composed of a sample head image and a sample main image corresponding to the sample item; then construct an initial model including a residual neural network and a classification network; and finally, using a machine learning method, take the sample image as the input of the residual neural network to obtain a first feature vector corresponding to the sample head image and a second feature vector corresponding to the sample main image, take the first feature vector and the second feature vector output by the residual neural network as the input of the classification network, take the sample classification result of the sample image as the expected output, and train the initial model to obtain an image classification model.
Alternatively, the servers 101, 102, 103 may obtain a target head image and a plurality of target main images corresponding to a target item, then obtain a plurality of target images corresponding to the target item based on the target head image and the plurality of target main images, and finally input the plurality of target images into the image classification model to obtain classification results corresponding to the plurality of target main images.
The server may be hardware or software. When the server is hardware, it may be any of various electronic devices that provide services to the terminal devices. When the server is software, it may be implemented as multiple pieces of software or software modules that provide services to the terminal devices, or as a single piece of software or software module, which is not specifically limited herein.
It should be noted that the model training method or the image classification method provided by the embodiments of the present disclosure may be executed by the servers 101, 102, 103. Accordingly, the model training apparatus or the image classification apparatus is provided in the servers 101, 102, 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a model training method according to the present disclosure is shown. The model training method comprises the following steps:
Step 210, obtaining a training sample set.
In this step, the execution subject on which the model training method runs (for example, the servers 101, 102, 103 in FIG. 1) may read, via a network from an item presentation platform or from a local database, the sample images corresponding to sample items. A sample image is composed of a sample head image and a sample main image corresponding to a sample item. A sample item may be an item displayed on an e-commerce platform, such as clothing, shoes, or accessories; the sample head image may be the title picture with which the sample item is displayed on the platform, and a sample main image may be one of the detail pictures with which the sample item is displayed.
Each sample item may correspond to multiple sample main images. These may include qualified images that can serve as creative material, e.g., images whose content is unambiguous, clean, and clearly conveys the item to the user; they may also include unqualified images that cannot serve as creative material, such as images containing time-sensitive promotional content, irrelevant content, only partial details, overly complex content, the item's qualification certificate, or back-side information. The sample classification result of a sample main image may thus be qualified or unqualified; since a sample image is composed of a sample head image and a sample main image, the sample classification result of the sample image is consistent with that of the sample main image it contains.
The execution subject may use the plurality of sample images, each composed of the sample head image and a sample main image of a sample item, together with the corresponding sample classification results, as the training samples in the training sample set.
Step 220, an initial model including a residual neural network and a classification network is constructed.
In this step, after obtaining the training sample set, the execution subject may construct an initial model including a residual neural network and a classification network. The residual neural network may be of the kind commonly used in image classification, built by stacking convolutional layers that are connected through residual connections and pooling layers.
The execution subject may use the residual neural network as the backbone of the initial model, connect the classification network to the backbone, and use the output of the residual neural network as the input of the classification network.
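By way of illustration only, the following is a minimal PyTorch sketch of such an initial model. The disclosure does not fix a concrete backbone; ResNet-50, the channel sizes, the width-wise head/main split of the feature map, and the seven-way classifier are all assumptions made for this sketch.

```python
import torch
import torch.nn as nn
from torchvision import models

class InitialModel(nn.Module):
    """Sketch of the initial model: a residual-network backbone plus a
    classification network. Backbone choice and all sizes are assumptions."""
    def __init__(self, num_classes: int = 7):
        super().__init__()
        resnet = models.resnet50(weights=None)
        # Keep the trunk up to (but excluding) global pooling and fc,
        # so it outputs a spatial feature map for the stitched image.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.pool_head = nn.AdaptiveAvgPool2d(1)   # pooling layer for the head image
        self.pool_main = nn.AdaptiveAvgPool2d(1)   # pooling layer for the main image
        self.classifier = nn.Linear(2 * 2048, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(x)                    # (B, 2048, H, W)
        # Assume the head image occupies the left half of the stitched
        # input, so its features occupy the left half of the feature map.
        f_head, f_main = feat.chunk(2, dim=3)
        v1 = self.pool_head(f_head).flatten(1)     # first feature vector
        v2 = self.pool_main(f_main).flatten(1)     # second feature vector
        return self.classifier(torch.cat([v1, v2], dim=1))
```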
Step 230, using a machine learning method, taking the sample image as the input of the residual neural network to obtain a first feature vector corresponding to the sample head image and a second feature vector corresponding to the sample main image, taking the first feature vector and the second feature vector output by the residual neural network as the input of the classification network, taking the sample classification result of the sample image as the expected output, and training the initial model to obtain the image classification model.
In this step, after acquiring the training sample set and constructing the initial model, the execution subject may take the sample image as the input of the residual neural network, take the output of the residual neural network as the input of the classification network, take the sample classification result of the sample image as the expected output, and train the initial model to obtain the image classification model.
Specifically, the execution subject may input a sample image corresponding to a sample item into the initial model, where the sample image serves as the input of the residual neural network; through the processing of the residual neural network, a first feature vector corresponding to the sample head image in the sample image and a second feature vector corresponding to the sample main image are output. The first feature vector and the second feature vector output by the residual neural network are then input into the classification network, which, through its prediction processing, outputs a predicted classification result corresponding to the sample image.
During training, the execution subject may take the sample classification results of the sample images in the training sample set as the expected output, compare the output predicted classification result with the expected output, and judge whether the predicted classification result satisfies a constraint condition. If it does not, the network parameters of the initial model are adjusted and sample images corresponding to sample items are input again to continue training; if it does, model training is completed and the image classification model is obtained. The constraint condition may be that the difference between the predicted classification result and the sample classification result in the training sample set is within a threshold, where the threshold may be preset empirically and is not specifically limited by the present disclosure.
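A minimal sketch of this training loop follows, assuming the hypothetical InitialModel above, a DataLoader yielding stitched sample images with integer class labels, and cross-entropy loss standing in for the disclosure's unspecified constraint condition:

```python
import torch
import torch.nn as nn

def train(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-4) -> nn.Module:
    """Sketch of the training procedure; the loss, optimizer, and
    hyper-parameters are assumptions, not fixed by the disclosure."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    criterion = nn.CrossEntropyLoss()          # stands in for the constraint condition
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:          # stitched sample images + sample labels
            images, labels = images.to(device), labels.to(device)
            logits = model(images)             # predicted classification result
            loss = criterion(logits, labels)   # compare prediction with expected output
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                   # adjust the network parameters
    return model
```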
In the model training method provided by the embodiments of the disclosure, the execution subject first obtains a training sample set, where the training sample set includes sample images corresponding to sample items and sample classification results of the sample images, each sample image being composed of a sample head image and a sample main image corresponding to the sample item; then constructs an initial model including a residual neural network and a classification network; and finally, using a machine learning method, takes the sample image as the input of the residual neural network to obtain a first feature vector corresponding to the sample head image and a second feature vector corresponding to the sample main image, takes the first feature vector and the second feature vector output by the residual neural network as the input of the classification network, takes the sample classification result of the sample image as the expected output, and trains the initial model to obtain an image classification model.
Referring to FIG. 3, FIG. 3 shows a flow 300 of one embodiment of obtaining a training sample set; that is, step 210 of obtaining the training sample set may include the following steps:
Step 310, obtaining a sample head image and a plurality of sample main images corresponding to each sample item.
In this step, the execution subject may acquire a plurality of sample items and perform the sample-image acquisition operation for each of them. For the plurality of sample items, the execution subject may obtain, via a network from the item presentation platform or from a local database, the sample head image and the plurality of sample main images corresponding to each sample item.
A sample item may be an item displayed on the e-commerce platform, such as clothing, shoes, or accessories. The execution subject may take the title picture displayed on the platform as the sample head image corresponding to the sample item, and take the detail pictures displayed on the platform as the sample main images corresponding to the sample item, where a sample item may have a plurality of such detail pictures.
Step 320, classifying the plurality of sample main images of each sample item to obtain sample classification results of the plurality of sample main images.
In this step, after obtaining the sample head image and the plurality of sample main images corresponding to each sample item, the execution subject may classify the plurality of sample main images of each sample item; that is, judge the eligibility of each sample main image and determine whether its sample classification result is a qualified sample main image or an unqualified sample main image.
As an optional implementation, step 320 of classifying the plurality of sample main images of each sample item to obtain the sample classification results of the plurality of sample main images may include the following steps:
the method comprises the steps of firstly, classifying a plurality of sample main graphs of each sample article, and determining a qualified sample main graph and an unqualified sample main graph in the plurality of sample main graphs of each sample article.
Specifically, after the execution subject acquires the sample head image and the multiple sample main images corresponding to each sample article, the execution subject may classify the sample head image and the multiple sample main images according to the image content of each sample main image, and determine a qualified sample main image and an unqualified sample main image in the multiple sample main images of each sample article, so that the execution subject may determine the sample classification result corresponding to the qualified sample main image as the qualified sample main image.
Second, classifying the unqualified sample main images by category to obtain the sample classification results corresponding to the unqualified sample main images.
Specifically, after determining the qualified and unqualified sample main images among the plurality of sample main images of each sample item, the execution subject may further classify the unqualified sample main images by their category of defect, where the categories may include: containing time-sensitive promotional content, irrelevant image content, partial detail content only, overly complex content, the item's qualification certificate, back-side information, and the like.
The category of time-sensitive promotional content indicates that the sample main image includes time-limited content and the image easily expires. The category of irrelevant image content indicates that the contents displayed in the sample main image are unrelated to one another, for example a treadmill on the left and a stereo speaker on the right, so that the user cannot associate the image with the displayed item. The category of partial detail content indicates that the image shows only partial details of the item rather than the complete item. The category of overly complex content indicates that the image content is produced by collage and is too cluttered to be attractive. The category of qualification certificate indicates that the image shows the item's certificate of qualification, instructions for use, and the like. The category of back-side information indicates that the image shows the back of the item and lacks aesthetic appeal.
The execution subject thus further classifies the unqualified sample main images to obtain the sample classification results corresponding to them.
Third, obtaining the sample classification results of the plurality of sample main images based on the sample classification results of the qualified sample main images and the sample classification results corresponding to the unqualified sample main images.
Specifically, the execution subject determines a sample classification result for each sample main image: the result is either qualified or one of the defect categories of the unqualified sample main images. Together, the sample classification results of the qualified sample main images and those corresponding to the unqualified sample main images constitute the sample classification results of the plurality of sample main images.
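Purely as an illustration of such fine-grained labels, they could be encoded as integer classes as follows; the names and indices are assumptions for this sketch, not values given by the disclosure:

```python
# Hypothetical label scheme: class 0 is qualified; classes 1-6 are the
# defect categories of unqualified sample main images described above.
SAMPLE_LABELS = {
    "qualified": 0,
    "time_sensitive_content": 1,
    "irrelevant_content": 2,
    "partial_detail_only": 3,
    "overly_complex": 4,
    "qualification_certificate": 5,
    "back_side_info": 6,
}
```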
In this implementation, the classification results corresponding to the unqualified sample main images are further subdivided, completing a fine-grained classification of the sample main images. Training the initial model on these sample classification results helps the model distinguish between sample images and can improve the accuracy of the image classification model.
Step 330, stitching the sample head image of each sample item with each corresponding sample main image to obtain a plurality of sample images each composed of a sample head image and a sample main image.
In this step, for each sample item, the execution subject stitches the sample head image sequentially with each sample main image, obtaining a plurality of sample images each composed of the sample head image and a sample main image. As an example, the execution subject may stitch the sample head image horizontally with each sample main image in turn.
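A minimal sketch of such horizontal stitching with PIL follows; resizing both pictures to a common height first is an assumption, since the disclosure does not specify how differing image sizes are handled:

```python
from PIL import Image

def stitch(head_path: str, main_path: str) -> Image.Image:
    """Horizontally stitch a sample head image with one sample main image."""
    head = Image.open(head_path).convert("RGB")
    main = Image.open(main_path).convert("RGB")
    h = max(head.height, main.height)
    # Bring both pictures to a common height before side-by-side stitching.
    head = head.resize((max(1, round(head.width * h / head.height)), h))
    main = main.resize((max(1, round(main.width * h / main.height)), h))
    out = Image.new("RGB", (head.width + main.width, h))
    out.paste(head, (0, 0))
    out.paste(main, (head.width, 0))
    return out
```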
Step 340, obtaining a training sample set based on the plurality of sample images corresponding to the plurality of sample items and the sample classification result of the sample main image in each sample image.
In this step, after obtaining the plurality of sample images corresponding to each sample item, the execution subject may take the sample classification result of the sample main image in each sample image as the sample classification result of that sample image, and combine the plurality of sample images corresponding to each sample item with their sample classification results into the training sample set.
In this implementation, the sample main images are classified, and the plurality of sample images corresponding to each sample item together with their sample classification results form the training sample set. The raw data are thereby organized by class, making the training data more complete and better balanced, and providing high-quality training data for model training.
Referring to FIG. 4, FIG. 4 shows a flow 400 of one embodiment of obtaining a plurality of sample images; that is, step 330 of stitching the sample head image of each sample item with each corresponding sample main image to obtain a plurality of sample images each composed of a sample head image and a sample main image may include the following steps:
and step 410, respectively carrying out image splicing on the sample head image and each sample main image aiming at each sample article to obtain a plurality of spliced images.
In this step, for each sample article, the sample head image of the execution subject sample article is sequentially image-stitched with each sample main image, and a plurality of stitched images composed of the sample head image and the sample main image are obtained. As an example, the executing entity may perform sequential horizontal stitching between the sample head map and each sample main map, so as to obtain multiple stitched images composed of the sample head map and the sample main map.
Step 420, performing image processing on the sample head images in the plurality of stitched images to obtain a plurality of processed stitched images.
In this step, after obtaining the plurality of stitched images, the execution subject may perform image processing on the sample head image in each stitched image, applying data augmentation so as to diversify the sample head images and give the trained model stronger generalization ability. The data augmentation operations may include horizontal/vertical flipping, rotation, scaling, cropping, shearing, translation, contrast adjustment, color jittering, noise, and the like. After applying data augmentation to the sample head image in each stitched image, the execution subject obtains the plurality of processed stitched images.
As an optional implementation, step 420 of performing image processing on the sample head images in the plurality of stitched images to obtain the plurality of processed stitched images may include the following steps:
First, randomly flipping the sample head images in the plurality of stitched images to obtain a plurality of first stitched images including first sample head images.
Specifically, after acquiring the plurality of stitched images, the execution subject may randomly flip the image content of the sample head image in each stitched image. As an example, the execution subject may flip the sample head image in each stitched image with a probability of 0.5 to obtain the first sample head image, thereby obtaining a plurality of first stitched images including first sample head images.
Second, performing pixel processing on the first sample head images in the plurality of first stitched images to obtain the plurality of processed stitched images.
Specifically, after acquiring the plurality of first stitched images including the first sample head images, the execution subject may perform pixel processing on the first sample head image in each first stitched image, erasing part of the image to obtain the plurality of processed stitched images.
Specifically, the execution subject may randomly select a region in the first sample head image and compute the average pixel of the image content within that region; it may then adjust the pixel values in the region according to this average, obtaining the plurality of processed stitched images.
In this implementation, random flipping and pixel processing of the sample head image realize data augmentation of the sample head image, improving its diversity while avoiding the loss of key information in the main image, and at the same time improving the generalization of the trained model.
Step 430, scaling the plurality of processed stitched images to obtain a plurality of sample images each composed of a sample head image and a sample main image.
In this step, after obtaining the plurality of processed stitched images, the execution subject may scale each of them, adjusting each processed stitched image to a preset image size, which the present disclosure does not specifically limit. The execution subject thereby obtains, from the plurality of processed stitched images, a plurality of sample images each composed of a sample head image and a sample main image.
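The following sketch combines the flipping, erasing, and scaling steps just described; the erase-region size, the output size, and the assumption that the head image occupies the left half of the stitched picture are illustrative choices, not values fixed by the disclosure:

```python
import random
import numpy as np
from PIL import Image, ImageOps

def process_head_region(stitched: Image.Image, head_width: int,
                        out_size=(448, 224)) -> Image.Image:
    """Flip the head-image half with probability 0.5, erase a random region
    of it with its average pixel, then scale the whole stitched image."""
    head = stitched.crop((0, 0, head_width, stitched.height))
    if random.random() < 0.5:                         # random flip, p = 0.5
        head = ImageOps.mirror(head)
    arr = np.asarray(head).copy()
    h, w = arr.shape[:2]
    eh, ew = h // 4, w // 4                           # erase-region size (assumption)
    y = random.randint(0, h - eh)
    x = random.randint(0, w - ew)
    mean = arr[y:y + eh, x:x + ew].mean(axis=(0, 1), keepdims=True)
    arr[y:y + eh, x:x + ew] = mean.astype(arr.dtype)  # fill region with average pixel
    stitched = stitched.copy()
    stitched.paste(Image.fromarray(arr), (0, 0))
    return stitched.resize(out_size)                  # scaling step (step 430)
```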
In this implementation, the sample head image is stitched with each sample main image, the sample head image within the stitched image is then processed, and the stitched image is scaled to obtain the sample image. This improves the diversity of the sample images, makes the training data in the training sample set more complete, improves their precision and balance, and provides high-quality training data for model training.
Referring to FIG. 5, FIG. 5 shows a flow 500 of one embodiment of training the initial model; that is, step 230 of using a machine learning method to take the sample image as the input of the residual neural network, obtain the first feature vector corresponding to the sample head image and the second feature vector corresponding to the sample main image, take the first feature vector and the second feature vector output by the residual neural network as the input of the classification network, take the sample classification result of the sample image as the expected output, and train the initial model to obtain the image classification model may include the following steps:
and step 510, taking the sample image as input of a residual error neural network by using a machine learning method, and performing feature extraction on the sample image through the convolutional layer to obtain image features corresponding to the sample image.
The residual neural network may include a convolutional layer and two pooling layers, where the convolutional layer is used to extract image features in the sample image, and the two pooling layers may respectively pool image features of the sample head image and the sample main image, and the like.
In this step, the execution subject may use the sample image as an input of the residual neural network, perform feature extraction on the sample image through a convolutional layer in the residual neural network, capture an image feature common between the sample header and the sample main graph, learn a relationship between the sample header and the sample main graph, and obtain an image feature corresponding to the sample image, where the image feature may include a first feature of the sample header and a second feature of the sample main graph.
As an example, the execution subject extracts features from the sample image through the convolutional layers of the residual neural network; the extracted image features may be [V1, V2], where V1 denotes the first feature corresponding to the sample head image and V2 denotes the second feature corresponding to the sample main image.
Step 520, pooling the first feature and the second feature through the pooling layers to obtain the first feature vector corresponding to the sample head image and the second feature vector corresponding to the sample main image.
In this step, after obtaining through the convolutional layers the image features corresponding to the sample image, the execution subject may input the image features comprising the first feature and the second feature into the pooling layers, which pool the first feature and the second feature respectively to obtain the first feature vector corresponding to the sample head image and the second feature vector corresponding to the sample main image.
As an example, with extracted image features [V1, V2], where V1 is the first feature corresponding to the sample head image and V2 is the second feature corresponding to the sample main image, the execution subject may input the first feature into one pooling layer to obtain the first feature vector, and input the second feature into the other pooling layer to obtain the second feature vector.
Step 530, concatenating the first feature vector and the second feature vector to obtain an image feature vector corresponding to the sample image.
In this step, after obtaining the first feature vector corresponding to the sample head image and the second feature vector corresponding to the sample main image, the execution subject may concatenate the two vectors into the image feature vector corresponding to the sample image.
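Restating the pooling and concatenation of steps 520 and 530 with concrete tensor shapes (a sketch; the channel count and spatial sizes assume a ResNet-50 trunk on a 224x448 stitched input, which the disclosure does not fix):

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 2048, 7, 14)       # backbone output for one stitched image
f_head, f_main = feat.chunk(2, dim=3)    # left half: head image; right half: main image
pool = nn.AdaptiveAvgPool2d(1)           # global average pooling
v1 = pool(f_head).flatten(1)             # first feature vector, shape (1, 2048)
v2 = pool(f_main).flatten(1)             # second feature vector, shape (1, 2048)
image_vec = torch.cat([v1, v2], dim=1)   # image feature vector, shape (1, 4096)
print(image_vec.shape)                   # torch.Size([1, 4096])
```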
Step 540, taking the image feature vector as the input of the classification network, performing classification prediction on it through the classification network, and outputting a predicted classification result corresponding to the sample image.
In this step, after obtaining the image feature vector corresponding to the sample image, the execution subject may feed it to the classification network, which performs classification prediction and outputs the predicted classification result corresponding to the sample image.
Step 550, training the initial model based on the predicted classification result and the sample classification result of the sample image to obtain the image classification model.
In this step, the execution subject may compare the output predicted classification result with the expected output and judge whether the predicted classification result satisfies the constraint condition. If it does not, the network parameters of the initial model are adjusted and sample images corresponding to sample items are input again to continue training; if it does, model training is completed and the image classification model is obtained. The constraint condition may be that the difference between the predicted classification result and the sample classification result in the training sample set is within a threshold, where the threshold may be preset empirically and is not specifically limited by the present disclosure.
In this implementation, the two image regions are processed in separate branches: the feature regions corresponding to the two pictures are globally pooled separately, rather than globally pooling the whole feature map into a single classification vector as in a conventional residual neural network. This avoids confusing the features of the sample head image with those of the sample main image, strengthens the model's ability to discriminate the sample main image, improves the accuracy of the feature vectors, and makes the model more accurate.
Referring to fig. 6, fig. 6 illustrates a flow diagram 600 of one embodiment of an image classification method according to the present disclosure. The image classification method may include the steps of:
Step 610, acquiring a target head image and a plurality of target main images corresponding to the target article.
In this step, an execution subject on which the image classification method runs (for example, the servers 101, 102, 103 in fig. 1) may obtain a target head image and a plurality of target main images corresponding to a target article, where the target article may be an article displayed on an e-commerce platform, such as clothing, shoes, or accessories; the target head image may be the head image displayed for the target article on the e-commerce platform, and the target main images may be detail images displayed for the target article on the e-commerce platform.
Step 620, acquiring a plurality of target images corresponding to the target article based on the target head image and the plurality of target main images.
In this step, based on the target head image and the plurality of target main images corresponding to the target article, the execution subject sequentially performs image stitching, image processing, and scaling on the target head image and each target main image to obtain a plurality of target images corresponding to the target article, where each target image may be composed of the target head image and one target main image.
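As a hedged sketch of step 620 (the stitching axis, output size, and all helper names are assumptions; the disclosure fixes none of them):

```python
from PIL import Image

def compose_target_images(head_path, main_paths, size=(448, 224)):
    """Stitch the target head image with each target main image and scale
    the result, yielding one target image per target main image."""
    head = Image.open(head_path).convert("RGB")
    targets = []
    for main_path in main_paths:
        main = Image.open(main_path).convert("RGB").resize(head.size)
        canvas = Image.new("RGB", (head.width * 2, head.height))
        canvas.paste(head, (0, 0))           # head image on the left (assumed layout)
        canvas.paste(main, (head.width, 0))  # main image on the right
        targets.append(canvas.resize(size))  # scaling processing
    return targets
```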
Step 630, inputting the multiple target images into the image classification model to obtain classification results corresponding to the multiple target main images.
In this step, after the execution subject obtains the plurality of target images corresponding to the target article, it may input each target image into the image classification model: the residual neural network in the image classification model performs feature extraction on the target image to obtain a target image feature vector, the target image feature vector is then input to the classification network in the image classification model for classification prediction, and a classification result corresponding to each target main image is output.
The image classification model is obtained by the model training method described above, that is, by the steps in fig. 2 to fig. 5, and may be used to discriminate and classify the main images of articles to determine whether each main image is qualified material.
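Inference with the trained model (step 630) might then look like the following sketch; the preprocessing statistics are the common ImageNet values and are an assumption here:

```python
import torch
from torchvision import transforms

to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def classify_main_images(model, target_images):
    """Run each composed target image through the image classification
    model and return per-category confidences for each target main image."""
    model.eval()
    batch = torch.stack([to_tensor(img) for img in target_images])
    return model(batch).softmax(dim=1)  # one confidence per classification category
```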
According to the image classification method provided by this embodiment of the present disclosure, the execution subject first obtains the target head image and the plurality of target main images corresponding to the target article, then obtains the plurality of target images corresponding to the target article based on them, and finally inputs the plurality of target images into the image classification model to obtain the classification results corresponding to the plurality of target main images. Because the image classification model is obtained by the model training method described above, it can extract features from the composed target images, form feature vectors and prediction classification results, and finally determine whether each target main image is qualified. The model can perceive the contents of the target head image and the target main image at the same time, which improves the relevance between the features and the accuracy of the judgment on the target main images, so that the material mining task can be completed accurately and efficiently, the creative material library is effectively expanded, and the efficiency and accuracy of material mining are improved.
Referring to fig. 7, fig. 7 shows a flowchart 700 of another embodiment of an image classification method according to the present disclosure. The image classification method may include the steps of:
Step 710, acquiring a target head image and a plurality of target main images corresponding to the target article.
Step 710 of this embodiment may be performed in a manner similar to step 610 of the embodiment shown in fig. 6, which is not described herein again.
Step 720, acquiring a plurality of target images corresponding to the target article based on the target head image and the plurality of target main images.
Step 720 of this embodiment can be performed in a manner similar to step 620 in the embodiment shown in fig. 6, which is not described herein again.
Step 730, inputting the multiple target images into the image classification model to obtain classification results corresponding to the multiple target main images.
Step 730 of this embodiment can be performed in a manner similar to step 630 of the embodiment shown in fig. 6, which is not described herein again.
Step 740, determining, according to the classification result corresponding to each target main image, whether the confidence corresponding to the qualified main image is greater than a preset threshold.
The classification result may include a plurality of classification categories and the confidence corresponding to each classification category, where the classification categories include a qualified main image, an unqualified category containing time-sensitive benefit points, an unqualified category whose content is irrelevant to the image, an unqualified category showing only partial detail content, an unqualified category with excessively complex content, an unqualified category showing an article qualification certificate, and an unqualified category showing back-side information.
In this step, the image classification model outputs a classification result corresponding to each target main image, and the execution subject determines the confidence corresponding to the qualified main image from that result, compares it with a preset threshold, and judges whether it is greater than the threshold, where the preset threshold may be set according to experience and is not specifically limited here.
In response to determining that the confidence corresponding to the qualified main image is greater than the preset threshold, step 750 is performed: the target main image is determined to be a qualified target main image.
In this step, if the execution subject determines that the confidence corresponding to the qualified main image is greater than the preset threshold, the target main image in the target image is determined to be a qualified target main image, which can serve as qualified material for expanding creative material.
In response to determining that the confidence corresponding to the qualified main image is not greater than the preset threshold, step 760 is performed: the target main image is determined to be an unqualified target main image.
In this step, if the execution subject determines that the confidence corresponding to the qualified main image is not greater than the preset threshold, the target main image in the target image is determined to be an unqualified target main image; it cannot serve as qualified material and cannot be used to expand creative material.
In this embodiment, judging the confidence of the qualified main image determines whether each target main image is qualified. This improves the accuracy of screening target main images, allows high-quality creative material to be selected, and effectively enriches the creative material library.
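Steps 740 to 760 reduce to a simple threshold check; in this sketch the index of the qualified category and the threshold value are assumptions:

```python
QUALIFIED = 0  # assumed index of the "qualified main image" category

def screen_main_images(probs, threshold=0.8):
    """Keep a target main image as qualified material only when the
    confidence of the qualified category exceeds the preset threshold."""
    qualified, unqualified = [], []
    for i, p in enumerate(probs):
        if p[QUALIFIED].item() > threshold:
            qualified.append(i)    # usable for expanding creative material
        else:
            unqualified.append(i)  # not usable as qualified material
    return qualified, unqualified
```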
With further reference to fig. 8, as an implementation of the methods illustrated in the above figures, the present disclosure provides one embodiment of a model training apparatus. This apparatus embodiment corresponds to the method embodiment shown in fig. 2.
As shown in fig. 8, the model training apparatus 800 of the present embodiment may include: an acquisition module 810, a construction module 820, and a training module 830.
The acquisition module 810 is configured to obtain a training sample set, where the training sample set includes sample images corresponding to sample articles and the sample classification results of the sample images, each sample image being composed of a sample head image and a sample main image corresponding to a sample article;
a construction module 820 configured to construct an initial model comprising a residual neural network and a classification network;
the training module 830 is configured to use a machine learning method to take a sample image as an input of a residual error neural network, obtain a first feature vector corresponding to a sample head graph and a second feature vector corresponding to a sample main graph, take the first feature vector and the second feature vector output by the residual error neural network as inputs of a classification network, take a sample classification result of the sample image as an expected output, train an initial model, and obtain an image classification model.
In some optional implementations of this embodiment, the acquisition module 810 is further configured to: for a plurality of sample articles, obtain a sample head image and a plurality of sample main images corresponding to each sample article; classify the plurality of sample main images of each sample article to obtain sample classification results of the sample main images; stitch the sample head image of each sample article with each corresponding sample main image to obtain a plurality of sample images each composed of the sample head image and one sample main image; and obtain the training sample set based on the plurality of sample images corresponding to the sample articles and the sample classification result of the sample main image in each sample image.
In some optional implementations of this embodiment, the acquisition module 810 is further configured to: for each sample article, stitch the sample head image with each sample main image to obtain a plurality of stitched images; perform image processing on the sample head image in the plurality of stitched images to obtain a plurality of processed stitched images; and scale the plurality of processed stitched images to obtain a plurality of sample images each composed of the sample head image and one sample main image.
In some optional implementations of this embodiment, the acquisition module 810 is further configured to: randomly invert the sample head image in the plurality of stitched images to obtain a plurality of first stitched images including a first sample head image; and perform pixel processing on the first sample head image in the plurality of first stitched images to obtain a plurality of processed stitched images.
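The random inversion and pixel processing just described might, as a rough sketch, be applied to only the head-image half of a stitched image; the left-half layout, flip probability, and brightness jitter below are all assumptions:

```python
import random
from PIL import Image, ImageEnhance, ImageOps

def process_stitched_image(stitched):
    """Randomly invert and pixel-process the sample head image half of a
    stitched image, leaving the sample main image half untouched."""
    head = stitched.crop((0, 0, stitched.width // 2, stitched.height))
    if random.random() < 0.5:
        head = ImageOps.mirror(head)  # random inversion (assumed to be a flip)
    head = ImageEnhance.Brightness(head).enhance(random.uniform(0.8, 1.2))
    out = stitched.copy()
    out.paste(head, (0, 0))
    return out
```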
In some optional implementations of this embodiment, the acquisition module 810 is further configured to: classify the plurality of sample main images of each sample article and determine the qualified sample main images and the unqualified sample main images among them; classify the unqualified sample main images to obtain their corresponding sample classification results; and obtain the sample classification results of the plurality of sample main images based on the sample classification results of the qualified sample main images and those of the unqualified sample main images.
In some optional implementations of this embodiment, the residual neural network includes a convolutional layer and a pooling layer; and the training module 830 is further configured to: use a machine learning method to take the sample image as the input of the residual neural network and perform feature extraction on the sample image through the convolutional layer, acquiring image features corresponding to the sample image, where the image features include a first feature of the sample head image and a second feature of the sample main image; pool the first feature and the second feature through the pooling layer to obtain a first feature vector corresponding to the sample head image and a second feature vector corresponding to the sample main image; splice the first feature vector and the second feature vector into an image feature vector corresponding to the sample image; take the image feature vector as the input of the classification network, perform classification prediction on it through the classification network, and output a prediction classification result corresponding to the sample image; and train the initial model based on the prediction classification result and the sample classification result of the sample image to obtain the image classification model.
With the model training apparatus provided by the above embodiment of the present disclosure, the execution subject first obtains a training sample set, where the training sample set includes sample images corresponding to sample articles and the sample classification results of the sample images, each sample image being composed of a sample head image and a sample main image corresponding to a sample article; then constructs an initial model including a residual neural network and a classification network; and finally, using a machine learning method, takes the sample image as the input of the residual neural network, obtains a first feature vector corresponding to the sample head image and a second feature vector corresponding to the sample main image, takes the two feature vectors output by the residual neural network as the input of the classification network with the sample classification result of the sample image as the expected output, and trains the initial model to obtain an image classification model.
Those skilled in the art will appreciate that the above-described apparatus may also include some other well-known structure, such as a processor, memory, etc., which is not shown in fig. 8 in order not to unnecessarily obscure embodiments of the present disclosure.
With further reference to fig. 9, the present disclosure provides one embodiment of an image classification apparatus as an implementation of the methods illustrated in the above figures. This device embodiment corresponds to the method embodiment shown in fig. 6.
As shown in fig. 9, the image classification apparatus 900 of the present embodiment may include: an acquisition module 910 and a classification module 920.
The acquisition module 910 is configured to obtain a target head image and a plurality of target main images corresponding to a target article, and to obtain a plurality of target images corresponding to the target article based on the target head image and the plurality of target main images;
the classification module 920 is configured to input the plurality of target images into an image classification model to obtain the classification results corresponding to the plurality of target main images, where the image classification model is obtained by the model training method described above.
In some optional implementations of this embodiment, the classification result includes a confidence corresponding to the qualified main image; and the apparatus further includes: a judging module configured to judge, according to the classification result corresponding to each target main image, whether the confidence corresponding to the qualified main image is greater than a preset threshold; and a determination module configured to determine the target main image to be a qualified target main image in response to determining that the confidence corresponding to the qualified main image is greater than the preset threshold, and to determine the target main image to be an unqualified target main image in response to determining that the confidence is not greater than the preset threshold.
With the image classification apparatus provided by this embodiment of the present disclosure, the execution subject first obtains the target head image and the plurality of target main images corresponding to the target article, then obtains the plurality of target images corresponding to the target article based on them, and finally inputs the plurality of target images into the image classification model to obtain the classification results corresponding to the plurality of target main images. The image classification model, obtained by the model training method described above, extracts features from the composed target images, forms feature vectors and prediction classification results, and finally determines whether each target main image is qualified.
Those skilled in the art will appreciate that the above-described apparatus may also include some other well-known structures, such as processors, memories, etc., which are not shown in fig. 9 in order to not unnecessarily obscure embodiments of the present disclosure.
Referring now to FIG. 10, shown is a schematic diagram of an electronic device 1000 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a smart screen, a notebook computer, a PAD (tablet computer), a PMP (portable multimedia player), a car terminal (e.g., car navigation terminal), etc., and a fixed terminal such as a digital TV, a desktop computer, etc. The terminal device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 10, the electronic device 1000 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 1001 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage means 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the electronic apparatus 1000 are also stored. The processing device 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Generally, the following devices may be connected to the I/O interface 1005: input devices 1006 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 1007 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 1008 including, for example, magnetic tape, hard disk, and the like; and a communication device 1009. The communications apparatus 1009 may allow the electronic device 1000 to communicate wirelessly or by wire with other devices to exchange data. While fig. 10 illustrates an electronic device 1000 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 10 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from the network through the communication means 1009, or installed from the storage means 1008, or installed from the ROM 1002. The computer program, when executed by the processing device 1001, performs the above-described functions defined in the methods of the embodiments of the present disclosure. It should be noted that the computer readable medium of the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor comprises an acquisition module, a construction module and a training module; it can also be described as: a processor includes an acquisition module and a classification module; wherein the names of the modules do not in some cases constitute a limitation of the module itself.
As another aspect, the present application also provides a computer-readable medium, which may be included in the electronic device or may exist separately without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain a training sample set, where the training sample set includes sample images corresponding to sample articles and the sample classification results of the sample images, each sample image being composed of a sample head image and a sample main image corresponding to a sample article; construct an initial model including a residual neural network and a classification network; and, using a machine learning method, take the sample image as the input of the residual neural network, obtain a first feature vector corresponding to the sample head image and a second feature vector corresponding to the sample main image, take the first feature vector and the second feature vector output by the residual neural network as the input of the classification network, take the sample classification result of the sample image as the expected output, and train the initial model to obtain an image classification model. Alternatively, the one or more programs cause the electronic device to: acquire a target head image and a plurality of target main images corresponding to a target article; acquire a plurality of target images corresponding to the target article based on the target head image and the plurality of target main images; and input the plurality of target images into an image classification model to obtain classification results corresponding to the plurality of target main images, where the image classification model is obtained by the model training method described above.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept defined above, for example, technical solutions formed by mutually replacing the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (13)

1. A method of model training, the method comprising:
obtaining a training sample set, wherein the training sample set comprises a sample image corresponding to a sample article and a sample classification result of the sample image, and the sample image consists of a sample head image corresponding to the sample article and a sample main image;
constructing an initial model comprising a residual neural network and a classification network;
and, by using a machine learning method, taking the sample image as the input of the residual neural network, obtaining a first feature vector corresponding to the sample head image and a second feature vector corresponding to the sample main image, taking the first feature vector and the second feature vector output by the residual neural network as the input of the classification network, taking the sample classification result of the sample image as an expected output, and training the initial model to obtain an image classification model.
2. The method of claim 1, wherein the obtaining a training sample set comprises:
for a plurality of sample articles, obtaining a sample head image and a plurality of sample main images corresponding to each sample article;
classifying the plurality of sample main images of each sample article to obtain sample classification results of the plurality of sample main images;
respectively stitching the sample head image of each sample article with each corresponding sample main image to obtain a plurality of sample images each consisting of the sample head image and one sample main image;
and acquiring the training sample set based on the plurality of sample images corresponding to the sample articles and the sample classification result of the sample main image in each sample image.
3. The method of claim 2, wherein said stitching the sample head image of each sample article with each corresponding sample main image to obtain a plurality of sample images each consisting of the sample head image and one sample main image comprises:
for each sample article, respectively stitching the sample head image with each sample main image to obtain a plurality of stitched images;
performing image processing on the sample head image in the plurality of stitched images to obtain a plurality of processed stitched images;
and scaling the plurality of processed stitched images to obtain the plurality of sample images each consisting of the sample head image and one sample main image.
4. The method according to claim 3, wherein the performing image processing on the sample head image in the plurality of stitched images to obtain a plurality of processed stitched images comprises:
randomly inverting the sample head image in the plurality of stitched images to obtain a plurality of first stitched images comprising a first sample head image;
and performing pixel processing on the first sample head image in the plurality of first stitched images to obtain the plurality of processed stitched images.
5. The method of claim 2, wherein said classifying the plurality of sample main images of each sample article to obtain the sample classification results of the plurality of sample main images comprises:
classifying the plurality of sample main images of each sample article, and determining the qualified sample main images and the unqualified sample main images among the plurality of sample main images of each sample article;
classifying the unqualified sample main images to obtain sample classification results corresponding to the unqualified sample main images;
and obtaining the sample classification results of the plurality of sample main images based on the sample classification results of the qualified sample main images and the sample classification results corresponding to the unqualified sample main images.
6. The method of claim 1, wherein the residual neural network comprises a convolutional layer and a pooling layer; and
the method for utilizing machine learning to obtain an image classification model by taking the sample image as the input of the residual error neural network, acquiring a first feature vector corresponding to the sample head map and a second feature vector corresponding to the sample main map, taking the first feature vector and the second feature vector output by the residual error neural network as the input of the classification network, taking the sample classification result of the sample image as expected output, and training the initial model includes:
using a machine learning method, taking the sample image as the input of the residual error neural network, performing feature extraction on the sample image through the convolutional layer, and acquiring image features corresponding to the sample image, wherein the image features comprise a first feature of the sample head graph and a second feature of the sample main graph;
pooling the first feature and the second feature through the pooling layer to obtain a first feature vector corresponding to the sample head graph and a second feature vector corresponding to the sample main graph;
vector splicing is carried out on the first characteristic vector and the second characteristic vector to obtain an image characteristic vector corresponding to the sample image;
taking the image feature vector as the input of the classification network, performing classification prediction on the image feature vector through the classification network, and outputting a prediction classification result corresponding to the sample image;
and training the initial model based on the prediction classification result and the sample classification result of the sample image to obtain the image classification model.
7. A method of image classification, the method comprising:
acquiring a target head image and a plurality of target main images corresponding to a target article;
acquiring a plurality of target images corresponding to the target article based on the target head image and the plurality of target main images;
inputting the plurality of target images into an image classification model to obtain classification results corresponding to the plurality of target main images, wherein the image classification model is obtained by the method of any one of claims 1 to 6.
8. The method of claim 7, wherein the classification result comprises a confidence corresponding to a qualified main image; and the method further comprises:
judging, according to the classification result corresponding to each target main image, whether the confidence corresponding to the qualified main image is greater than a preset threshold;
in response to determining that the confidence corresponding to the qualified main image is greater than the preset threshold, determining the target main image to be a qualified target main image;
and in response to determining that the confidence corresponding to the qualified main image is not greater than the preset threshold, determining the target main image to be an unqualified target main image.
9. A model training apparatus, the apparatus comprising:
the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is configured to acquire a training sample set, the training sample set comprises a sample image corresponding to a sample article and a sample classification result of the sample image, and the sample image is composed of a sample head image corresponding to the sample article and a sample main image;
a construction module configured to construct an initial model comprising a residual neural network and a classification network;
and a training module configured to use a machine learning method to take the sample image as the input of the residual neural network, obtain a first feature vector corresponding to the sample head image and a second feature vector corresponding to the sample main image, take the first feature vector and the second feature vector output by the residual neural network as the input of the classification network, take the sample classification result of the sample image as an expected output, and train the initial model to obtain an image classification model.
10. The apparatus of claim 9, wherein the acquisition module is further configured to:
for a plurality of sample articles, obtain a sample head image and a plurality of sample main images corresponding to each sample article;
classify the plurality of sample main images of each sample article to obtain sample classification results of the plurality of sample main images;
respectively stitch the sample head image of each sample article with each corresponding sample main image to obtain a plurality of sample images each consisting of the sample head image and one sample main image;
and acquire the training sample set based on the plurality of sample images corresponding to the sample articles and the sample classification result of the sample main image in each sample image.
11. An image classification apparatus, the apparatus comprising:
an acquisition module configured to acquire a target head image and a plurality of target main images corresponding to a target article, and to acquire a plurality of target images corresponding to the target article based on the target head image and the plurality of target main images;
and a classification module configured to input the plurality of target images into an image classification model to obtain classification results corresponding to the plurality of target main images, wherein the image classification model is obtained by the method of any one of claims 1 to 6.
12. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-8.
13. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN202211151337.3A 2022-09-21 2022-09-21 Model training method, image classification method and device Pending CN115482415A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211151337.3A CN115482415A (en) 2022-09-21 2022-09-21 Model training method, image classification method and device
PCT/CN2023/120364 WO2024061311A1 (en) 2022-09-21 2023-09-21 Model training method and apparatus, and image classification method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211151337.3A CN115482415A (en) 2022-09-21 2022-09-21 Model training method, image classification method and device

Publications (1)

Publication Number Publication Date
CN115482415A (en) 2022-12-16

Family

ID=84392717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211151337.3A Pending CN115482415A (en) 2022-09-21 2022-09-21 Model training method, image classification method and device

Country Status (2)

Country Link
CN (1) CN115482415A (en)
WO (1) WO2024061311A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024061311A1 (en) * 2022-09-21 2024-03-28 北京沃东天骏信息技术有限公司 Model training method and apparatus, and image classification method and apparatus

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103092861B (en) * 2011-11-02 2016-01-06 阿里巴巴集团控股有限公司 A kind of choosing method of commodity representative picture and system
CN110858276A (en) * 2018-08-22 2020-03-03 北京航天长峰科技工业集团有限公司 Pedestrian re-identification method combining identification model and verification model
CN110689046A (en) * 2019-08-26 2020-01-14 深圳壹账通智能科技有限公司 Image recognition method, image recognition device, computer device, and storage medium
KR102250996B1 (en) * 2019-10-02 2021-05-12 주식회사 지앤아이티글로벌 Method for classifying stratum using neural network model and device for the same method
CN115482415A (en) * 2022-09-21 2022-12-16 北京沃东天骏信息技术有限公司 Model training method, image classification method and device


Also Published As

Publication number Publication date
WO2024061311A1 (en) 2024-03-28

Similar Documents

Publication Publication Date Title
CN110288082B (en) Convolutional neural network model training method and device and computer readable storage medium
CN110378410B (en) Multi-label scene classification method and device and electronic equipment
WO2020107624A1 (en) Information pushing method and apparatus, electronic device and computer-readable storage medium
CN111666898B (en) Method and device for identifying class to which vehicle belongs
CN110674349B (en) Video POI (Point of interest) identification method and device and electronic equipment
CN112153460B (en) Video dubbing method and device, electronic equipment and storage medium
CN112149699B (en) Method and device for generating model and method and device for identifying image
CN112232311B (en) Face tracking method and device and electronic equipment
CN110287816B (en) Vehicle door motion detection method, device and computer readable storage medium
CN112329762A (en) Image processing method, model training method, device, computer device and medium
CN112100558A (en) Method, device, equipment and storage medium for object recommendation
WO2024061311A1 (en) Model training method and apparatus, and image classification method and apparatus
CN110069997B (en) Scene classification method and device and electronic equipment
CN113033707B (en) Video classification method and device, readable medium and electronic equipment
CN111291715A (en) Vehicle type identification method based on multi-scale convolutional neural network, electronic device and storage medium
WO2023130925A1 (en) Font recognition method and apparatus, readable medium, and electronic device
CN111832354A (en) Target object age identification method and device and electronic equipment
CN114422698B (en) Video generation method, device, equipment and storage medium
CN113837808B (en) Promotion information pushing method, device, equipment, medium and product
CN113780534B (en) Compression method, image generation method, device, equipment and medium of network model
CN113255819B (en) Method and device for identifying information
CN112862538A (en) Method, apparatus, electronic device, and medium for predicting user preference
CN112347278A (en) Method and apparatus for training a characterization model
CN112634469A (en) Method and apparatus for processing image
CN111950572A (en) Method, apparatus, electronic device and computer-readable storage medium for training classifier

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination