CN114830186A - Image classification method and device, storage medium and electronic equipment - Google Patents

Image classification method and device, storage medium and electronic equipment

Info

Publication number
CN114830186A
CN114830186A (application CN202080087887.6A)
Authority
CN
China
Prior art keywords
image
classification
fine-grained
branch network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080087887.6A
Other languages
Chinese (zh)
Inventor
高洪涛 (Gao Hongtao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd, Shenzhen Huantai Technology Co Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Publication of CN114830186A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

An image classification method, an image classification device, a storage medium, and an electronic device are provided. The method includes: determining a target image that needs to be classified (101); calling a pre-trained fine-grained classification model to perform fine-grained classification on the target image (102); and obtaining the fine-grained category of the target image output by the fine-grained classification model (103).

Description

Image classification method and device, storage medium and electronic equipment

Technical Field
The embodiment of the application relates to the technical field of image recognition, in particular to an image classification method, an image classification device, a storage medium and electronic equipment.
Background
Image classification mainly includes coarse-grained image classification and fine-grained image classification. Fine-grained image classification, also called subcategory image classification, aims to divide a coarse-grained category into more detailed subcategories, such as distinguishing bird species, vehicle models, or dog breeds.
Disclosure of Invention
The application provides an image classification method, an image classification device, a storage medium and electronic equipment, which can realize fine-grained classification of images.
In a first aspect, the present application provides an image classification method, including:
determining a target image needing image classification;
calling a pre-trained fine-grained classification model to perform fine-grained classification on the target image, wherein the fine-grained classification model comprises a feature extraction module, a feature optimization module and a fine-grained classification module, the feature extraction module is used for extracting image features of the target image, the feature optimization module is used for performing optimization processing on the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain fine-grained classes of the target image;
and obtaining the fine-grained classification of the target image output by the fine-grained classification model.
In a second aspect, the present application further provides an image classification apparatus, including:
the image determining component is used for determining a target image which needs to be subjected to image classification;
the model calling component is used for calling a pre-trained fine-grained classification model to perform fine-grained classification on the target image, wherein the fine-grained classification model comprises a feature extraction module, a feature optimization module and a fine-grained classification module, the feature extraction module is used for extracting image features of the target image, the feature optimization module is used for performing optimization processing on the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain fine-grained classes of the target image;
and the category acquisition component is used for acquiring the fine-grained category of the target image output by the fine-grained classification model.
In a third aspect, the present application also provides a storage medium having a computer program stored thereon, wherein the computer program, when executed on a computer, causes the computer to perform:
determining a target image needing image classification;
calling a pre-trained fine-grained classification model to perform fine-grained classification on the target image, wherein the fine-grained classification model comprises a feature extraction module, a feature optimization module and a fine-grained classification module, the feature extraction module is used for extracting image features of the target image, the feature optimization module is used for performing optimization processing on the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain fine-grained classes of the target image;
and obtaining the fine-grained classification of the target image output by the fine-grained classification model.
In a fourth aspect, the present application further provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute, by calling the computer program stored in the memory:
determining a target image needing image classification;
calling a pre-trained fine-grained classification model to perform fine-grained classification on the target image, wherein the fine-grained classification model comprises a feature extraction module, a feature optimization module and a fine-grained classification module, the feature extraction module is used for extracting image features of the target image, the feature optimization module is used for performing optimization processing on the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain fine-grained classes of the target image;
and obtaining the fine-grained classification of the target image output by the fine-grained classification model.
Drawings
The technical solutions and advantages of the present application will become apparent from the following detailed description of specific embodiments of the present application when taken in conjunction with the accompanying drawings.
Fig. 1 is a schematic flow chart of an image classification method according to an embodiment of the present application.
Fig. 2 is an exemplary diagram of triggering image classification in the embodiment of the present application.
Fig. 3 is an exemplary diagram of an image classification interface provided in an embodiment of the present application.
FIG. 4 is an exemplary diagram of a selection sub-interface provided in an embodiment of the present application.
Fig. 5 is an architecture diagram of a fine-grained classification model according to an embodiment of the present disclosure.
Fig. 6 is an exemplary diagram of a target image provided in an embodiment of the present application.
Fig. 7 is another architecture diagram of a fine-grained classification model provided in an embodiment of the present application.
Fig. 8 is a schematic diagram of an architecture of a machine learning network according to an embodiment of the present application.
Fig. 9 is another schematic flowchart of an image classification method according to an embodiment of the present application.
Fig. 10 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present application.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision making.
Artificial intelligence is a comprehensive discipline that spans a broad range of fields, covering both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big-data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory. It specializes in studying how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all fields of AI. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
The technical scheme provided by the embodiment of the application relates to an artificial intelligence machine learning technology, and is specifically explained by the following embodiment:
the embodiment of the application provides an image classification method, an image classification device, a storage medium and an electronic device, wherein an execution subject of the image classification method can be the image classification device provided in the embodiment of the application or the electronic device integrated with the image classification device, and the image classification device can be implemented in a hardware or software mode. The electronic device may be a device such as a smart phone, a tablet computer, a palm computer, a notebook computer, or a desktop computer, which is equipped with a processor (including but not limited to a general-purpose processor, a customized processor, etc.) and has processing capability.
Referring to fig. 1, fig. 1 is a flow chart illustrating an image classification method according to an embodiment of the present disclosure. The flow of the image classification method can comprise the following steps:
in 101, a target image for which image classification is required is determined.
In this embodiment of the present application, the electronic device may determine the target image to be classified based on a preset image classification period and a preset image selection rule, or may determine the target image according to an image classification instruction input by the user when such an instruction is received, and so on.
It should be noted that, in the embodiment of the present application, no specific limitation is imposed on the setting of the image classification period, the image selection rule, and the image classification instruction, and the setting may be performed by the electronic device according to the input of the user, or the default setting may be performed by the manufacturer of the electronic device, and so on.
For example, assuming the image classification period is pre-configured as a calendar week starting on Monday, and the image selection rule is configured as "select captured images for image classification", the electronic device automatically triggers image classification every Monday and determines the captured images as the target images to be classified.
For another example, referring to fig. 2, the electronic device provides a "classify" control for triggering image classification in an image browsing interface. The illustrated rectangles represent different images, and the circular boxes within them represent "select" controls for selecting the corresponding image. The user may click the select control corresponding to an image to select it, and click it again to deselect it. As shown in fig. 2, after selecting the images to be classified, the user clicks the classify control to input an image classification instruction to the electronic device, where the instruction carries indication information identifying the selected images. Accordingly, the electronic device determines the images selected by the user as the target images to be classified according to the indication information in the instruction.
For another example, the electronic device may receive an input image classification instruction through an image classification interface including a request input interface, as shown in fig. 3, the request input interface may be in the form of an input box, and a user may enter identification information of an image to be subjected to image classification in the request input interface in the form of the input box and input confirmation information (e.g., directly pressing an enter key of a keyboard) to input the image classification instruction, where the image classification instruction carries the identification information of the image to be subjected to image classification. Correspondingly, the electronic equipment can determine the target image needing image classification according to the identification information in the received image classification instruction.
For another example, the image classification interface shown in fig. 3 further includes an "open" control, on one hand, when the electronic device detects that the open control is triggered, a selection sub-interface (as shown in fig. 4) is displayed on the image classification interface in an overlapping manner, and the selection sub-interface provides thumbnails of images that can be subjected to image classification, such as thumbnails of images a, B, C, D, E, F, and the like, for the user to search for and select thumbnails of images that need to be subjected to image classification; on the other hand, after selecting the thumbnail of the image needing image classification, the user can trigger a confirmation control provided by the selection sub-interface to input an image classification instruction to the electronic device, wherein the image classification instruction is associated with the thumbnail of the image selected by the user and instructs the electronic device to use the image selected by the user as a target image needing image classification.
In addition, a person skilled in the art may design other specific ways of inputting the image classification instruction according to actual needs; the embodiments of the present application are not specifically limited in this regard.
In 102, a pre-trained fine-grained classification model is called to perform fine-grained classification on the target image, wherein the fine-grained classification model comprises a feature extraction module, a feature optimization module and a fine-grained classification module, the feature extraction module is used for extracting image features of the target image, the feature optimization module is used for performing optimization processing on the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain fine-grained classes of the target image.
It should be noted that, in the present application, a machine learning method is adopted in advance to train a fine-grained classification model configured to perform fine-grained classification on an image.
Referring to fig. 5, the fine-grained classification model provided in the present application includes three parts: a feature extraction module for extracting features, a feature optimization module for optimizing the features, and a fine-grained classification module for performing fine-grained classification according to the features.
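As a minimal illustration of this three-module pipeline (a sketch only, not the patented implementation — `extract`, `optimize`, and `classify` are hypothetical stand-ins for the three modules):

```python
def fine_grained_classify_image(image, extract, optimize, classify):
    """Run the three-module pipeline: extract -> optimize -> classify."""
    features = extract(image)        # feature extraction module
    optimized = optimize(features)   # feature optimization module
    return classify(optimized)       # fine-grained classification module

# Toy stand-ins for the three modules (illustrative only):
label = fine_grained_classify_image(
    image=[1.0, 2.0, 3.0],
    extract=lambda img: [x * 2 for x in img],           # "extract" features
    optimize=lambda f: [x + 1 for x in f],              # "optimize" them
    classify=lambda f: "husky" if sum(f) > 10 else "corgi",
)
print(label)  # -> husky
```

The point of the sketch is only the data flow: the optimization module sits between extraction and classification, so the classifier never sees the raw extracted features.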
The electronic equipment firstly inputs a target image into the feature extraction module, and performs feature extraction on the target image based on the feature extraction module, so as to obtain the image features of the target image.
Illustratively, the feature extraction module includes N convolutional layers. When extracting features of the target image with the feature extraction module, the electronic device first performs a convolution calculation on the target image based on the layer-1 convolutional layer to obtain a feature map, denoted as the layer-1 feature map; it then performs a convolution calculation on the layer-1 feature map based on the layer-2 convolutional layer to obtain a new feature map, denoted as the layer-2 feature map; and so on, until the layer-N feature map is computed based on the layer-N convolutional layer. The layer-N feature map is taken as the image feature extracted from the target image.
It should be noted that, in the present application, the value of N, that is, the number of convolutional layers constituting the feature extraction module, is not specifically limited, and may be set by a person of ordinary skill in the art according to actual needs, for example, the value of N in the embodiment of the present application is 9.
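A simplified single-channel numpy sketch of this layer-by-layer scheme, using N = 3 and 3×3 kernels for brevity (the embodiment cites N = 9; kernel sizes and values here are illustrative, not the patent's actual network):

```python
import numpy as np

def conv2d(feature_map, kernel):
    """Valid (no-padding) 2-D convolution of a single-channel feature map."""
    kh, kw = kernel.shape
    h, w = feature_map.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(feature_map[i:i + kh, j:j + kw] * kernel)
    return out

def extract_features(image, kernels):
    """Layer 1 .. layer N: each layer convolves the previous layer's map."""
    fmap = image
    for kernel in kernels:
        fmap = np.maximum(conv2d(fmap, kernel), 0.0)  # conv + ReLU
    return fmap  # the layer-N feature map is the extracted image feature

image = np.random.rand(16, 16)                       # stand-in target image
kernels = [np.random.rand(3, 3) for _ in range(3)]   # N = 3 layers
features = extract_features(image, kernels)
print(features.shape)  # each valid 3x3 conv shrinks H and W by 2 -> (10, 10)
```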
After extracting the image features of the target image based on the feature extraction module, the electronic device further performs optimization processing on the image features based on the feature optimization module according to a preset optimization strategy, and records the result as the optimized image features.
After the electronic device optimizes the extracted image features based on the feature optimization module to obtain the optimized image features, it further performs fine-grained classification on the optimized image features based on the fine-grained classification module to obtain the fine-grained category of the target image. The fine-grained classification module may be a fully connected layer.
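Since the fine-grained classification module may be a fully connected layer, its forward pass can be sketched as a linear map plus softmax (the weights, bias, and class names below are made up for illustration — a trained model would supply them):

```python
import numpy as np

def fc_classify(optimized_features, weights, bias, class_names):
    """Fully connected layer + softmax: map optimized features to a class."""
    logits = optimized_features @ weights + bias   # one logit per class
    exp = np.exp(logits - logits.max())            # numerically stable softmax
    probs = exp / exp.sum()
    return class_names[int(np.argmax(probs))]

features = np.array([0.2, 0.9, 0.4])      # hypothetical optimized features
weights = np.array([[1.0, 0.0],
                    [0.0, 2.0],
                    [0.5, 0.5]])          # 3 features -> 2 fine-grained classes
bias = np.array([0.0, 0.0])
print(fc_classify(features, weights, bias, ["corgi", "husky"]))  # -> husky
```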
For example, referring to fig. 6, when the left image in fig. 6 is determined as the target image, the coarse-grained category is known to be "dog"; the fine-grained classification module performs fine-grained classification on the optimized image features corresponding to the target image and obtains the fine-grained category "Corgi", that is, the breed of the dog in the target image is a Corgi. When the right image in fig. 6 is determined as the target image, the coarse-grained category is likewise "dog", and the fine-grained classification module obtains the fine-grained category "Husky", that is, the breed of the dog in the target image is a Husky.
In 103, a fine-grained classification of the target image output by the fine-grained classification model is obtained.
In the embodiment of the application, after the fine-grained classification model finishes the fine-grained classification of the target image, the electronic equipment can obtain the fine-grained classification obtained by classification from the fine-grained classification module of the fine-grained classification model.
According to the above method, a target image that needs to be classified is determined, and a pre-trained fine-grained classification model is called to perform fine-grained classification on it, wherein the model includes a feature extraction module for extracting the image features of the target image, a feature optimization module for optimizing those features into optimized image features, and a fine-grained classification module for classifying the optimized image features into the fine-grained category of the target image; the fine-grained category output by the model is then obtained. Manual fine-grained classification is therefore unnecessary, and fine-grained classification of images can be realized efficiently.
In one embodiment, determining a target image for image classification includes:
and when the image classification period is reached, determining the newly added image in the image classification period as the target image.
In the embodiment of the application, when the electronic device reaches an image classification period, the electronic device triggers and determines a target image which needs to be subjected to image classification. The electronic device can directly take the newly added image in the image classification period as the target image. For example, in an image classification cycle, 20 additional images are added to the electronic device, and the electronic device takes the 20 additional images as target images to be subjected to image classification.
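The "newly added images within the classification period" selection can be sketched as a timestamp filter (the data structure and helper names are hypothetical; the patent does not prescribe them):

```python
from datetime import datetime, timedelta

def images_added_in_period(images, period_end, period=timedelta(days=7)):
    """Return names of images whose creation time falls inside the period."""
    period_start = period_end - period
    return [name for name, created in images
            if period_start <= created < period_end]

monday = datetime(2024, 1, 8)  # the Monday on which classification fires
images = [
    ("a.jpg", datetime(2024, 1, 3)),    # added inside the week
    ("b.jpg", datetime(2023, 12, 20)),  # too old
    ("c.jpg", datetime(2024, 1, 7)),    # added inside the week
]
print(images_added_in_period(images, monday))  # -> ['a.jpg', 'c.jpg']
```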
In one embodiment, the "determining a target image to be subjected to image classification" includes:
(1) determining the image under the preset storage path as the target image; or
(2) determining the image in the preset image format as the target image; or
(3) determining the image in the preset image format under the preset storage path as the target image.
The setting of the preset storage path and the preset image format is not specifically limited in the embodiments of the present application; it may be performed by the electronic device according to user input, or set by default by the manufacturer of the electronic device. It should be noted that the preset storage path may be configured as one path or multiple paths, and correspondingly, the preset image format may be configured as one format or multiple formats.
For example, assuming the user needs the electronic device to classify captured images, the preset storage path may be configured as the storage path of images captured by the electronic device. For instance, if the electronic device runs the Android system, the preset storage path may be configured as "/storage/0/DCIM", so that the electronic device determines all images in the corresponding "DCIM" file directory as the target images to be classified.
For another example, assuming that the user needs the electronic device to classify images in a certain image format, the preset image format may be configured as the image format specified by the user, and for example, if the user needs the electronic device to classify images in the "JPG" format, the preset image format is configured as the "JPG" format, so that the electronic device determines all local images in the "JPG" format as target images that need to be subjected to image classification.
For another example, assuming the user needs the electronic device to classify captured images in a certain image format, the preset storage path may be configured as the storage path of images captured by the electronic device, and the preset image format as the format specified by the user. For instance, if the electronic device runs the Android system, the preset storage path may be configured as "/storage/0/DCIM", and if the user wants captured images in "JPG" format classified, the preset image format is configured as "JPG"; the electronic device then determines all "JPG"-format images in the corresponding "DCIM" file directory as the target images to be classified.
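The three selection rules above can be sketched with `pathlib` (the path and format are the examples from the text; the function is a hypothetical helper, not the patent's implementation):

```python
from pathlib import Path

def select_targets(storage_path=None, image_format=None, all_images=None):
    """Apply the preset-path and/or preset-format rules to candidate images."""
    selected = [Path(p) for p in all_images]
    if storage_path is not None:            # rule (1), and part of rule (3)
        selected = [p for p in selected if str(p).startswith(storage_path)]
    if image_format is not None:            # rule (2), and part of rule (3)
        selected = [p for p in selected if p.suffix.lower() == image_format]
    return selected

images = ["/storage/0/DCIM/a.jpg", "/storage/0/DCIM/b.png", "/tmp/c.jpg"]
# Rule (3): preset path AND preset format -> keeps only /storage/0/DCIM/a.jpg
print(select_targets("/storage/0/DCIM", ".jpg", images))
```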
In an embodiment, the fine-grained classification model further includes a dimension reduction module, configured to perform feature dimension reduction on the optimized image features to obtain the optimized image features after the dimension reduction;
and the fine-grained classification module is also used for classifying and predicting the optimized image features after dimension reduction to obtain the fine-grained classification of the target image.
Referring to fig. 7, the fine-grained classification model provided by the present application further includes a dimension reduction module, which may be a pooling layer. As shown in fig. 7, one end of the dimension reduction module is connected to the feature optimization module, and the other end is connected to the fine-grained classification module.
In the embodiment of the present application, after the electronic device performs optimization processing on the image features extracted by the feature extraction module, it does not directly perform fine-grained classification on the optimized image features; instead, it inputs them into the dimension reduction module, which performs feature dimension reduction to obtain the reduced optimized image features. The electronic device then inputs the reduced optimized image features into the fine-grained classification module for fine-grained classification, thereby obtaining the fine-grained category of the target image.
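Since the dimension reduction module may be a pooling layer, one common choice — global average pooling, used here purely as an illustration — reduces each channel of the optimized features to a single value:

```python
import numpy as np

def global_average_pool(feature_maps):
    """Reduce (channels, H, W) optimized features to a (channels,) vector."""
    return feature_maps.mean(axis=(1, 2))

# Two 4x4 channels of hypothetical optimized features -> a 2-element vector
optimized = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
reduced = global_average_pool(optimized)
print(reduced.shape)  # -> (2,)
```

The reduced vector is what the fully connected classification module would then consume.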
In an embodiment, after obtaining the fine-grained classification of the target image output by the fine-grained classification model, the method further includes:
and distributing a storage path for the target image according to the fine-grained category, and storing the target image into the storage path.
In the embodiment of the application, in order to facilitate the user to browse the image, the electronic device further classifies and stores the target image according to a fine-grained category obtained by performing fine-grained classification on the target image.
The electronic device may allocate a storage path to each fine-grained category, and store the corresponding target image in the allocated storage path. For example, if the target images are classified into nine categories, the electronic device correspondingly allocates nine different storage paths for storing the target images of the corresponding categories.
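The per-category allocation can be sketched as one directory per fine-grained class (a hypothetical helper; directory layout is an assumption, not specified by the patent):

```python
from pathlib import Path
import shutil
import tempfile

def store_by_category(image_path, fine_grained_category, library_root):
    """Allocate a per-category storage path and move the image into it."""
    dest_dir = Path(library_root) / fine_grained_category
    dest_dir.mkdir(parents=True, exist_ok=True)  # one path per category
    dest = dest_dir / Path(image_path).name
    shutil.move(str(image_path), str(dest))
    return dest

# Example: a freshly classified "husky" photo is filed under <library>/husky/
root = Path(tempfile.mkdtemp())
src = root / "photo.jpg"
src.write_bytes(b"fake image bytes")
dest = store_by_category(src, "husky", root / "library")
print(dest.parent.name)  # -> husky
```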
In one embodiment, after storing the target image into the allocated storage path, the method further includes:
for the target images in each storage path, acquiring browsing behavior data of each target image browsed by a user, and acquiring the creation duration of each target image;
carrying out weighted summation on the browsing behavior data and the creation duration of each target image to obtain a weighted sum value of each target image;
and sequencing the target images according to the weighted sum of the target images.
In the embodiment of the present application, after the target images are classified, the target images of each class (i.e., the target images under each storage path) are also sorted.
The browsing behavior data includes related data describing the browsing behavior of the user, for example, the browsing behavior data includes the number of times the user browses the target image, and the opening time and the closing time of each time the user browses the target image, and the like.
The electronic device acquires the browsing behavior data of each target image browsed by the user, and acquires the creation duration of each target image. The creation duration is the difference between the current time and the generation time of the target image.
It should be noted that the current time does not refer to a fixed moment, but to the time at which the electronic device performs the operation of "acquiring the creation duration of each target image". In addition, the embodiment of the present application does not specifically limit how the target image is generated. For example, if a target image was generated by the electronic device through shooting, its generation time is the time at which the electronic device captured it; if a target image was generated through an internet download, its generation time is the time at which the electronic device downloaded it; and so on.
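The creation-duration definition can be written down directly (hypothetical helper names):

```python
from datetime import datetime

def creation_duration(generation_time, current_time):
    """Creation duration = current time minus the image's generation time."""
    return current_time - generation_time

shot_at = datetime(2024, 1, 1, 12, 0)  # e.g. the shooting time of the image
now = datetime(2024, 1, 8, 12, 0)      # when the duration is computed
print(creation_duration(shot_at, now).days)  # -> 7
```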
In the embodiment of the application, after the browsing behavior data and the creation duration of each target image are acquired, the electronic device performs weighted summation on the acquired browsing behavior data and the creation duration according to a preset weighted summation algorithm to obtain a weighted sum value corresponding to each target image.
The browsing behavior data reflects characteristics of the user's browsing behavior, while the creation duration is a characteristic of the image itself. The purpose of weighting and summing the two is to evaluate each target image comprehensively, combining the image's own characteristics with user characteristics external to the image. The resulting weighted sum value can thus be regarded as a "score" from this comprehensive evaluation, reflecting the probability that the target image will be browsed by the user.
In the embodiment of the application, after obtaining the weighted sum value of each target image, the electronic device sorts the target images in descending order of their weighted sum values.
In an embodiment, the weighted summation of the browsing behavior data and the creation duration of each target image to obtain a weighted sum value of each target image includes:
acquiring the browsing times of each target image and the browsing duration of each browsing according to the browsing behavior data of each target image;
acquiring the average browsing duration of each target image according to the browsing times of each target image and the browsing duration of each browsing;
carrying out normalization processing on the browsing times, the average browsing time length and the creating time length of each target image;
and carrying out weighted summation on the normalized browsing times, the average browsing time length and the creation time length of each target image to obtain a weighted sum value of each target image.
In the embodiment of the application, when a target image is browsed by a user, the electronic device records browsing behavior data of the user browsing the target image, wherein the browsing behavior data includes, but is not limited to, the number of times the user browses the target image, and the opening time and the closing time of each time the user browses the target image, and the like.
Therefore, when performing the weighted summation over the browsing behavior data and the creation duration of each target image, the electronic device can extract the browsing count of each target image (that is, the number of times the user has browsed it) directly from its browsing behavior data, and can derive the duration of each browsing session from the opening time and closing time recorded in that data.
After the electronic device obtains the browsing times of each target image and the browsing duration of each browsing, the average browsing duration of each target image is calculated according to the browsing times of each target image and the browsing duration of each browsing. It should be noted that, as will be understood by those skilled in the art, the average browsing time period referred to herein is the average browsing time period of a single target image, not the average browsing time periods of a plurality of target images.
In addition, in the embodiment of the application, corresponding weight values are pre-assigned to the three quantities, namely the browsing count, the average browsing duration, and the creation duration. Their specific values are not limited here and can be set by a person skilled in the art according to actual needs. For example, the weight for the browsing count may be set to 0.3, the weight for the average browsing duration to 0.2, and the weight for the creation duration to 0.5.
Before the weighted summation, the electronic device first normalizes the browsing count, the average browsing duration, and the creation duration of each target image, so that the three quantities fall within the same numerical interval; otherwise a quantity with a larger raw scale would dominate the sum.
And then, the electronic equipment performs weighted summation on the normalized browsing times, the average browsing time and the creation time of each target image according to a preset weighted summation algorithm to obtain a weighted sum value corresponding to each target image.
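The normalize-then-weight procedure above can be sketched as follows. This is an illustrative implementation, not the patent's: min-max normalization, the example weights 0.3/0.2/0.5 from earlier, and a creation duration that contributes positively to the score are all assumptions.

```python
def min_max_normalize(values):
    """Scale a list of numbers into the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all equal: map to 0 to avoid division by zero
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def rank_images(images, w_count=0.3, w_avg=0.2, w_age=0.5):
    """images: dicts with 'name', 'view_count', 'avg_view_seconds',
    'age_seconds'. Returns names sorted by weighted sum, descending."""
    counts = min_max_normalize([im["view_count"] for im in images])
    avgs = min_max_normalize([im["avg_view_seconds"] for im in images])
    ages = min_max_normalize([im["age_seconds"] for im in images])
    scored = [(w_count * c + w_avg * a + w_age * g, im["name"])
              for c, a, g, im in zip(counts, avgs, ages, images)]
    return [name for _, name in sorted(scored, reverse=True)]
```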
In an embodiment, the image feature includes a feature map, and the feature optimization module is configured to perform a transposition process on the feature map to obtain a transposed feature map, perform a matrix multiplication process on the feature map and the transposed feature map, and use a result of the matrix multiplication as an optimized image feature.
Taking a feature map as an example of the image features, the present application provides a way of optimizing the image features.
Based on the feature optimization module, the electronic device transposes the extracted image features, that is, the feature map of the target image, and records the result as the transposed feature map. The feature optimization module then performs matrix multiplication between the original feature map and the transposed feature map, and the product is used as the optimized image feature for fine-grained classification.
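A sketch of this optimization step, assuming the feature map has been flattened to a (C, N) matrix (C channels, N = H x W spatial positions) — a layout the original does not specify. Multiplying the map by its own transpose yields a (C, C) matrix of pairwise channel correlations, similar in spirit to the bilinear-pooling features used in fine-grained recognition:

```python
import numpy as np

def optimize_features(feature_map: np.ndarray) -> np.ndarray:
    # feature_map: (C, N). The product with its own transpose is a
    # (C, C) Gram-style matrix encoding channel co-activations.
    return feature_map @ feature_map.T

C, H, W = 8, 4, 4
fmap = np.random.rand(C, H * W)      # stand-in for an extracted feature map
optimized = optimize_features(fmap)  # shape (8, 8), symmetric
```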
In one embodiment, before determining the target image that needs to be subjected to image classification, the method further includes:
(1) obtaining a plurality of sample images, and obtaining fine-grained class labels and coarse-grained class labels of the sample images;
(2) constructing a machine learning network, wherein the machine learning network comprises a first branch network and a second branch network with the same structure, and a binary classification module, the first branch network comprises a feature extraction module, a feature optimization module and a fine-grained classification module, and the binary classification module is connected with the feature extraction module of the first branch network and the feature extraction module of the second branch network;
(3) selecting a first sample image from the plurality of sample images, extracting image features of the first sample image based on the feature extraction module of the first branch network, optimizing the image features with the feature optimization module of the first branch network, and inputting the optimized features into the fine-grained classification module of the first branch network for fine-grained classification to obtain a first predicted fine-grained category;
(4) selecting a second sample image from the plurality of sample images, extracting image features of the second sample image based on the feature extraction module of the second branch network, optimizing the image features with the feature optimization module of the second branch network, and inputting the optimized features into the fine-grained classification module of the second branch network for fine-grained classification to obtain a second predicted fine-grained category;
(5) fusing the image features of the first sample image and the image features of the second sample image to obtain fused image features, and predicting, based on the binary classification module, whether the coarse-grained categories of the first sample image and the second sample image are the same, to obtain a prediction result;
(6) obtaining a first classification loss of the first branch network according to the first predicted fine-grained category and the fine-grained category label of the first sample image, obtaining a second classification loss of the second branch network according to the second predicted fine-grained category and the fine-grained category label of the second sample image, and obtaining a third classification loss according to the prediction result and the coarse-grained category labels of the first sample image and the second sample image;
(7) acquiring corresponding total loss according to the first classification loss, the second classification loss and the third classification loss, and adjusting parameters of the first branch network and the second branch network according to the total loss until a preset training stop condition is met to finish training;
(8) and selecting one branch network from the first branch network and the second branch network as a fine-grained classification model.
The application also provides an alternative way of training a fine-grained classification model.
The electronic device first obtains a plurality of sample images, and obtains the fine-grained category label and the coarse-grained category label of each sample image. The fine-grained category label describes the fine-grained category of the sample image, and the coarse-grained category label describes its coarse-grained category.
It should be noted that the present application does not limit the manner of obtaining the sample images or their number, which can be configured by those skilled in the art according to actual needs. For example, the electronic device may crawl images from the internet as sample images and receive manually annotated fine-grained category labels and coarse-grained category labels for them.
In addition, referring to fig. 8, the machine learning network includes two branch networks with the same structure and a binary classification module, where each branch network includes a feature extraction module, a feature optimization module, and a fine-grained classification module. For convenience of distinction, one of the branch networks is denoted as the first branch network and the other as the second branch network. The binary classification module is connected to the feature extraction modules of both the first branch network and the second branch network.
After the machine learning network is constructed, the electronic device trains the machine learning network by using the acquired sample images.
The electronic equipment selects two sample images from the acquired multiple sample images, one sample image is recorded as a first sample image, and the other sample image is recorded as a second sample image. For the first sample image, the electronic equipment extracts the image features of the first sample image based on the feature extraction module of the first branch network, inputs the image features into the fine-grained classification module of the first branch network for fine-grained classification after the image features are optimized by the feature optimization module of the first branch network, and obtains a first predicted fine-grained category. For the second sample image, the electronic device extracts the image features of the second sample image based on the feature extraction module of the second branch network, and inputs the image features into the fine-grained classification module of the second branch network for fine-grained classification after the image features are optimized by the feature optimization module of the second branch network, so as to obtain a second predicted fine-grained category.
In addition, the electronic device further fuses the image features of the first sample image and the image features of the second sample image to obtain fused image features, and, based on the binary classification module, predicts from the fused image features whether the coarse-grained categories of the first sample image and the second sample image are the same, obtaining a prediction result.
In addition, the electronic device further obtains a first classification loss of the first branch network according to the first predicted fine-grained category and the fine-grained category label of the first sample image, obtains a second classification loss of the second branch network according to the second predicted fine-grained category and the fine-grained category label of the second sample image, and obtains a third classification loss according to the prediction result, the coarse-grained category of the first sample image and the coarse-grained category of the second sample image. The loss function for obtaining the aforementioned first classification loss, second classification loss, and third classification loss may be configured by a person having ordinary skill in the art according to actual needs, and the present application is not limited in this respect.
Then, the electronic device obtains the corresponding total loss from the first classification loss, the second classification loss, and the third classification loss, which may be expressed as:
L_total = Loss1 + Loss2 + Loss3;
where L_total represents the total loss, Loss1 represents the first classification loss, Loss2 represents the second classification loss, and Loss3 represents the third classification loss.
After obtaining the total loss, the electronic device adjusts parameters of the first branch network and the second branch network according to the total loss. It should be noted that the goal of model training is to minimize the total loss, so that after each determination of the total loss, the parameters of the first branch network and the second branch network can be adjusted in the direction of minimizing the total loss.
As above, the parameters of the first branch network and the second branch network are adjusted continuously until the preset training stop condition is met, at which point training ends. The preset training stop condition may be set by a person of ordinary skill in the art according to actual needs, and the embodiment of the present application does not specifically limit this.
For example, the preset training stop condition is configured to: stopping training when the total loss takes the minimum value;
for another example, the preset training stop condition is configured to: stop training when the number of parameter iterations reaches a preset count.
When the preset training stop condition is met, the electronic device determines that the first branch network and the second branch network in the machine learning network can accurately perform fine-grained classification on images. At this point, one branch network is selected from the two to serve as the fine-grained classification model for fine-grained image classification.
It should be noted that the present application does not specifically limit how the fine-grained classification model is selected from the first branch network and the second branch network. For example, in this embodiment of the present application, the electronic device may randomly select one of the two branch networks as the fine-grained classification model.
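The paired-sample training procedure in steps (3)-(6) can be sketched as a single optimization step. Everything concrete here is an assumption for illustration: the branches are toy fully connected networks rather than CNNs, cross-entropy and binary cross-entropy stand in for the unspecified loss functions, and the feature optimization module is omitted for brevity.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """Toy stand-in for one branch: feature extraction followed by a
    fine-grained classifier (the feature optimization module is omitted)."""
    def __init__(self, in_dim=32, feat_dim=16, n_fine=10):
        super().__init__()
        self.extract = nn.Linear(in_dim, feat_dim)   # feature extraction module
        self.classify = nn.Linear(feat_dim, n_fine)  # fine-grained classification module

    def features(self, x):
        return torch.relu(self.extract(x))

branch1, branch2 = Branch(), Branch()
same_coarse = nn.Linear(16 * 2, 1)  # binary classification over fused features

opt = torch.optim.SGD(
    list(branch1.parameters()) + list(branch2.parameters())
    + list(same_coarse.parameters()), lr=0.01)
ce, bce = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()

# One training step on a toy pair of sample batches
x1, x2 = torch.randn(4, 32), torch.randn(4, 32)
fine1, fine2 = torch.randint(0, 10, (4,)), torch.randint(0, 10, (4,))
same = torch.randint(0, 2, (4, 1)).float()  # 1 if coarse labels match, else 0

f1, f2 = branch1.features(x1), branch2.features(x2)
loss1 = ce(branch1.classify(f1), fine1)     # first classification loss
loss2 = ce(branch2.classify(f2), fine2)     # second classification loss
fused = torch.cat([f1, f2], dim=1)          # channel merge (Concat)
loss3 = bce(same_coarse(fused), same)       # third classification loss
total = loss1 + loss2 + loss3               # L_total = Loss1 + Loss2 + Loss3

opt.zero_grad()
total.backward()
opt.step()
```

In a full run this step would loop over many sampled pairs until the stop condition is met, after which one branch is kept as the fine-grained classification model.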
In one embodiment, the third classification loss is obtained according to the following formula:
Loss3 = -[η * y * log(p) + (1 - y) * log(1 - p)];
where Loss3 represents the third classification loss; η represents a correction coefficient (an empirical value, for example between 0.3 and 0.5 in the present application); y indicates whether the coarse-grained category label of the first sample image is the same as that of the second sample image, and its specific values can be set by a person skilled in the art according to actual needs; and p represents the prediction result.
In the embodiment of the present application, the purpose of introducing the correction coefficient η is to reduce the contribution of the y * log(p) term to the total loss when the coarse-grained category labels of the first sample image and the second sample image differ, so that the first branch network and the second branch network learn that the first sample image and the second sample image may still be similar in some dimension.
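The formula and the effect of η can be checked numerically with a direct transcription. The natural logarithm and the convention y = 1 for matching labels, y = 0 otherwise, are assumptions (the text leaves y's values open):

```python
import math

def third_classification_loss(p: float, y: float, eta: float = 0.4) -> float:
    """Loss3 = -[eta * y * log(p) + (1 - y) * log(1 - p)], with
    0 < p < 1 the predicted probability that the two coarse-grained
    labels match, and eta an empirical correction coefficient (0.3-0.5)."""
    return -(eta * y * math.log(p) + (1 - y) * math.log(1 - p))

# With y = 1, eta scales down the positive term relative to plain BCE:
plain = -math.log(0.5)                           # standard BCE term at p = 0.5
scaled = third_classification_loss(0.5, 1, 0.4)  # eta-weighted version
```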
In one embodiment, selecting one of the branch networks from the first branch network and the second branch network as the fine-grained classification model comprises:
obtaining the classification accuracy of a first branch network and obtaining the classification accuracy of a second branch network;
and selecting a branch network with higher classification accuracy from the first branch network and the second branch network as a fine-grained classification model.
The application provides a way of selecting the fine-grained classification model: the electronic device obtains the classification accuracy of the first branch network and that of the second branch network, and then selects the branch network with the higher classification accuracy as the fine-grained classification model.
In an embodiment, fusing the image features of the first sample image and the image features of the second sample image to obtain fused image features, includes:
and carrying out channel combination on the image characteristics of the first sample image and the image characteristics of the second sample image, and taking the result of the channel combination as the fusion image characteristics.
For example, the electronic device may perform channel merging on the image features of the first sample image and the image features of the second sample image in a Concat manner, and use a result of the channel merging as the fused image feature.
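Concat-style channel merging can be sketched as follows. The channel counts, the spatial size, and the channel-first (C, H, W) layout are illustrative assumptions:

```python
import numpy as np

feat1 = np.random.rand(64, 7, 7)  # features from the first branch  (C1, H, W)
feat2 = np.random.rand(64, 7, 7)  # features from the second branch (C2, H, W)

# Channel merge: stack along the channel axis; spatial dims must match.
fused = np.concatenate([feat1, feat2], axis=0)  # (C1 + C2, H, W)
```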
Referring to fig. 9, the present application further provides a model training method, where the flow of the model training method may be:
in 201, a plurality of sample images are acquired, and a fine-grained class label and a coarse-grained class label of the sample images are acquired.
The electronic device first obtains a plurality of sample images, and obtains the fine-grained category label and the coarse-grained category label of each sample image. The fine-grained category label describes the fine-grained category of the sample image, and the coarse-grained category label describes its coarse-grained category.
It should be noted that the present application does not limit the manner of obtaining the sample images or their number, which can be configured by those skilled in the art according to actual needs. For example, referring to fig. 10, some of the sample images may be obtained in advance from the ImageNet dataset and stored in the electronic device, some may be crawled from the network and stored in the electronic device, and some may be collected manually from the network.
After the sample image is obtained, the electronic device further receives a fine-grained class label and a coarse-grained class label of the artificially labeled sample image. The fine-granularity category label is used for describing a fine-granularity category of the sample image, and the coarse-granularity label is used for describing a coarse-granularity category of the sample image.
In 202, a machine learning network is constructed, where the machine learning network includes a first branch network and a second branch network with the same structure, and a binary classification module; the first branch network includes a feature extraction module, a feature optimization module, and a fine-grained classification module, and the binary classification module is connected to the feature extraction module of the first branch network and the feature extraction module of the second branch network.
Referring to fig. 8, the constructed machine learning network includes two branch networks with the same structure and a binary classification module; each branch network includes a feature extraction module, a feature optimization module, and a fine-grained classification module. For convenience of distinction, one of the branch networks is denoted as the first branch network and the other as the second branch network. The binary classification module is connected to the feature extraction modules of both branch networks.
In 203, a first sample image is selected from the plurality of sample images, image features of the first sample image are extracted based on the feature extraction module of the first branch network, optimized by the feature optimization module of the first branch network, and then input to the fine-grained classification module of the first branch network for fine-grained classification, obtaining a first predicted fine-grained category.
In 204, a second sample image is selected from the plurality of sample images, image features of the second sample image are extracted based on the feature extraction module of the second branch network, optimized by the feature optimization module of the second branch network, and then input to the fine-grained classification module of the second branch network for fine-grained classification, obtaining a second predicted fine-grained category.
The electronic equipment selects two sample images from the acquired multiple sample images, one sample image is recorded as a first sample image, and the other sample image is recorded as a second sample image. For the first sample image, the electronic equipment extracts the image features of the first sample image based on the feature extraction module of the first branch network, inputs the image features into the fine-grained classification module of the first branch network for fine-grained classification after the image features are optimized by the feature optimization module of the first branch network, and obtains a first predicted fine-grained category. For the second sample image, the electronic device extracts the image features of the second sample image based on the feature extraction module of the second branch network, and inputs the image features into the fine-grained classification module of the second branch network for fine-grained classification after the image features are optimized by the feature optimization module of the second branch network, so as to obtain a second predicted fine-grained category.
In 205, the image features of the first sample image and the image features of the second sample image are fused to obtain fused image features, and whether the coarse-grained categories of the first sample image and the second sample image are the same is predicted from the fused image features based on the binary classification module, obtaining a prediction result.
That is, the electronic device fuses the image features of the first sample image and the image features of the second sample image to obtain fused image features, and, based on the binary classification module, predicts from the fused image features whether the coarse-grained categories of the two sample images are the same, obtaining a prediction result.
For example, the electronic device may perform channel merging on the image features of the first sample image and the image features of the second sample image in a Concat manner, and use a result of the channel merging as the fused image feature.
At 206, a first classification loss of the first branch network is obtained according to the first predicted fine-grained category and the fine-grained category label of the first sample image, a second classification loss of the second branch network is obtained according to the second predicted fine-grained category and the fine-grained category label of the second sample image, and a third classification loss is obtained according to the prediction result and the coarse-grained category labels of the first sample image and the second sample image.
The loss function for obtaining the aforementioned first classification loss, second classification loss, and third classification loss may be configured by a person having ordinary skill in the art according to actual needs, and the present application is not limited in this respect.
For example, the third classification loss is obtained according to the following formula:
Loss3 = -[η * y * log(p) + (1 - y) * log(1 - p)];
where Loss3 represents the third classification loss; η represents a correction coefficient (an empirical value, for example between 0.3 and 0.5 in the present application); y indicates whether the coarse-grained category label of the first sample image is the same as that of the second sample image, and its specific values can be set by a person skilled in the art according to actual needs; and p represents the prediction result.
In the embodiment of the present application, the purpose of introducing the correction coefficient η is to reduce the contribution of the y * log(p) term to the total loss when the coarse-grained category labels of the first sample image and the second sample image differ, so that the first branch network and the second branch network learn that the first sample image and the second sample image may still be similar in some dimension.
In 207, the corresponding total loss is obtained according to the first classification loss, the second classification loss and the third classification loss, and the parameters of the first branch network and the second branch network are adjusted according to the total loss until the preset training stop condition is met, and the training is ended.
For example, the total loss is obtained according to the following formula:
L_total = Loss1 + Loss2 + Loss3;
where L_total represents the total loss, Loss1 represents the first classification loss, Loss2 represents the second classification loss, and Loss3 represents the third classification loss.
After obtaining the total loss, the electronic device adjusts parameters of the first branch network and the second branch network according to the total loss. It should be noted that the goal of model training is to minimize the total loss, so that after each determination of the total loss, the parameters of the first branch network and the second branch network can be adjusted in the direction of minimizing the total loss.
As above, the parameters of the first branch network and the second branch network are adjusted continuously until the preset training stop condition is met, at which point training ends. The preset training stop condition may be set by a person of ordinary skill in the art according to actual needs, and the embodiment of the present application does not specifically limit this.
For example, the preset training stop condition is configured to: stopping training when the total loss takes the minimum value;
for another example, the preset training stop condition is configured to: stop training when the number of parameter iterations reaches a preset count.
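The two example stop conditions can be combined into a single check. The patience-based plateau test below is a practical stand-in for "the total loss takes the minimum value" (which cannot be observed directly during training), and all thresholds are illustrative:

```python
def should_stop(loss_history, max_iters, patience=5, min_delta=1e-4):
    """Return True when either example stop condition from the text holds:
    (a) the preset number of parameter iterations has been reached, or
    (b) the total loss has stopped decreasing (a proxy for reaching its
    minimum), judged over the last `patience` iterations."""
    if len(loss_history) >= max_iters:
        return True                      # preset iteration count reached
    if len(loss_history) > patience:
        recent_best = min(loss_history[-patience:])
        earlier_best = min(loss_history[:-patience])
        if earlier_best - recent_best < min_delta:
            return True                  # loss no longer improving
    return False
```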
At 208, a branch network is selected from the first branch network and the second branch network as a fine-grained classification model.
When the preset training stop condition is met, the electronic device determines that the first branch network and the second branch network in the machine learning network can accurately perform fine-grained classification on images. At this point, one branch network is selected from the two to serve as the fine-grained classification model for fine-grained image classification.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an image classification apparatus provided in the present application. The image classification apparatus may include: an image determination component 301, a model invocation component 302, and a category acquisition component 303.
An image determining component 301, configured to determine a target image that needs to be subjected to image classification;
the model calling component 302 is used for calling a pre-trained fine-grained classification model to perform fine-grained classification on a target image, wherein the fine-grained classification model comprises a feature extraction module, a feature optimization module and a fine-grained classification module, the feature extraction module is used for extracting image features of the target image, the feature optimization module is used for performing optimization processing on the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain fine-grained classes of the target image;
the category obtaining component 303 is configured to obtain a fine-grained category of the target image output by the fine-grained classification model.
In one embodiment, in determining a target image for which image classification is required, the image determination component 301 is configured to:
and when the image classification period is reached, determining the newly added image in the image classification period as the target image.
In an embodiment, the fine-grained classification model further includes a dimension reduction module, configured to perform feature dimension reduction on the optimized image features to obtain the optimized image features after the dimension reduction;
and the fine-grained classification module is also used for classifying and predicting the optimized image features after dimension reduction to obtain the fine-grained classification of the target image.
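One way to picture the dimension-reduction module, whose internal structure the text does not specify: flatten the optimized feature and project it to a lower dimension with a learned linear map. Here the projection matrix is a fixed random matrix purely for shape illustration; a trained model would learn it (for example as a fully connected layer).

```python
import numpy as np

rng = np.random.default_rng(0)

C = 64
optimized = rng.random((C, C))      # e.g. a C x C optimized feature matrix

# Toy dimension reduction: flatten, then apply a linear projection.
flat = optimized.reshape(-1)        # (C*C,) = (4096,)
W = rng.random((128, C * C)) * 0.01 # illustrative projection to 128 dims
reduced = W @ flat                  # lower-dimensional feature for the classifier
```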
In an embodiment, the image classification apparatus provided by the present application further includes a classification storage component, configured to, after obtaining a fine-grained classification of a target image output by the fine-grained classification model, allocate a storage path to the target image according to the fine-grained classification, and store the target image in the storage path.
In an embodiment, the image feature includes a feature map, and the feature optimization module is configured to perform a transposition process on the feature map to obtain a transposed feature map, perform a matrix multiplication process on the feature map and the transposed feature map, and use a result of the matrix multiplication as an optimized image feature.
In an embodiment, the image classification apparatus provided by the present application further includes a model training component, before determining a target image that needs to be subjected to image classification, configured to:
obtaining a plurality of sample images, and obtaining fine-grained class labels and coarse-grained class labels of the sample images;
constructing a machine learning network, wherein the machine learning network comprises a first branch network and a second branch network with the same structure, and a binary classification module, the first branch network comprises a feature extraction module, a feature optimization module and a fine-grained classification module, and the binary classification module is connected with the feature extraction module of the first branch network and the feature extraction module of the second branch network;
selecting a first sample image from the plurality of sample images, extracting image features of the first sample image based on the feature extraction module of the first branch network, optimizing the image features based on the feature optimization module of the first branch network, and inputting the optimized image features into the fine-grained classification module of the first branch network for fine-grained classification to obtain a first predicted fine-grained category;
selecting a second sample image from the plurality of sample images, extracting image features of the second sample image based on the feature extraction module of the second branch network, optimizing the image features based on the feature optimization module of the second branch network, and inputting the optimized image features into the fine-grained classification module of the second branch network for fine-grained classification to obtain a second predicted fine-grained category;
fusing the image features of the first sample image and the image features of the second sample image to obtain fused image features, and predicting, based on the two classification modules and according to the fused image features, whether the coarse-grained categories of the first sample image and the second sample image are the same, to obtain a prediction result;
obtaining a first classification loss of the first branch network according to the first predicted fine-grained category and the fine-grained category label of the first sample image, obtaining a second classification loss of the second branch network according to the second predicted fine-grained category and the fine-grained category label of the second sample image, and obtaining a third classification loss according to the prediction result and the coarse-grained category labels of the first sample image and the second sample image;
obtaining a corresponding total loss according to the first classification loss, the second classification loss and the third classification loss, and adjusting parameters of the first branch network and the second branch network according to the total loss until a preset training stop condition is met, to complete the training;
and selecting one branch network from the first branch network and the second branch network as a fine-grained classification model.
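The training procedure above can be sketched end to end on toy data. Everything here is a stand-in: the "branches" are random linear classifiers, the optimized features are random vectors, the correction coefficient is set to 1, and the three losses are combined with equal weights (the embodiments do not specify the weighting):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(probs, label):
    return -np.log(probs[label] + 1e-12)

# hypothetical sizes: 8-dim optimized features, 3 fine-grained classes
W1 = rng.normal(size=(3, 8))   # branch-1 fine-grained classifier (stand-in)
W2 = rng.normal(size=(3, 8))   # branch-2 fine-grained classifier (stand-in)
Wc = rng.normal(size=(16,))    # binary head over the fused features (stand-in)

f1 = rng.normal(size=8)        # optimized features of the first sample
f2 = rng.normal(size=8)        # optimized features of the second sample
y1, y2 = 0, 2                  # fine-grained category labels
same_coarse = 0.0              # 1.0 if coarse-grained labels match, else 0.0

loss1 = cross_entropy(softmax(W1 @ f1), y1)   # first classification loss
loss2 = cross_entropy(softmax(W2 @ f2), y2)   # second classification loss

fused = np.concatenate([f1, f2])              # channel merging of the features
p = 1.0 / (1.0 + np.exp(-(Wc @ fused)))       # "same coarse class?" prediction
eta = 1.0                                     # correction coefficient (assumed)
loss3 = -(eta * same_coarse * np.log(p) + (1.0 - same_coarse) * np.log(1.0 - p))

total_loss = loss1 + loss2 + loss3            # equal weighting assumed
```

A real implementation would backpropagate `total_loss` through both branches and the binary head each step until the training stop condition is met.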
In one embodiment, the third classification loss is obtained according to the following formula:
Loss3=-[η*y*log(p)+(1-y)*log(1-p)];
wherein Loss3 represents the third classification loss, η represents a correction coefficient, y indicates whether the coarse-grained category label of the first sample image is the same as that of the second sample image (y = 1 when they are the same and y = 0 otherwise), and p represents the prediction result.
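A direct, minimal implementation of this formula (the small `eps` guard against log(0) is an addition for numerical safety, not part of the stated formula):

```python
import math

def third_classification_loss(p, y, eta=1.0):
    """Loss3 = -[eta*y*log(p) + (1-y)*log(1-p)].

    p: predicted probability that the two samples share a coarse-grained
    category; y: 1 if the coarse-grained labels are the same, else 0;
    eta: the correction coefficient (its value is not fixed by the text).
    """
    eps = 1e-12  # guard against log(0)
    return -(eta * y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
```

With η = 1 this is the standard binary cross-entropy; η only rescales the penalty on same-coarse-category pairs.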
In one embodiment, in selecting one of the branch networks from the first branch network and the second branch network as the fine-grained classification model, the model training component is operable to:
obtaining the classification accuracy of the first branch network and the classification accuracy of the second branch network;
and selecting, from the first branch network and the second branch network, the branch network with the higher classification accuracy as the fine-grained classification model.
In one embodiment, when the image features of the first sample image and the image features of the second sample image are fused to obtain fused image features, the model training component is configured to:
and performing channel merging on the image features of the first sample image and the image features of the second sample image, and using the result of the channel merging as the fused image features.
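Channel merging itself is a plain concatenation along the channel axis: two (C, H, W) feature maps become one (2C, H, W) map, with no weighting or projection applied. The shapes are illustrative assumptions:

```python
import numpy as np

def fuse_features(fmap1, fmap2):
    """Merge two feature maps along the channel axis.

    Two (C, H, W) maps become one (2C, H, W) map; the fusion is a
    plain channel concatenation with no weighting or projection.
    """
    return np.concatenate([fmap1, fmap2], axis=0)

a = np.zeros((8, 4, 4))   # features of the first sample image (stand-in)
b = np.ones((8, 4, 4))    # features of the second sample image (stand-in)
fused = fuse_features(a, b)
```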
It should be noted that the image classification apparatus provided in the embodiments of the present application and the image classification method in the foregoing embodiments belong to the same concept: any method provided in the image classification method embodiments may be executed on the image classification apparatus, and its specific implementation is described in the foregoing image classification method embodiments and is not repeated here.
The present application also provides a computer-readable storage medium on which a computer program is stored; the computer program, when executed on a computer, causes the computer to execute the image classification method provided by the embodiments of the present application. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The application also provides an electronic device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor is used for executing the image classification method provided by the application by calling the computer program stored in the memory.
For example, the electronic device may be a mobile terminal such as a tablet computer or a smart phone. Referring to fig. 11, fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
The processor 401 is electrically connected to the memory 402.
The processor 401 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, performs various functions of the electronic device and processes data by running or loading a computer program stored in the memory 402 and calling data stored in the memory 402.
The memory 402 may be used to store software programs and modules; the processor 401 executes various functional applications and data processing by running the computer programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, a computer program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the electronic device, and the like. Further, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
In the embodiment of the present application, the processor 401 in the electronic device loads instructions corresponding to the processes of one or more computer programs into the memory 402, and runs the computer programs stored in the memory 402 so as to implement various functions, such as:
determining a target image needing image classification;
calling a pre-trained fine-grained classification model to perform fine-grained classification on a target image, wherein the fine-grained classification model comprises a feature extraction module, a feature optimization module and a fine-grained classification module, the feature extraction module is used for extracting image features of the target image, the feature optimization module is used for performing optimization processing on the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain fine-grained classes of the target image;
and acquiring the fine-grained category of the target image output by the fine-grained classification model.
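The three steps above chain the modules in order: extract features, optimize them, optionally reduce their dimension, then classify. A toy end-to-end pass, with random matrices standing in for every trained component (a real feature extractor would be a convolutional backbone):

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(image, w_reduce, w_cls):
    """Toy pipeline: extract -> optimize -> reduce -> classify.

    The "extracted" feature map is the image itself and all weights are
    random stand-ins for trained parameters; only the data flow between
    the modules is the point of this sketch.
    """
    fmap = image                              # feature extraction (stand-in)
    c = fmap.shape[0]
    flat = fmap.reshape(c, -1)
    optimized = (flat @ flat.T).reshape(-1)   # feature optimization (C*C vector)
    reduced = w_reduce @ optimized            # dimension reduction
    logits = w_cls @ reduced                  # fine-grained classification
    return int(np.argmax(logits))             # predicted fine-grained category

img = rng.normal(size=(8, 4, 4))   # hypothetical 8-channel feature map / image
w_r = rng.normal(size=(16, 64))    # reduces the 8*8=64-dim optimized features
w_c = rng.normal(size=(5, 16))     # 5 hypothetical fine-grained categories
category = classify(img, w_r, w_c)
```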
It should be noted that the electronic device provided in the embodiments of the present application and the image classification method in the foregoing embodiments belong to the same concept: any method provided in the image classification method embodiments may be executed on the electronic device, and its specific implementation is described in detail in the foregoing image classification method embodiments and is not repeated here.
It should be noted that, as can be understood by a person skilled in the art, all or part of the process of implementing the image classification method of the embodiments of the present application may be completed by controlling relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, such as a memory of an electronic device, and executed by at least one processor in the electronic device, and the execution process may include the process of the image classification method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
For the image classification apparatus of the embodiments of the present application, the functional modules may be integrated in one processing chip, each module may exist alone physically, or two or more modules may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If implemented in the form of a software functional module and sold or used as a stand-alone product, the integrated module may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.

Claims (20)

  1. An image classification method, wherein the image classification method comprises:
    determining a target image needing image classification;
    calling a pre-trained fine-grained classification model to perform fine-grained classification on the target image, wherein the fine-grained classification model comprises a feature extraction module, a feature optimization module and a fine-grained classification module, the feature extraction module is used for extracting image features of the target image, the feature optimization module is used for performing optimization processing on the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain fine-grained classes of the target image;
    and obtaining the fine-grained classification of the target image output by the fine-grained classification model.
  2. The image classification method according to claim 1, wherein the determining a target image required for image classification comprises:
    and when reaching an image classification period, determining the newly added image in the image classification period as the target image.
  3. The image classification method according to claim 1, wherein the fine-grained classification model further comprises a dimension reduction module, and the dimension reduction module is configured to perform feature dimension reduction on the optimized image features to obtain the optimized image features after dimension reduction;
    and the fine-grained classification module is also used for performing fine-grained classification on the optimized image features after dimension reduction to obtain a fine-grained category of the target image.
  4. The image classification method according to claim 1, wherein after the obtaining of the fine-grained classification of the target image output by the fine-grained classification model, further comprising:
    and allocating a storage path to the target image according to the fine-grained category, and storing the target image into the storage path.
  5. The image classification method according to claim 1, wherein the image features include a feature map, and the feature optimization module is configured to perform a transpose process on the feature map to obtain a transposed feature map, perform a matrix multiplication process on the feature map and the transposed feature map, and use a result of the matrix multiplication as the optimized image features.
  6. The image classification method according to claim 1, wherein before determining the target image to be subjected to image classification, the method further comprises:
    obtaining a plurality of sample images, and obtaining fine-grained class labels and coarse-grained class labels of the sample images;
    constructing a machine learning network, wherein the machine learning network comprises a first branch network, a second branch network and two classification modules with the same structure, the first branch network comprises a feature extraction module, a feature optimization module and a fine-grained classification module, and the two classification modules are connected with the feature extraction module of the first branch network and the feature extraction module of the second branch network;
    selecting a first sample image from the plurality of sample images, extracting image features of the first sample image based on the feature extraction module of the first branch network, optimizing the image features based on the feature optimization module of the first branch network, and inputting the optimized image features into the fine-grained classification module of the first branch network for fine-grained classification to obtain a first predicted fine-grained category;
    selecting a second sample image from the plurality of sample images, extracting image features of the second sample image based on the feature extraction module of the second branch network, optimizing the image features based on the feature optimization module of the second branch network, and inputting the optimized image features into the fine-grained classification module of the second branch network for fine-grained classification to obtain a second predicted fine-grained category;
    fusing the image features of the first sample image and the image features of the second sample image to obtain fused image features, and predicting, based on the two classification modules and according to the fused image features, whether the coarse-grained categories of the first sample image and the second sample image are the same, to obtain a prediction result;
    obtaining a first classification loss of the first branch network according to the first predicted fine-grained category and the fine-grained category label of the first sample image, obtaining a second classification loss of the second branch network according to the second predicted fine-grained category and the fine-grained category label of the second sample image, and obtaining a third classification loss according to the prediction result and the coarse-grained category labels of the first sample image and the second sample image;
    acquiring corresponding total loss according to the first classification loss, the second classification loss and the third classification loss, and adjusting parameters of the first branch network and the second branch network according to the total loss until a preset training stop condition is met to finish training;
    selecting one of the branch networks from the first branch network and the second branch network as the fine-grained classification model.
  7. The image classification method according to claim 6, wherein the third classification loss is obtained according to the following formula:
    Loss3=-[η*y*log(p)+(1-y)*log(1-p)];
    wherein Loss3 represents the third classification loss, η represents a correction coefficient, y indicates whether the coarse-grained category label of the first sample image is the same as that of the second sample image (y = 1 when they are the same and y = 0 otherwise), and p represents the prediction result.
  8. The image classification method according to claim 6, wherein the selecting one branch network from the first branch network and the second branch network as the fine-grained classification model comprises:
    obtaining the classification accuracy of the first branch network and the classification accuracy of the second branch network;
    and selecting, from the first branch network and the second branch network, the branch network with the higher classification accuracy as the fine-grained classification model.
  9. The image classification method according to claim 6, wherein the fusing the image features of the first sample image and the image features of the second sample image to obtain fused image features comprises:
    and carrying out channel combination on the image characteristics of the first sample image and the image characteristics of the second sample image, and taking the result of the channel combination as the fused image characteristics.
  10. An image classification apparatus, comprising:
    the image determining component is used for determining a target image which needs to be subjected to image classification;
    the model calling component is used for calling a pre-trained fine-grained classification model to perform fine-grained classification on the target image, wherein the fine-grained classification model comprises a feature extraction module, a feature optimization module and a fine-grained classification module, the feature extraction module is used for extracting image features of the target image, the feature optimization module is used for performing optimization processing on the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain fine-grained classes of the target image;
    and the category acquisition component is used for acquiring the fine-grained category of the target image output by the fine-grained classification model.
  11. A storage medium having stored therein a computer program which, when run on a computer, causes the computer to perform:
    determining a target image needing image classification;
    calling a pre-trained fine-grained classification model to perform fine-grained classification on the target image, wherein the fine-grained classification model comprises a feature extraction module, a feature optimization module and a fine-grained classification module, the feature extraction module is used for extracting image features of the target image, the feature optimization module is used for performing optimization processing on the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain fine-grained classes of the target image;
    and obtaining the fine-grained classification of the target image output by the fine-grained classification model.
  12. An electronic device, wherein the electronic device comprises a processor and a memory, wherein the memory stores a computer program, and the processor is configured to execute, by calling the computer program stored in the memory:
    determining a target image needing image classification;
    calling a pre-trained fine-grained classification model to perform fine-grained classification on the target image, wherein the fine-grained classification model comprises a feature extraction module, a feature optimization module and a fine-grained classification module, the feature extraction module is used for extracting image features of the target image, the feature optimization module is used for performing optimization processing on the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain fine-grained classes of the target image;
    and obtaining the fine-grained classification of the target image output by the fine-grained classification model.
  13. The electronic device of claim 12, wherein in determining a target image for which image classification is required, the processor is configured to perform:
    and when reaching an image classification period, determining the newly added image in the image classification period as the target image.
  14. The electronic device of claim 12, wherein the fine-grained classification model further comprises a dimension reduction module, and the dimension reduction module performs feature dimension reduction on the optimized image features to obtain the optimized image features after dimension reduction;
    and the fine-grained classification module is also used for performing fine-grained classification on the optimized image features subjected to dimension reduction to obtain a fine-grained category of the target image.
  15. The electronic device of claim 12, wherein after obtaining the fine-grained category of the target image output by the fine-grained classification model, the processor is further configured to perform:
    and allocating a storage path to the target image according to the fine-grained category, and storing the target image into the storage path.
  16. The electronic device according to claim 12, wherein the image feature comprises a feature map, and the feature optimization module is configured to transpose the feature map to obtain a transposed feature map, perform matrix multiplication on the feature map and the transposed feature map, and use a result of the matrix multiplication as the optimized image feature.
  17. The electronic device of claim 12, wherein prior to determining a target image for which image classification is required, the processor is further configured to perform:
    obtaining a plurality of sample images, and obtaining fine-grained class labels and coarse-grained class labels of the sample images;
    constructing a machine learning network, wherein the machine learning network comprises a first branch network, a second branch network and two classification modules with the same structure, the first branch network comprises a feature extraction module, a feature optimization module and a fine-grained classification module, and the two classification modules are connected with the feature extraction module of the first branch network and the feature extraction module of the second branch network;
    selecting a first sample image from the plurality of sample images, extracting image features of the first sample image based on the feature extraction module of the first branch network, optimizing the image features based on the feature optimization module of the first branch network, and inputting the optimized image features into the fine-grained classification module of the first branch network for fine-grained classification to obtain a first predicted fine-grained category;
    selecting a second sample image from the plurality of sample images, extracting image features of the second sample image based on the feature extraction module of the second branch network, optimizing the image features based on the feature optimization module of the second branch network, and inputting the optimized image features into the fine-grained classification module of the second branch network for fine-grained classification to obtain a second predicted fine-grained category;
    fusing the image features of the first sample image and the image features of the second sample image to obtain fused image features, and predicting, based on the two classification modules and according to the fused image features, whether the coarse-grained categories of the first sample image and the second sample image are the same, to obtain a prediction result;
    obtaining a first classification loss of the first branch network according to the first predicted fine-grained category and the fine-grained category label of the first sample image, obtaining a second classification loss of the second branch network according to the second predicted fine-grained category and the fine-grained category label of the second sample image, and obtaining a third classification loss according to the prediction result and the coarse-grained category labels of the first sample image and the second sample image;
    acquiring corresponding total loss according to the first classification loss, the second classification loss and the third classification loss, and adjusting parameters of the first branch network and the second branch network according to the total loss until a preset training stop condition is met to finish training;
    selecting one of the branch networks from the first branch network and the second branch network as the fine-grained classification model.
  18. The electronic device of claim 17, wherein the third classification loss is obtained according to the following equation:
    Loss3=-[η*y*log(p)+(1-y)*log(1-p)];
    wherein Loss3 represents the third classification loss, η represents a correction coefficient, y indicates whether the coarse-grained category label of the first sample image is the same as that of the second sample image (y = 1 when they are the same and y = 0 otherwise), and p represents the prediction result.
  19. The electronic device of claim 17, wherein, in choosing one branch network from the first branch network and the second branch network as the fine-grained classification model, the processor is configured to perform:
    obtaining the classification accuracy of the first branch network and the classification accuracy of the second branch network;
    and selecting, from the first branch network and the second branch network, the branch network with the higher classification accuracy as the fine-grained classification model.
  20. The electronic device of claim 17, wherein in fusing image features of the first sample image and image features of the second sample image to obtain fused image features, the processor is configured to perform:
    and carrying out channel combination on the image characteristics of the first sample image and the image characteristics of the second sample image, and taking the result of the channel combination as the fused image characteristics.
CN202080087887.6A 2020-01-10 2020-01-10 Image classification method and device, storage medium and electronic equipment Pending CN114830186A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/071502 WO2021138911A1 (en) 2020-01-10 2020-01-10 Image classification method and apparatus, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
CN114830186A 2022-07-29

Family

Family ID: 76788470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080087887.6A Pending CN114830186A (en) 2020-01-10 2020-01-10 Image classification method and device, storage medium and electronic equipment

Country Status (2)

Country Link
CN (1) CN114830186A (en)
WO (1) WO2021138911A1 (en)


Also Published As

Publication number Publication date
WO2021138911A1 (en) 2021-07-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination