WO2021138911A1

WO2021138911A1 - Image classification method and apparatus, storage medium, and electronic device

Info

Publication number: WO2021138911A1
Application number: PCT/CN2020/071502
Authority: WO
Inventors: 高洪涛
Original assignee: 深圳市欢太科技有限公司; Oppo广东移动通信有限公司
Priority date: 2020-01-10
Filing date: 2020-01-10
Publication date: 2021-07-15
Also published as: CN114830186A

Abstract

An image classification method and apparatus, a storage medium, and an electronic device. The method comprises: determining a target image requiring image classification (101); performing fine-grained classification on the target image by calling a pre-trained fine-grained classification model (102); and obtaining a fine-grained category of the target image output by the fine-grained classification model (103).

Description

Image classification method, device, storage medium and electronic equipment

Technical field

The embodiments of the present application relate to the field of image recognition technology, and in particular, to an image classification method, device, storage medium, and electronic equipment.

Background

Image classification mainly includes coarse-grained image classification and fine-grained image classification. Fine-grained image classification is also called sub-category image classification. Its purpose is to divide coarse-grained categories into more detailed sub-categories, such as distinguishing bird species, vehicle styles, and dog breeds.

Summary of the invention

This application provides an image classification method, device, storage medium, and electronic equipment, which can realize fine-grained classification of images.

In the first aspect, this application provides an image classification method, including:

Determine the target image that needs to be classified;

Call a pre-trained fine-grained classification model to perform fine-grained classification of the target image, wherein the fine-grained classification model includes a feature extraction module, a feature optimization module, and a fine-grained classification module; the feature extraction module is used to extract the image features of the target image, the feature optimization module is used to optimize the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain the fine-grained category of the target image;

Acquire the fine-grained category of the target image output by the fine-grained classification model.

In the second aspect, this application also provides an image classification device, including:

The image determination component is used to determine the target image that needs to be classified;

The model calling component is used to call a pre-trained fine-grained classification model to perform fine-grained classification of the target image, wherein the fine-grained classification model includes a feature extraction module, a feature optimization module, and a fine-grained classification module; the feature extraction module is used to extract the image features of the target image, the feature optimization module is used to optimize the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain the fine-grained category of the target image;

The category acquisition component is used to acquire the fine-grained category of the target image output by the fine-grained classification model.

In a third aspect, this application also provides a storage medium on which a computer program is stored, wherein when the computer program is executed on a computer, the computer is caused to execute:

Determine the target image that needs to be classified;

Call a pre-trained fine-grained classification model to perform fine-grained classification of the target image, wherein the fine-grained classification model includes a feature extraction module, a feature optimization module, and a fine-grained classification module; the feature extraction module is used to extract the image features of the target image, the feature optimization module is used to optimize the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain the fine-grained category of the target image;

In a fourth aspect, the present application also provides an electronic device, including a memory and a processor, the memory stores a computer program, and the processor invokes the computer program stored in the memory to execute:

Determine the target image that needs to be classified;

Description of the drawings

The following detailed descriptions of the specific implementations of the present application in conjunction with the accompanying drawings will make the technical solutions of the present application and its beneficial effects obvious.

FIG. 1 is a schematic flowchart of an image classification method provided by an embodiment of the present application.

FIG. 2 is an example diagram of triggering image classification in an embodiment of the present application.

FIG. 3 is an example diagram of an image classification interface provided in an embodiment of the present application.

FIG. 4 is an example diagram of a selection sub-interface provided in an embodiment of the present application.

FIG. 5 is a schematic diagram of a structure of a fine-grained classification model provided by an embodiment of the present application.

FIG. 6 is an example diagram of a target image provided by an embodiment of the present application.

FIG. 7 is a schematic diagram of another architecture of a fine-grained classification model provided by an embodiment of the present application.

FIG. 8 is a schematic diagram of the architecture of a machine learning network provided by an embodiment of the present application.

FIG. 9 is a schematic flowchart of another image classification method provided by an embodiment of the present application.

FIG. 10 is a schematic structural diagram of an image classification device provided by an embodiment of the present application.

FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed description

Please refer to the drawings, in which the same component symbols represent the same components, and the principle of the present application is implemented in an appropriate computing environment as an example. The following description is based on the exemplified specific embodiments of the application, which should not be regarded as limiting other specific embodiments of the application that are not described in detail herein.

Artificial Intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence. Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.

Among them, Machine Learning (ML) is a multi-field interdisciplinary subject, involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other subjects. It specializes in the study of how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills, and how they reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all areas of artificial intelligence. Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and other technologies.

The technical solutions provided by the embodiments of the present application involve artificial intelligence machine learning technology, which are specifically described by the following embodiments:

The embodiments of the application provide an image classification method, an image classification device, a storage medium, and an electronic device. The execution subject of the image classification method may be the image classification device provided in the embodiments of the application, or an electronic device integrating the image classification device, where the image classification device can be implemented in hardware or software. The electronic device may be a device equipped with a processor (including but not limited to a general-purpose processor, a customized processor, etc.) and having processing capabilities, such as a smart phone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer.

Please refer to FIG. 1, which is a schematic flowchart of an image classification method provided by an embodiment of the present application. The process of the image classification method may include:

In 101, determine the target image that needs to be classified.

In the embodiment of the present application, the electronic device may determine the target image that needs to be classified based on a preset image classification cycle and preset image selection rules, or, when receiving an image classification instruction input by the user, determine the target image that needs to be classified according to that instruction, and so on.

It should be noted that the embodiment of the application does not specifically limit the settings of the image classification cycle, the image selection rules, and the image classification instruction. They can be set by the electronic device according to user input, or set by default by the manufacturer of the electronic device, and so on.

For example, suppose that the image classification cycle is pre-configured as a natural week starting from Monday, and the image selection rule is configured as "select captured images for image classification". In this way, the electronic device automatically triggers image classification every Monday and determines the captured images as the target images that need to be classified.
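The weekly trigger and selection rule in this example can be sketched as follows. This is only an illustrative sketch: the function names, the "source" field, and the sample file names are assumptions made for the example and do not come from the application.

```python
from datetime import date


def is_cycle_start(today: date, start_weekday: int = 0) -> bool:
    """Return True when `today` begins a new weekly classification cycle.

    start_weekday follows date.weekday(): Monday is 0, Sunday is 6.
    """
    return today.weekday() == start_weekday


def select_captured(images, rule_source="camera"):
    """Apply the selection rule: keep images whose source matches the rule."""
    return [img["name"] for img in images if img["source"] == rule_source]


# Hypothetical image records; "source" is an assumed metadata field.
images = [
    {"name": "IMG_001.jpg", "source": "camera"},
    {"name": "screenshot.png", "source": "screenshot"},
]

targets = []
if is_cycle_start(date(2020, 1, 13)):  # 13 January 2020 was a Monday
    targets = select_captured(images)
```

Here only the captured image is selected, because the assumed rule filters on the image source.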

For another example, please refer to FIG. 2. The electronic device provides a "classification" control for triggering image classification in an image browsing interface. In the figure, the rectangles represent different images, and the round box in each rectangle represents the "selection" control corresponding to that image. The user can click the selection control corresponding to an image to select the image, and click it again to cancel the selection. As shown in FIG. 2, after selecting the images to be classified, the user inputs the image classification instruction to the electronic device by clicking the classification control, where the image classification instruction carries instruction information indicating the images selected by the user. Correspondingly, the electronic device determines the images selected by the user as the target images that need to be classified according to the instruction information in the image classification instruction.

For another example, the electronic device may receive an input image classification instruction through an image classification interface that includes a request input interface. As shown in FIG. 3, the request input interface may take the form of an input box. The user may enter the identification information of the image that needs to be classified in the input box and then enter confirmation information (such as directly pressing the Enter key of the keyboard) to input the image classification instruction, which carries the identification information of the image that needs to be classified. Correspondingly, the electronic device can determine the target image that needs to be classified according to the identification information in the received image classification instruction.

For another example, the image classification interface shown in FIG. 3 also includes an "open" control. On the one hand, when the electronic device detects that the open control is triggered, it superimposes a selection sub-interface (as shown in FIG. 4). The selection sub-interface provides the user with thumbnails of images that can be classified, such as thumbnails of image A, image B, image C, image D, image E, and image F, for the user to find and select the thumbnail of the image that needs to be classified. On the other hand, after selecting the thumbnail of the image that needs to be classified, the user can trigger the confirmation control provided in the selection sub-interface to input the image classification instruction to the electronic device. The image classification instruction is associated with the thumbnail of the image selected by the user, and instructs the electronic device to use the image selected by the user as the target image that needs to be classified.

In addition, those of ordinary skill in the art can also set other specific ways of inputting image classification instructions according to actual needs, and the present invention does not specifically limit this.

In 102, the pre-trained fine-grained classification model is called to perform fine-grained classification of the target image. The fine-grained classification model includes a feature extraction module, a feature optimization module, and a fine-grained classification module. The feature extraction module is used to extract the image features of the target image, the feature optimization module is used to optimize the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain the fine-grained category of the target image.

It should be noted that in this application, a machine learning method is used to train a fine-grained classification model in advance, and the fine-grained classification model is configured to perform fine-grained classification of images.

Please refer to FIG. 5. The fine-grained classification model provided by this application consists of three parts, namely a feature extraction module for extracting features, a feature optimization module for optimizing features, and a fine-grained classification module for performing fine-grained classification based on the features.

Among them, the electronic device first inputs the target image into the feature extraction module, and performs feature extraction on the target image based on the feature extraction module, thereby obtaining the image features of the target image.

Exemplarily, the feature extraction module includes N convolutional layers. When feature extraction is performed on the target image based on the feature extraction module, the electronic device first performs a convolution calculation on the target image based on the first convolutional layer to obtain a feature map, recorded as the first-layer feature map; then, the electronic device performs a convolution calculation on the first-layer feature map based on the second convolutional layer to obtain a new feature map, recorded as the second-layer feature map; and so on, until the Nth-layer feature map is calculated based on the Nth convolutional layer. The Nth-layer feature map is used as the image feature obtained by feature extraction of the target image.

It should be noted that the value of N in this application, that is, the number of convolutional layers constituting the feature extraction module, is not specifically limited, and can be set by a person of ordinary skill in the art according to actual needs. For example, in the embodiment of this application, the value of N is 9.
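The layer-by-layer extraction described above can be sketched as follows. This is a deliberately simplified sketch under stated assumptions: each layer is reduced to a single-channel, unpadded 2-D convolution with one kernel and no bias or activation, and the kernel values and the choice N = 2 are hypothetical, chosen only to keep the example small.

```python
def conv2d(image, kernel):
    """Valid (unpadded) 2-D cross-correlation of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def extract_features(image, kernels):
    """Feed the image through len(kernels) convolutional layers in sequence."""
    feature_map = image
    for kernel in kernels:  # layer 1, layer 2, ..., layer N
        feature_map = conv2d(feature_map, kernel)
    return feature_map  # the Nth-layer feature map


# Hypothetical 5x5 input and 2x2 kernel that picks the bottom-right value.
image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
pick_bottom_right = [[0.0, 0.0], [0.0, 1.0]]
features = extract_features(image, [pick_bottom_right, pick_bottom_right])  # N = 2
```

Each layer consumes the previous layer's feature map, so with two unpadded 2x2 layers the 5x5 input shrinks to a 3x3 feature map.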

After the electronic device extracts the image features of the target image based on the feature extraction module, it further optimizes the image features based on the feature optimization module according to a preset optimization strategy, and records the result as the optimized image features.

After the electronic device optimizes the extracted image features based on the feature optimization module to obtain optimized image features, it further performs fine-grained classification of the optimized image features based on the fine-grained classification module to obtain the fine-grained category of the target image. Among them, the fine-grained classification module can be a fully connected layer.

For example, referring to FIG. 6, when the image on the left side of FIG. 6 is determined to be the target image, its coarse-grained category is "dog". Performing fine-grained classification on the optimized image features corresponding to the target image based on the fine-grained classification module yields the fine-grained category "Corgi", which means that the breed of the dog in the target image is Corgi. When the image on the right side of FIG. 6 is determined to be the target image, its coarse-grained category is also "dog". Performing fine-grained classification on the optimized image features corresponding to the target image based on the fine-grained classification module yields the fine-grained category "Husky", which means that the breed of the dog in the target image is Husky.
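A fully connected classification layer of the kind mentioned above can be sketched as follows. The weights, biases, and feature values are hypothetical stand-ins for what a trained model would provide; only the "Corgi" and "Husky" labels are taken from the example.

```python
def fully_connected(features, weights, biases):
    """Return one score per category: dot(weights[k], features) + biases[k]."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def classify(features, weights, biases, labels):
    """Pick the label whose fully connected score is highest."""
    scores = fully_connected(features, weights, biases)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


labels = ["Corgi", "Husky"]
# Hypothetical parameters; a trained model would learn these values.
weights = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]
biases = [0.0, 0.0]

category = classify([0.9, 0.2, 0.1], weights, biases, labels)
```

With these assumed parameters the first feature vector scores higher for "Corgi"; a vector with the first and last components swapped would score higher for "Husky".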

In 103, the fine-grained category of the target image output by the fine-grained classification model is obtained.

In the embodiment of the present application, after the fine-grained classification model completes the fine-grained classification of the target image, the electronic device can obtain the classified fine-grained category from the fine-grained classification module of the fine-grained classification model.

It can be seen from the above that this application determines the target image that needs to be classified, and calls the pre-trained fine-grained classification model to perform fine-grained classification of the target image. The fine-grained classification model includes a feature extraction module, a feature optimization module, and a fine-grained classification module. The feature extraction module is used to extract the image features of the target image, the feature optimization module is used to optimize the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain the fine-grained category of the target image. The fine-grained category of the target image output by the fine-grained classification model is then obtained. Therefore, the present application does not require manual fine-grained classification, and can efficiently implement fine-grained classification of images.
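The data flow through the three modules can be sketched as a simple composition. The class name and the three stand-in functions below are hypothetical and only show the order in which the modules are applied; they are not the trained modules of the application.

```python
class FineGrainedClassifier:
    """Chains the three modules: extraction, optimization, classification."""

    def __init__(self, extract, optimize, classify):
        self.extract = extract
        self.optimize = optimize
        self.classify = classify

    def __call__(self, image):
        features = self.extract(image)        # feature extraction module
        optimized = self.optimize(features)   # feature optimization module
        return self.classify(optimized)       # fine-grained classification module


# Hypothetical stand-ins, used only to show the data flow.
model = FineGrainedClassifier(
    extract=lambda image: [sum(row) for row in image],
    optimize=lambda feats: [f / (sum(feats) or 1.0) for f in feats],
    classify=lambda feats: "Corgi" if feats[0] >= feats[-1] else "Husky",
)

category = model([[3.0, 1.0], [0.5, 0.5]])
```

Keeping the modules as separate callables mirrors the description, where each module only consumes the output of the previous one.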

In an embodiment, determining the target image that needs to be classified includes:

When the image classification period is reached, the newly added image in the image classification period is determined as the target image.

In the embodiment of the present application, when the electronic device reaches the image classification period, it triggers the determination of the target image that needs to be classified. Among them, the electronic device can directly use the newly added image in the image classification period as the target image. For example, in an image classification cycle, the electronic device adds 20 images, and the electronic device uses these 20 images as target images that need to be classified.
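Selecting the images added during a cycle can be sketched as a timestamp filter. The seven-day cycle length, the record layout, and the file names are assumptions for illustration; the application does not fix how the time an image was added is recorded.

```python
from datetime import datetime, timedelta


def new_images_in_cycle(images, cycle_end, cycle_length=timedelta(days=7)):
    """Return names of images added in [cycle_end - cycle_length, cycle_end)."""
    cycle_start = cycle_end - cycle_length
    return [name for name, added_at in images if cycle_start <= added_at < cycle_end]


# Hypothetical (name, time added) records.
images = [
    ("old.jpg", datetime(2020, 1, 3, 9, 0)),
    ("new_1.jpg", datetime(2020, 1, 7, 18, 30)),
    ("new_2.jpg", datetime(2020, 1, 12, 8, 15)),
]

targets = new_images_in_cycle(images, cycle_end=datetime(2020, 1, 13))
```

Only the two images added within the seven days before the cycle ends are returned as target images.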

In one embodiment, "determine the target image that needs to be classified" includes:

(1) Determine the image under the preset storage path as the target image; or,

(2) Determine the image of the preset image format as the target image; or,

(3) Determine the image of the preset image format under the preset storage path as the target image.

Among them, the embodiment of the present application does not specifically limit the setting of the preset storage path and the preset image format, which can be set by the electronic device according to user input, or the manufacturer of the electronic device can set the electronic device by default. It should be noted that the preset storage path can be configured as one or multiple. Correspondingly, the preset image format can be configured as one or multiple.

For example, if the user needs the electronic device to classify the captured images, the preset storage path can be configured as the storage path of the images captured by the electronic device. Illustratively, if the electronic device is based on the Android system, the preset storage path is configured as "/storage/0/DCIM". In this way, the electronic device will determine all images in the file directory "DCIM" corresponding to /storage/0/DCIM as target images that need to be classified.

For another example, if the user needs the electronic device to classify images in a certain image format, the preset image format can be configured as the image format specified by the user. Illustratively, if the user needs the electronic device to classify images in "JPG" format, the preset image format is configured as the "JPG" format, so that the electronic device will determine all local "JPG" format images as target images that need to be classified.

For another example, if the user needs the electronic device to classify images in a certain image format captured by the electronic device, the preset storage path can be configured as the storage path of the images captured by the electronic device, and the preset image format can be configured as the image format specified by the user. Exemplarily, if the electronic device is based on the Android system, the preset storage path is configured as "/storage/0/DCIM". In addition, if the user needs the "JPG" format images captured by the electronic device to be classified, the preset image format is configured as the "JPG" format, so that the electronic device will determine all the "JPG" format images in the file directory "DCIM" corresponding to /storage/0/DCIM as target images that need to be classified.
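The three selection options above (path only, format only, or both) can be sketched with one filter function. The function name and the sample file paths are hypothetical, and two behaviours are assumptions of this sketch rather than statements of the application: images in subdirectories of a preset path are included, and the format is matched against the file extension without regard to case.

```python
from pathlib import PurePosixPath


def select_by_path_and_format(paths, preset_dirs=None, preset_formats=None):
    """Keep paths matching the preset directories and/or formats.

    Passing None for a filter disables it, which covers options (1) to (3).
    """
    dirs = [PurePosixPath(d) for d in preset_dirs] if preset_dirs else None
    formats = {f.lower() for f in preset_formats} if preset_formats else None
    selected = []
    for raw in paths:
        path = PurePosixPath(raw)
        in_dir = dirs is None or any(d in path.parents for d in dirs)
        in_format = formats is None or path.suffix.lstrip(".").lower() in formats
        if in_dir and in_format:
            selected.append(raw)
    return selected


# Hypothetical file paths.
paths = [
    "/storage/0/DCIM/a.JPG",
    "/storage/0/DCIM/b.png",
    "/storage/0/Download/c.jpg",
]

both = select_by_path_and_format(paths, ["/storage/0/DCIM"], ["JPG"])
```

Because both filters accept lists, one or several preset storage paths and image formats can be configured, as the description allows.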

In an embodiment, the fine-grained classification model further includes a dimensionality reduction module, which is used to perform feature dimensionality reduction on optimized image features to obtain optimized image features after dimensionality reduction;

The fine-grained classification module is also used to classify and predict the optimized image features after dimensionality reduction to obtain the fine-grained category of the target image.

Please refer to FIG. 7, the fine-grained classification model provided by the present application further includes a dimensionality reduction module, and the dimensionality reduction module may be a pooling layer. As shown in Figure 7, one end of the dimensionality reduction module is connected to the feature optimization module, and the other end is connected to the fine-grained classification module.

In the embodiment of the present application, after the electronic device optimizes the image features extracted by the feature extraction module based on the feature optimization module, it does not directly perform fine-grained classification on the resulting optimized image features. Instead, it inputs the optimized image features into the dimensionality reduction module, which performs feature dimensionality reduction on them to obtain the optimized image features after dimensionality reduction. Then, the electronic device inputs the optimized image features after dimensionality reduction into the fine-grained classification module for fine-grained classification, and accordingly obtains the fine-grained category of the target image.
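Dimensionality reduction with a pooling layer can be sketched as follows. The application only states that the module may be a pooling layer; the use of max pooling, the 2x2 window, and the sample values are assumptions of this sketch.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping size x size max pooling.

    Trailing rows or columns that do not fill a whole window are dropped.
    """
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + dr][c * size + dc]
                for dr in range(size) for dc in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 4x4 optimized feature map.
optimized = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
reduced = max_pool2d(optimized)
```

The 4x4 input is reduced to a 2x2 map, so the classification module that follows receives a quarter as many values.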

In an embodiment, after obtaining the fine-grained category of the target image output by the fine-grained classification model, the method further includes:

Allocate a storage path for the target image according to the fine-grained category, and store the target image in that storage path.

In the embodiment of the present application, in order to facilitate the user to browse the image, the electronic device also classifies and stores the target image according to the fine-grained category obtained by fine-grained classification of the target image.

The electronic device can allocate a storage path for each fine-grained category, and store the corresponding target image in the allocated storage path. For example, if the target image is classified into nine categories, the electronic device will correspondingly allocate nine different storage paths, which are respectively used to store the target image of the corresponding category.
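Allocating one storage path per fine-grained category can be sketched as follows. The root directory "/storage/0/Classified" and the file names are hypothetical, and the sketch only computes the destination paths without moving any files.

```python
from pathlib import PurePosixPath


def allocate_paths(classified, root="/storage/0/Classified"):
    """Map each image path to <root>/<category>/<file name>."""
    root_path = PurePosixPath(root)
    return {
        image: str(root_path / category / PurePosixPath(image).name)
        for image, category in classified.items()
    }


# Hypothetical classification results: image path -> fine-grained category.
classified = {
    "/storage/0/DCIM/a.jpg": "Corgi",
    "/storage/0/DCIM/b.jpg": "Husky",
    "/storage/0/DCIM/c.jpg": "Corgi",
}
destinations = allocate_paths(classified)
```

Images that share a category end up under the same directory, so the number of distinct directories equals the number of categories found.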

In an embodiment, after storing the target image in the allocated storage path, the method further includes:

For the target images in each storage path, obtain the browsing behavior data of the user browsing each target image, and obtain the creation duration of each target image;

Perform a weighted summation of the browsing behavior data and creation duration of each target image to obtain the weighted sum value of each target image;

Sort the target images according to the weighted sum value of each target image.

In the embodiment of the present application, after the target images are classified, the target images of each category (that is, the target images under each storage path) are also sorted.

Among them, the browsing behavior data includes relevant data describing the user's browsing behavior. For example, the browsing behavior data includes the number of times the user browses the target image, and the opening and closing moments of the target image each time the user browses, and so on.

In addition to obtaining the browsing behavior data of the user browsing each target image, the electronic device also obtains the creation duration of each target image, where the creation duration is the difference between the current moment and the creation moment of the target image.

It should be noted that the current moment above does not refer to a specific fixed moment, but to the moment when the electronic device performs the operation of "acquiring the creation duration of each target image". In addition, the embodiments of the present application do not specifically limit how the target image is generated. For example, if a target image is generated by the electronic device by shooting, the creation moment of the target image is the shooting moment when the electronic device captured it; for another example, if a target image is obtained by the electronic device through the Internet, the creation moment of the target image is the download moment when the electronic device downloaded it, and so on.

In the embodiment of the present application, after acquiring the browsing behavior data and creation duration of each target image, the electronic device performs a weighted summation on the acquired browsing behavior data and creation duration according to a preset weighted sum algorithm to obtain the weighted sum value of the corresponding target image.

Among them, the browsing behavior data reflects the characteristics of the user's browsing behavior, and the creation duration is a characteristic of the image itself. The electronic device performs a weighted summation of the acquired browsing behavior data and creation duration in order to comprehensively evaluate the target image by combining the characteristics of the target image itself with the user characteristics external to the image. In this way, the weighted sum value is the "score" obtained by the comprehensive evaluation of the target image, and the level of this score reflects the probability that the target image will be browsed by the user.

In the embodiment of the present application, after obtaining the weighted sum value of each target image, the electronic device sorts according to the weighted sum value in descending order.

In an embodiment, performing a weighted summation of the browsing behavior data and creation duration of each target image to obtain the weighted sum value of each target image includes:

According to the browsing behavior data of each target image, obtain the number of views of each target image and the browsing duration of each view;

Obtain the average browsing duration of each target image according to its number of views and the browsing duration of each view;

Normalize the number of views, average browsing duration, and creation duration of each target image;

Perform a weighted summation of the normalized number of views, average browsing duration, and creation duration of each target image to obtain the weighted sum value of each target image.

In the embodiment of the present application, the electronic device records the browsing behavior data whenever the user browses a target image. The browsing behavior data includes, but is not limited to, the number of times the user browses the target image and the opening moment and closing moment of the target image each time the user browses it, and so on.

Thus, when the electronic device performs a weighted summation of the browsing behavior data and creation duration of each target image, it can directly extract the number of views of each target image (that is, the number of times the user browses the target image) from the browsing behavior data of that target image, and obtain the browsing duration of each view from the opening moment and closing moment recorded for each view in the browsing behavior data.

After acquiring the number of times of browsing each target image and the length of each browsing time, the electronic device further calculates the average browsing time of each target image according to the number of times each target image is viewed and the length of each browsing time. It should be noted that those of ordinary skill in the art can understand that the average browsing duration referred to here is the average browsing duration of a single target image, rather than the average browsing duration of multiple target images.
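Deriving the number of views and the average browsing duration of a single target image can be sketched as follows. Representing each view as an (opening moment, closing moment) pair in seconds is an assumption of this sketch, as is returning zero for an image that has never been viewed.

```python
def browsing_stats(sessions):
    """Return (number of views, average browsing duration) for one image.

    sessions: list of (opened_at, closed_at) pairs, in seconds.
    """
    count = len(sessions)
    if count == 0:
        return 0, 0.0  # assumed convention for an image never viewed
    total = sum(closed_at - opened_at for opened_at, closed_at in sessions)
    return count, total / count


# Hypothetical log for one image: viewed twice, for 10 s and 30 s.
views, average = browsing_stats([(0, 10), (100, 130)])
```

The average is taken over the views of one image, matching the note above that it is not an average across multiple target images.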

In addition, in the embodiment of the present application, the number of browsing times, the average browsing duration, and the creation duration are each pre-assigned a corresponding weight value. The specific weight values are not limited in this application and can be set by a person of ordinary skill in the art according to actual needs. For example, the weight value corresponding to the number of browsing times can be set to 0.3, the weight value corresponding to the average browsing duration to 0.2, and the weight value corresponding to the creation duration to 0.5.

In order to improve the efficiency of the weighted summation, the electronic device first normalizes the number of browsing times, the average browsing duration, and the creation duration of each target image, mapping all three into the same numerical interval.

Then, the electronic device performs a weighted summation on the normalized browsing times, average browsing duration, and creation duration of each target image according to a preset weighted sum algorithm to obtain a weighted sum value corresponding to each target image.
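The normalization and weighted summation described above can be sketched as follows. This is a minimal illustration only, assuming min-max normalization into [0, 1] (the application only requires a common numerical interval) and the example weights 0.3, 0.2, and 0.5. The names `BrowsingRecord`, `min_max`, and `weighted_sums` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class BrowsingRecord:
    """Hypothetical per-image record; the field names are assumptions."""
    sessions: List[Tuple[float, float]]  # (opening time, closing time) per browsing session
    creation_duration: float             # creation duration of the image


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values into [0, 1]; min-max scaling is an assumed choice."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_sums(records: Sequence[BrowsingRecord],
                  weights: Tuple[float, float, float] = (0.3, 0.2, 0.5)) -> List[float]:
    # Number of browsing times is the number of recorded sessions.
    counts = [float(len(r.sessions)) for r in records]
    # Average browsing duration of a single image: mean of (close - open).
    averages = [
        sum(close - open_ for open_, close in r.sessions) / len(r.sessions)
        if r.sessions else 0.0
        for r in records
    ]
    creations = [r.creation_duration for r in records]
    n_counts, n_avgs, n_creations = min_max(counts), min_max(averages), min_max(creations)
    w_count, w_avg, w_creation = weights
    return [
        w_count * c + w_avg * a + w_creation * t
        for c, a, t in zip(n_counts, n_avgs, n_creations)
    ]


records = [
    BrowsingRecord(sessions=[(0.0, 10.0), (20.0, 40.0)], creation_duration=100.0),
    BrowsingRecord(sessions=[(0.0, 5.0)], creation_duration=300.0),
    BrowsingRecord(sessions=[(0.0, 30.0), (50.0, 60.0), (70.0, 80.0)], creation_duration=200.0),
]
scores = weighted_sums(records)
```

Normalizing first keeps the three quantities on a comparable scale, so the weights alone decide how much each one contributes to the sum.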

In an embodiment, the image feature includes a feature map, and the feature optimization module is used to transpose the feature map to obtain a transposed feature map, perform matrix multiplication on the feature map and the transposed feature map, and use the result of the matrix multiplication as the optimized image feature.

Taking image features as a feature map as an example, this application provides a way to optimize image features.

The electronic device first uses the feature optimization module to transpose the extracted image feature, that is, the feature map of the target image, to obtain a transposed feature map. Then, the electronic device uses the feature optimization module to perform matrix multiplication on the original feature map and the transposed feature map, and uses the result of the matrix multiplication as the optimized image feature for fine-grained classification.
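The transpose-and-multiply optimization can be illustrated with a small sketch. It assumes the feature map has already been flattened into a two-dimensional matrix with one row per channel and one column per spatial position, which the application does not specify. The helper names `transpose`, `matmul`, and `optimize_feature` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def optimize_feature(feature_map: Matrix) -> Matrix:
    """Multiply the feature map by its own transpose (X @ X^T)."""
    return matmul(feature_map, transpose(feature_map))


# Assumed layout: 2 channels, 3 spatial positions.
feature_map = [
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 3.0],
]
optimized = optimize_feature(feature_map)
```

Under this layout the result is a symmetric channel-by-channel matrix whose entries capture how strongly pairs of channels respond together.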

In an embodiment, before determining the target image that needs to be classified, the method further includes:

(1) Obtain multiple sample images, and obtain fine-grained category labels and coarse-grained category labels of the sample images;

(2) Build a machine learning network. The machine learning network includes a first branch network and a second branch network with the same structure, and a binary classification module. Each branch network includes a feature extraction module, a feature optimization module, and a fine-grained classification module. The binary classification module is connected with the feature extraction module of the first branch network and the feature extraction module of the second branch network;

(3) Select a first sample image from the multiple sample images, extract the image features of the first sample image based on the feature extraction module of the first branch network, optimize them with the feature optimization module of the first branch network, and input the optimized features into the fine-grained classification module of the first branch network for fine-grained classification to obtain the first predicted fine-grained category;

(4) Select a second sample image from the multiple sample images, extract the image features of the second sample image based on the feature extraction module of the second branch network, optimize them with the feature optimization module of the second branch network, and input the optimized features into the fine-grained classification module of the second branch network for fine-grained classification to obtain the second predicted fine-grained category;

(5) Fuse the image features of the first sample image and the image features of the second sample image to obtain the fused image feature, and, based on the binary classification module, predict from the fused image feature whether the coarse-grained categories of the first sample image and the second sample image are the same, to obtain the prediction result;

(6) Obtain the first classification loss of the first branch network according to the first predicted fine-grained category and the fine-grained category label of the first sample image, obtain the second classification loss of the second branch network according to the second predicted fine-grained category and the fine-grained category label of the second sample image, and obtain the third classification loss according to the prediction result and the coarse-grained category labels of the first sample image and the second sample image;

(7) Obtain the corresponding total loss according to the first classification loss, the second classification loss, and the third classification loss, and adjust the parameters of the first branch network and the second branch network according to the total loss until the preset training stop condition is met, at which point training ends;

(8) Select a branch network from the first branch network and the second branch network as the fine-grained classification model.

This application also provides an optional way of training a fine-grained classification model.

The electronic device first obtains a plurality of sample images, and obtains the fine-grained category label and the coarse-grained category label of each sample image. The fine-grained category label describes the fine-grained category of the sample image, and the coarse-grained category label describes the coarse-grained category of the sample image.

It should be noted that this application does not make specific restrictions on the manner and quantity of obtaining sample images, and can be configured by a person of ordinary skill in the art according to actual needs. For example, the electronic device can crawl an image from the Internet as a sample image, and receive artificially annotated fine-grained category labels and coarse-grained category labels of the sample image.

In addition, the electronic device also builds a machine learning network. Please refer to Figure 8. The built machine learning network includes two branch networks with the same structure and a binary classification module. Each branch network includes a feature extraction module, a feature optimization module, and a fine-grained classification module. For ease of distinction, one of the branch networks is recorded as the first branch network, and the other is recorded as the second branch network. The binary classification module is connected to the feature extraction module of the first branch network and the feature extraction module of the second branch network.

After completing the construction of the machine learning network, the electronic device uses the acquired sample images to train the machine learning network.

The electronic device selects two sample images from the acquired sample images, recording one as the first sample image and the other as the second sample image. For the first sample image, the electronic device extracts its image features based on the feature extraction module of the first branch network, optimizes them with the feature optimization module of the first branch network, and inputs the optimized features into the fine-grained classification module of the first branch network for fine-grained classification, obtaining the first predicted fine-grained category. For the second sample image, the electronic device extracts its image features based on the feature extraction module of the second branch network, optimizes them with the feature optimization module of the second branch network, and inputs the optimized features into the fine-grained classification module of the second branch network for fine-grained classification, obtaining the second predicted fine-grained category.

In addition, the electronic device fuses the image features of the first sample image and the image features of the second sample image to obtain the fused image feature, and, based on the binary classification module, predicts from the fused image feature whether the coarse-grained categories of the first sample image and the second sample image are the same, obtaining the prediction result.

In addition, the electronic device obtains the first classification loss of the first branch network according to the first predicted fine-grained category and the fine-grained category label of the first sample image, obtains the second classification loss of the second branch network according to the second predicted fine-grained category and the fine-grained category label of the second sample image, and obtains the third classification loss according to the prediction result and the coarse-grained category labels of the first sample image and the second sample image. A person of ordinary skill in the art can configure the loss functions used to obtain the first classification loss, the second classification loss, and the third classification loss according to actual needs, and this application does not specifically limit this.

Then, the electronic device obtains the corresponding total loss according to the first classification loss, the second classification loss, and the third classification loss, which can be expressed as:

L_total = Loss1 + Loss2 + Loss3;

where L_total represents the total loss, Loss1 represents the first classification loss, Loss2 represents the second classification loss, and Loss3 represents the third classification loss.

After obtaining the total loss, the electronic device adjusts the parameters of the first branch network and the second branch network according to the total loss. It should be noted that the goal of model training is to minimize the total loss. Therefore, each time the total loss is determined, the parameters of the first branch network and the second branch network can be adjusted in the direction that minimizes the total loss.

In this way, the parameters of the first branch network and the second branch network are adjusted continuously, and training ends when the preset training stop condition is met. The preset training stop condition can be set by a person of ordinary skill in the art according to actual needs, and is not specifically limited in the embodiment of the present application.

For example, the preset training stop condition is configured to stop training when the total loss takes the minimum value;

For another example, the preset training stop condition is configured to stop training when the number of iterations of the parameter reaches the preset number.
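The two example stop conditions can be sketched as a single check that runs after each parameter update. This is a hedged illustration: "the total loss takes the minimum value" is approximated here by the loss no longer improving over the last few updates, which is an assumption of this sketch, and the names `total_loss`, `should_stop`, `patience`, and `tolerance` are hypothetical.

```python
from typing import List


def total_loss(loss1: float, loss2: float, loss3: float) -> float:
    """Sum of the three classification losses."""
    return loss1 + loss2 + loss3


def should_stop(history: List[float], max_iterations: int,
                patience: int = 3, tolerance: float = 1e-4) -> bool:
    """Decide whether training should end, given the total loss after each update."""
    # Stop condition from the text: the number of parameter updates reaches a preset number.
    if len(history) >= max_iterations:
        return True
    # Assumed proxy for "the total loss takes the minimum value":
    # the last `patience` updates did not improve on the best earlier loss.
    if len(history) <= patience:
        return False
    best_before = min(history[:-patience])
    recent_best = min(history[-patience:])
    return recent_best > best_before - tolerance


plateaued = should_stop([3.0, 2.0, 1.5, 1.5, 1.5, 1.5], max_iterations=100)
still_improving = should_stop([3.0, 2.0, 1.0, 0.5], max_iterations=100)
budget_reached = should_stop([3.0, 2.0], max_iterations=2)
```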

When the preset training stop condition is met, the electronic device determines that both the first branch network and the second branch network in the machine learning network can accurately classify images at a fine-grained level. At this time, one branch network is selected from the first branch network and the second branch network as the fine-grained classification model for fine-grained classification of images.

It should be noted that this application does not specifically limit how the fine-grained classification model is selected from the first branch network and the second branch network. For example, in the embodiment of this application, the electronic device can randomly select one of the two branch networks as the fine-grained classification model.

In an embodiment, the third classification loss is obtained according to the following formula:

Loss3=-[η*y*log(p)+(1-y)*log(1-p)];

Among them, Loss3 represents the third classification loss; η represents the correction coefficient (an empirical value; for example, between 0.3 and 0.5 in this application); y characterizes whether the coarse-grained category label of the first sample image and the coarse-grained category label of the second sample image are the same, and its value can be set by those of ordinary skill in the art according to actual needs; p represents the prediction result.

In the embodiment of the present application, the purpose of introducing the correction coefficient η is to reduce the contribution of y*log(p) to the total loss when the coarse-grained category labels of the first sample image and the second sample image are different, so that the first branch network and the second branch network learn that the first sample image and the second sample image are similar in a certain dimension.
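The formula for Loss3 can be evaluated directly, as in the sketch below. It assumes y = 1 when the two coarse-grained category labels are the same and y = 0 otherwise (the application leaves the encoding of y open), and that p is the predicted probability that they are the same. The function name `third_classification_loss` and the clamping constant `eps` are hypothetical additions for numerical safety.

```python
import math


def third_classification_loss(y: int, p: float, eta: float = 0.4,
                              eps: float = 1e-12) -> float:
    """Loss3 = -[eta * y * log(p) + (1 - y) * log(1 - p)]."""
    # Clamp p away from 0 and 1 so that log() stays finite (an added safeguard).
    p = min(max(p, eps), 1.0 - eps)
    return -(eta * y * math.log(p) + (1 - y) * math.log(1.0 - p))


loss_same = third_classification_loss(y=1, p=0.5)        # only the eta-scaled term is active
loss_different = third_classification_loss(y=0, p=0.5)   # only the unscaled term is active
```

With η below 1, the y*log(p) term is scaled down relative to the (1-y)*log(1-p) term, so it contributes less to the total loss for the same prediction confidence.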

In an embodiment, selecting a branch network from the first branch network and the second branch network as the fine-grained classification model includes:

Obtain the classification accuracy rate of the first branch network, and obtain the classification accuracy rate of the second branch network;

The branch network with higher classification accuracy is selected from the first branch network and the second branch network as the fine-grained classification model.

This application provides a method for selecting the fine-grained classification model, in which the electronic device obtains the classification accuracy of the first branch network and the classification accuracy of the second branch network respectively, and then selects the branch network with the higher classification accuracy as the fine-grained classification model.
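A minimal sketch of accuracy-based selection follows. It assumes each trained branch can be called as a function from an image to a predicted fine-grained category, and that accuracy is measured on a held-out labeled set; neither detail is specified in the application. The names `accuracy` and `select_branch` are hypothetical, and ties are resolved in favor of the first branch as an arbitrary choice.

```python
from typing import Callable, Sequence, Tuple

Classifier = Callable[[str], str]   # image identifier -> predicted fine-grained category
Sample = Tuple[str, str]            # (image identifier, fine-grained category label)


def accuracy(branch: Classifier, samples: Sequence[Sample]) -> float:
    """Fraction of samples whose predicted category matches the label."""
    if not samples:
        return 0.0
    correct = sum(1 for image, label in samples if branch(image) == label)
    return correct / len(samples)


def select_branch(first: Classifier, second: Classifier,
                  samples: Sequence[Sample]) -> Classifier:
    """Return the branch with the higher accuracy; ties go to the first branch."""
    return first if accuracy(first, samples) >= accuracy(second, samples) else second


# Toy stand-ins for the two trained branch networks.
validation = [("a.jpg", "husky"), ("b.jpg", "corgi"), ("c.jpg", "husky")]
first_branch = lambda image: "husky"
second_branch = lambda image: "corgi" if image == "b.jpg" else "husky"
chosen = select_branch(first_branch, second_branch, validation)
```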

In an embodiment, fusing the image features of the first sample image and the image features of the second sample image to obtain the fused image features includes:

Channel merging is performed on the image features of the first sample image and the image features of the second sample image, and the result of the channel merging is used as the fused image feature.

For example, the electronic device can combine the image features of the first sample image and the image features of the second sample image in a Concat manner, and use the result of the channel combination as the fused image feature.
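Channel merging in the Concat manner can be sketched without any framework. The example assumes each image feature is stored as a list of channels, each channel being an H x W grid, and that both features share the same spatial size. The names `spatial_size` and `concat_channels` are hypothetical.

```python
from typing import List, Tuple

Channel = List[List[float]]   # one H x W channel
FeatureMap = List[Channel]    # C channels, i.e. shape (C, H, W)


def spatial_size(feature: FeatureMap) -> Tuple[int, int]:
    """Height and width of the first channel."""
    return len(feature[0]), len(feature[0][0])


def concat_channels(first: FeatureMap, second: FeatureMap) -> FeatureMap:
    """Stack the channels of both features along the channel axis."""
    if spatial_size(first) != spatial_size(second):
        raise ValueError("feature maps must share the same height and width")
    return first + second


first_feature = [[[1.0, 2.0], [3.0, 4.0]]]    # shape (1, 2, 2)
second_feature = [[[5.0, 6.0], [7.0, 8.0]]]   # shape (1, 2, 2)
fused = concat_channels(first_feature, second_feature)   # shape (2, 2, 2)
```

Concatenation keeps both sets of features intact, so the binary classification module can compare them rather than receiving an already mixed signal.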

Please refer to Figure 9. This application also provides a model training method, and the process of the model training method may be:

In 201, a plurality of sample images are acquired, and the fine-grained category labels and coarse-grained category labels of the sample images are acquired.

The electronic device first obtains a plurality of sample images, and obtains the fine-grained category label and the coarse-grained category label of each sample image. The fine-grained category label is used to describe the fine-grained category of the sample image, and the coarse-grained label is used to describe the coarse-grained category of the sample image.

It should be noted that this application does not make specific restrictions on the manner and quantity of obtaining sample images, and can be configured by a person of ordinary skill in the art according to actual needs. For example, referring to Figure 10, a part of the sample images can be obtained in advance from the ImageNet data set and stored in the electronic device, part of the sample images can be crawled from the network and stored in the electronic device, and a part of the sample images can be manually collected from the network.

After acquiring the sample images, the electronic device further receives the manually annotated fine-grained category label and coarse-grained category label of each sample image. The fine-grained category label describes the fine-grained category of the sample image, and the coarse-grained category label describes the coarse-grained category of the sample image.

In 202, a machine learning network is constructed. The machine learning network includes a first branch network and a second branch network with the same structure, and a binary classification module. Each branch network includes a feature extraction module, a feature optimization module, and a fine-grained classification module. The binary classification module is connected with the feature extraction module of the first branch network and the feature extraction module of the second branch network.

Please refer to Figure 8. The constructed machine learning network includes two branch networks with the same structure and a binary classification module. Each branch network includes a feature extraction module, a feature optimization module, and a fine-grained classification module. For ease of distinction, one branch network is recorded as the first branch network, and the other is recorded as the second branch network. In addition, the binary classification module is connected to the feature extraction module of the first branch network and the feature extraction module of the second branch network.

In 203, a first sample image is selected from the plurality of sample images, the image features of the first sample image are extracted based on the feature extraction module of the first branch network, optimized by the feature optimization module of the first branch network, and then input into the fine-grained classification module of the first branch network for fine-grained classification to obtain the first predicted fine-grained category.

In 204, a second sample image is selected from the plurality of sample images, the image features of the second sample image are extracted based on the feature extraction module of the second branch network, optimized by the feature optimization module of the second branch network, and then input into the fine-grained classification module of the second branch network for fine-grained classification to obtain the second predicted fine-grained category.

In 205, the image features of the first sample image and the image features of the second sample image are fused to obtain the fused image feature, and, based on the binary classification module, whether the coarse-grained categories of the first sample image and the second sample image are the same is predicted from the fused image feature to obtain the prediction result.

In 206, the first classification loss of the first branch network is obtained according to the first predicted fine-grained category and the fine-grained category label of the first sample image, the second classification loss of the second branch network is obtained according to the second predicted fine-grained category and the fine-grained category label of the second sample image, and the third classification loss is obtained according to the prediction result and the coarse-grained category labels of the first sample image and the second sample image.

Among them, a person of ordinary skill in the art can configure the loss functions used to obtain the aforementioned first classification loss, second classification loss, and third classification loss according to actual needs, and this application does not specifically limit this.

For example, the third classification loss is obtained according to the following formula:

Loss3=-[η*y*log(p)+(1-y)*log(1-p)];

In 207, the corresponding total loss is obtained according to the first classification loss, the second classification loss, and the third classification loss, and the parameters of the first branch network and the second branch network are adjusted according to the total loss until the preset training stop condition is met, at which point training ends.

For example, obtain the total loss according to the following formula:

L_total = Loss1 + Loss2 + Loss3;

In 208, a branch network is selected from the first branch network and the second branch network as a fine-grained classification model.
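Steps 203 to 207 can be wired together for a single pair of sample images, as in the sketch below. It is only an illustration under several assumptions: each module is represented by a plain Python callable, features are flat lists fused by concatenation, cross-entropy is used for the first and second classification losses (the application leaves the loss functions open), and y = 1 when the coarse-grained labels match. All names, including `Branch`, `LabeledImage`, and `pair_total_loss`, are hypothetical, and the parameter update itself is omitted.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Features = List[float]


@dataclass
class Branch:
    """One branch network, modeled as three callables (an assumption)."""
    extract: Callable[[Sequence[float]], Features]
    optimize: Callable[[Features], Features]
    classify: Callable[[Features], List[float]]   # returns class probabilities


@dataclass
class LabeledImage:
    pixels: Sequence[float]
    fine_label: int      # index of the fine-grained category
    coarse_label: str    # coarse-grained category


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Assumed choice for Loss1 and Loss2."""
    return -math.log(max(probs[label], 1e-12))


def pair_total_loss(first: Branch, second: Branch,
                    same_coarse: Callable[[Features], float],
                    img1: LabeledImage, img2: LabeledImage,
                    eta: float = 0.4) -> float:
    f1 = first.extract(img1.pixels)     # step 203: extract with the first branch
    f2 = second.extract(img2.pixels)    # step 204: extract with the second branch
    loss1 = cross_entropy(first.classify(first.optimize(f1)), img1.fine_label)
    loss2 = cross_entropy(second.classify(second.optimize(f2)), img2.fine_label)
    # Step 205: fuse the features and predict whether the coarse categories match.
    p = min(max(same_coarse(f1 + f2), 1e-12), 1.0 - 1e-12)
    y = 1 if img1.coarse_label == img2.coarse_label else 0
    # Step 206: third classification loss with correction coefficient eta.
    loss3 = -(eta * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    # Step 207: total loss.
    return loss1 + loss2 + loss3


# Toy modules that always output uniform predictions.
toy = Branch(extract=lambda px: list(px), optimize=lambda f: f,
             classify=lambda f: [0.5, 0.5])
loss_same = pair_total_loss(toy, toy, lambda fused: 0.5,
                            LabeledImage([1.0], 0, "dog"), LabeledImage([2.0], 1, "dog"))
loss_diff = pair_total_loss(toy, toy, lambda fused: 0.5,
                            LabeledImage([1.0], 0, "dog"), LabeledImage([2.0], 1, "bird"))
```

In a real training setup the total loss would be passed to an optimizer that updates both branch networks together.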

Please refer to FIG. 10, which is a schematic structural diagram of the image classification device provided by the present application. The image classification device may include: an image determining component 301, a model calling component 302, and a category obtaining component 303.

The image determining component 301 is used to determine the target image that needs to be image classified;

The model calling component 302 is used to call the pre-trained fine-grained classification model to perform fine-grained classification of the target image. The fine-grained classification model includes a feature extraction module, a feature optimization module, and a fine-grained classification module. The feature extraction module is used to extract the image feature of the target image, the feature optimization module is used to optimize the image feature to obtain the optimized image feature, and the fine-grained classification module performs fine-grained classification on the optimized image feature to obtain the fine-grained category of the target image;

The category obtaining component 303 is used to obtain the fine-grained category of the target image output by the fine-grained classification model.

In one embodiment, when determining the target image that needs to be classified, the image determining component 301 is used to:

In an embodiment, the image classification device provided in the present application further includes a classification storage component, which is used to, after the fine-grained category of the target image output by the fine-grained classification model is obtained, allocate a storage path for the target image according to the fine-grained category and store the target image in the storage path.
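Allocating a storage path by fine-grained category could look like the sketch below. It assumes one directory per category under a common root, that the category name is safe to use as a directory name, and that storing means moving the file; none of these details come from the application. The function name `store_by_category` and the example file and directory names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def store_by_category(image_path: Path, category: str, root: Path) -> Path:
    """Move the image into root/<category>/ and return its new location."""
    target_dir = root / category                      # storage path allocated per category
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / image_path.name
    shutil.move(str(image_path), str(destination))    # moving (not copying) is an assumption
    return destination


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    image = root / "img_001.jpg"
    image.write_bytes(b"")                            # empty placeholder file
    stored = store_by_category(image, "husky", root / "albums")
    stored_ok = stored.exists() and not image.exists()
    relative = stored.relative_to(root).as_posix()
```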

In an embodiment, the image classification device provided in the present application further includes a model training component, which is used to: before determining the target image that needs to be image classified:

Obtain multiple sample images, and obtain fine-grained category labels and coarse-grained category labels of the sample images;

Construct a machine learning network. The machine learning network includes a first branch network and a second branch network with the same structure, and a binary classification module. Each branch network includes a feature extraction module, a feature optimization module, and a fine-grained classification module. The binary classification module is connected with the feature extraction module of the first branch network and the feature extraction module of the second branch network;

Select a first sample image from the plurality of sample images, extract the image features of the first sample image based on the feature extraction module of the first branch network, optimize them with the feature optimization module of the first branch network, and input the optimized features into the fine-grained classification module of the first branch network for fine-grained classification to obtain the first predicted fine-grained category;

Select a second sample image from the plurality of sample images, extract the image features of the second sample image based on the feature extraction module of the second branch network, optimize them with the feature optimization module of the second branch network, and input the optimized features into the fine-grained classification module of the second branch network for fine-grained classification to obtain the second predicted fine-grained category;

Fuse the image features of the first sample image and the image features of the second sample image to obtain the fused image feature, and, based on the binary classification module, predict from the fused image feature whether the coarse-grained categories of the first sample image and the second sample image are the same, to obtain the prediction result;

Obtain the first classification loss of the first branch network according to the first predicted fine-grained category and the fine-grained category label of the first sample image, obtain the second classification loss of the second branch network according to the second predicted fine-grained category and the fine-grained category label of the second sample image, and obtain the third classification loss according to the prediction result and the coarse-grained category labels of the first sample image and the second sample image;

Obtain the corresponding total loss according to the first classification loss, the second classification loss, and the third classification loss, and adjust the parameters of the first branch network and the second branch network according to the total loss, and end the training when the preset training stop condition is met;

A branch network is selected from the first branch network and the second branch network as the fine-grained classification model.

In an embodiment, the third classification loss is obtained according to the following formula:

Loss3=-[η*y*log(p)+(1-y)*log(1-p)];

Among them, Loss3 represents the third classification loss, η represents the correction coefficient, y is used to characterize whether the coarse-grained category label of the first sample image and the coarse-grained category label of the second sample image are the same, and p represents the prediction result.

In an embodiment, when selecting a branch network from the first branch network and the second branch network as the fine-grained classification model, the model training component is used to:

A branch network with a higher classification accuracy rate is selected from the first branch network and the second branch network as the fine-grained classification model.

In an embodiment, when the image features of the first sample image and the image features of the second sample image are fused to obtain the fused image features, the model training component is used to:

Perform channel merging on the image features of the first sample image and the image features of the second sample image, and use the result of channel merging as the fused image feature.

It should be noted that the image classification device provided in this embodiment of the application belongs to the same concept as the image classification method in the above embodiment. Any method provided in the image classification method embodiment can be run on the image classification device. For details of the specific implementation process, please refer to the above embodiment of the image classification method, which will not be repeated here.

The present application also provides a computer-readable storage medium on which a computer program is stored. When the stored computer program is executed on a computer, the computer is caused to execute the image classification method provided in the embodiment of the present application. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), etc.

The present application also provides an electronic device including a memory and a processor, and a computer program is stored in the memory. The processor is used to execute the image classification method as provided in the present application by calling the computer program stored in the memory.

For example, the above-mentioned electronic device may be a mobile terminal such as a tablet computer or a smart phone. Please refer to FIG. 11, which is a schematic structural diagram of an electronic device provided by an embodiment of the application.

The processor 401 is electrically connected to the memory 402.

The processor 401 is the control center of the electronic device. It uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or loading the computer program stored in the memory 402 and calling the data stored in the memory 402.

The memory 402 may be used to store software programs and modules. The processor 401 executes various functional applications and data processing by running the computer programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, a computer program required by at least one function (such as a sound playback function, an image playback function, etc.), and so on; the data storage area may store data created through use of the electronic device, and so on. In addition, the memory 402 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Correspondingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.

In the embodiment of the present application, the processor 401 in the electronic device loads the instructions corresponding to the processes of one or more computer programs into the memory 402 according to the following steps, and the processor 401 runs the computer programs stored in the memory 402 to implement various functions, such as:

Determine the target image that needs to be classified;

Call the pre-trained fine-grained classification model to perform fine-grained classification of the target image. The fine-grained classification model includes a feature extraction module, a feature optimization module, and a fine-grained classification module. The feature extraction module is used to extract the image features of the target image, the feature optimization module is used to optimize the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification of the optimized image features to obtain the fine-grained category of the target image;

Obtain the fine-grained category of the target image output by the fine-grained classification model.

It should be noted that the electronic device provided in the embodiment of this application belongs to the same concept as the image classification method in the above embodiment. Any method provided in the image classification method embodiment can be run on the electronic device. For details of the specific implementation process, please refer to the embodiment of the image classification method, which will not be repeated here.

It should be noted that, for the image classification method of the embodiment of the application, those of ordinary skill in the art can understand that all or part of the process of implementing the image classification method can be completed by controlling the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, such as the memory of an electronic device, and executed by at least one processor in the electronic device. The execution process may include the flow of the embodiments of the image classification method. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, etc.

For the image classification device of the embodiment of the present application, each functional module may be integrated in a processing chip, or each module may exist alone physically, or two or more modules may be integrated in one module. The above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.

Claims

An image classification method, wherein the image classification method includes:

Determine the target image that needs to be classified;

A pre-trained fine-grained classification model is called to perform fine-grained classification of the target image, wherein the fine-grained classification model includes a feature extraction module, a feature optimization module, and a fine-grained classification module, the feature extraction module is used to extract the image features of the target image, the feature optimization module is used to optimize the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification of the optimized image features to obtain the fine-grained category of the target image;

Acquire the fine-grained category of the target image output by the fine-grained classification model.
The image classification method according to claim 1, wherein the determining the target image that needs to be classified includes:

When the image classification period is reached, an image newly added in the image classification period is determined as the target image.
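
By way of non-limiting illustration, selecting the images newly added in a classification period can be sketched as follows. The `StoredImage` record, its `added_at` timestamp, and the half-open interval convention are assumptions made for this sketch and are not taken from the claim.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredImage:
    path: str
    added_at: float  # hypothetical timestamp, seconds since the epoch


def select_target_images(
    images: List[StoredImage], period_start: float, period_end: float
) -> List[StoredImage]:
    """Return images added in the half-open period [period_start, period_end)."""
    return [img for img in images if period_start <= img.added_at < period_end]
```

The half-open interval is chosen so that an image added exactly on a period boundary is picked up by exactly one period.
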
The image classification method according to claim 1, wherein the fine-grained classification model further comprises a dimensionality reduction module, and the dimensionality reduction module is used to perform feature dimensionality reduction on the optimized image features to obtain dimensionality-reduced optimized image features;

The fine-grained classification module is also used to perform fine-grained classification on the dimensionality-reduced optimized image features to obtain the fine-grained category of the target image.
The image classification method according to claim 1, wherein after said obtaining the fine-grained category of the target image output by the fine-grained classification model, the method further comprises:

A storage path is allocated to the target image according to the fine-grained category, and the target image is stored in the storage path.
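
By way of non-limiting illustration, allocating a storage path according to the fine-grained category can be sketched as follows. The one-directory-per-category layout, the function name `store_by_category`, and the use of a file move are assumptions made for this sketch; the claim does not fix how the path is derived.

```python
import shutil
from pathlib import Path


def store_by_category(image_path, category: str, root) -> Path:
    """Move the image into a per-category directory under root and return its new path."""
    # Reject category names that would escape the root directory.
    if not category or Path(category).name != category or category in {".", ".."}:
        raise ValueError(f"unsuitable category name: {category!r}")
    target_dir = Path(root) / category
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(image_path).name
    shutil.move(str(image_path), str(destination))
    return destination
```

For example, an image classified as `husky` would end up under `<root>/husky/` with its original file name.
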
The image classification method according to claim 1, wherein the image features include a feature map, and the feature optimization module is used to transpose the feature map to obtain a transposed feature map, and to perform matrix multiplication on the feature map and the transposed feature map, the result of the matrix multiplication being used as the optimized image feature.
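
By way of non-limiting illustration, the transpose-and-multiply operation can be sketched as follows. The sketch assumes the feature map has been flattened to a two-dimensional matrix with one row per channel and one column per spatial position, and that the product is taken as feature map times transposed feature map; neither the layout nor the multiplication order is specified by the claim.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def optimize_feature_map(feature_map: Matrix) -> Matrix:
    """Multiply the feature map by its own transpose (assumed order: X times X-transposed)."""
    return matmul(feature_map, transpose(feature_map))


# Two channels, three spatial positions -> a 2 x 2 result.
print(optimize_feature_map([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
# [[14.0, 32.0], [32.0, 77.0]]
```

Under these assumptions the result is a symmetric channel-by-channel matrix whose entries are the inner products between pairs of channels.
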
The method for image classification according to claim 1, wherein before said determining the target image that needs to be classified, the method further comprises:

Acquiring a plurality of sample images, and acquiring fine-grained category labels and coarse-grained category labels of the sample images;

Construct a machine learning network, wherein the machine learning network includes a first branch network and a second branch network with the same structure, and a binary classification module; the first branch network includes a feature extraction module, a feature optimization module, and a fine-grained classification module, and the binary classification module is connected to the feature extraction module of the first branch network and the feature extraction module of the second branch network;

Select a first sample image from the plurality of sample images, extract the image features of the first sample image based on the feature extraction module of the first branch network, and, after optimization by the feature optimization module of the first branch network, input them into the fine-grained classification module of the first branch network to perform fine-grained classification to obtain the first predicted fine-grained category;

Select a second sample image from the plurality of sample images, extract the image features of the second sample image based on the feature extraction module of the second branch network, and, after optimization by the feature optimization module of the second branch network, input them into the fine-grained classification module of the second branch network to perform fine-grained classification to obtain the second predicted fine-grained category;

Fuse the image features of the first sample image and the image features of the second sample image to obtain a fused image feature, and predict, based on the binary classification module and the fused image feature, whether the coarse-grained categories of the first sample image and the second sample image are the same, to obtain a prediction result;

Obtain the first classification loss of the first branch network according to the first predicted fine-grained category and the fine-grained category label of the first sample image, obtain the second classification loss of the second branch network according to the second predicted fine-grained category and the fine-grained category label of the second sample image, and obtain the third classification loss according to the prediction result and the coarse-grained category labels of the first sample image and the second sample image;

Obtain the corresponding total loss according to the first classification loss, the second classification loss, and the third classification loss, and adjust the parameters of the first branch network and the second branch network according to the total loss, ending the training when a preset training stop condition is met;

A branch network is selected from the first branch network and the second branch network as the fine-grained classification model.
The image classification method according to claim 6, wherein the third classification loss is obtained according to the following formula:

Loss3=-[η*y*log(p)+(1-y)*log(1-p)];

Where Loss3 represents the third classification loss, η represents the correction coefficient, y characterizes whether the coarse-grained category label of the first sample image and the coarse-grained category label of the second sample image are the same, and p represents the prediction result.
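
By way of non-limiting illustration, the formula for Loss3 can be evaluated as follows. The sketch assumes that y is 1 when the two coarse-grained category labels are the same and 0 otherwise, that p is the predicted probability that they are the same, and that log is the natural logarithm; the default value of `eta` and the clamping constant `eps` are additions made only to keep the sketch numerically safe.

```python
import math


def third_classification_loss(y: int, p: float, eta: float = 1.0, eps: float = 1e-12) -> float:
    """Loss3 = -[eta * y * log(p) + (1 - y) * log(1 - p)]."""
    # Clamp p away from 0 and 1 so that log() stays finite (an assumption of this sketch).
    p = min(max(p, eps), 1.0 - eps)
    return -(eta * y * math.log(p) + (1 - y) * math.log(1.0 - p))
```

With η equal to 1 the expression reduces to the ordinary binary cross-entropy; a larger η weights pairs with the same coarse-grained category more heavily.
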
The image classification method according to claim 6, wherein the selecting a branch network from the first branch network and the second branch network as the fine-grained classification model comprises:

Obtain the classification accuracy rate of the first branch network, and obtain the classification accuracy rate of the second branch network;

A branch network with a higher classification accuracy rate is selected from the first branch network and the second branch network as the fine-grained classification model.
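
By way of non-limiting illustration, choosing the branch with the higher classification accuracy can be sketched as follows. The use of a labeled evaluation set, the function names, and the rule that a tie favors the first branch are assumptions made for this sketch.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the fine-grained labels."""
    if not labels or len(predicted) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(p == t for p, t in zip(predicted, labels)) / len(labels)


def select_branch(
    first_predictions: Sequence[str],
    second_predictions: Sequence[str],
    labels: Sequence[str],
) -> str:
    """Return "first" or "second", whichever branch is more accurate (ties favor "first")."""
    first_acc = accuracy(first_predictions, labels)
    second_acc = accuracy(second_predictions, labels)
    return "first" if first_acc >= second_acc else "second"
```
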
The image classification method according to claim 6, wherein said fusing the image features of the first sample image and the image features of the second sample image to obtain the fused image features comprises:

Channel merging is performed on the image feature of the first sample image and the image feature of the second sample image, and the result of the channel merging is used as the fused image feature.
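
By way of non-limiting illustration, channel merging can be sketched as concatenation along the channel dimension. The sketch assumes each image feature is stored as a list of channels, each channel being a height-by-width grid, and that both features share the same spatial size; this layout is an assumption and is not stated in the claim.

```python
from typing import List

Channel = List[List[float]]


def merge_channels(first: List[Channel], second: List[Channel]) -> List[Channel]:
    """Concatenate two features along the channel axis."""
    def spatial_shape(feature: List[Channel]):
        return (len(feature[0]), len(feature[0][0]))

    if spatial_shape(first) != spatial_shape(second):
        raise ValueError("both features must have the same height and width")
    # The fused feature has len(first) + len(second) channels.
    return [*first, *second]
```

Under this layout, fusing a feature with C1 channels and a feature with C2 channels gives a fused feature with C1 + C2 channels and unchanged spatial size.
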
An image classification device, which includes:

The image determination component is used to determine the target image that needs to be classified;

The model calling component is used to call a pre-trained fine-grained classification model to perform fine-grained classification of the target image, wherein the fine-grained classification model includes a feature extraction module, a feature optimization module, and a fine-grained classification module, the feature extraction module is used to extract image features of the target image, the feature optimization module is used to optimize the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification on the optimized image features to obtain the fine-grained category of the target image;

The category acquisition component is used to acquire the fine-grained category of the target image output by the fine-grained classification model.
A storage medium, wherein a computer program is stored in the storage medium, and when the computer program runs on a computer, the computer is caused to execute:

Determine the target image that needs to be classified;

A pre-trained fine-grained classification model is called to perform fine-grained classification of the target image, wherein the fine-grained classification model includes a feature extraction module, a feature optimization module, and a fine-grained classification module, the feature extraction module is used to extract the image features of the target image, the feature optimization module is used to optimize the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification of the optimized image features to obtain the fine-grained category of the target image;

Acquire the fine-grained category of the target image output by the fine-grained classification model.
An electronic device, wherein the electronic device includes a processor and a memory, and a computer program is stored in the memory, and the processor is configured to execute:

Determine the target image that needs to be classified;

A pre-trained fine-grained classification model is called to perform fine-grained classification of the target image, wherein the fine-grained classification model includes a feature extraction module, a feature optimization module, and a fine-grained classification module, the feature extraction module is used to extract the image features of the target image, the feature optimization module is used to optimize the image features to obtain optimized image features, and the fine-grained classification module performs fine-grained classification of the optimized image features to obtain the fine-grained category of the target image;

Acquire the fine-grained category of the target image output by the fine-grained classification model.
The electronic device according to claim 12, wherein, when determining a target image that needs to be image classified, the processor is configured to execute:

When the image classification period is reached, an image newly added in the image classification period is determined as the target image.
The electronic device according to claim 12, wherein the fine-grained classification model further comprises a dimensionality reduction module, and the dimensionality reduction module performs feature reduction on the optimized image features to obtain optimized image features after dimensionality reduction;

The fine-grained classification module is also used to perform fine-grained classification on the optimized image features after dimensionality reduction to obtain the fine-grained category of the target image.
The electronic device according to claim 12, wherein, after acquiring the fine-grained category of the target image output by the fine-grained classification model, the processor is further configured to execute:

A storage path is allocated to the target image according to the fine-grained category, and the target image is stored in the storage path.
The electronic device according to claim 12, wherein the image feature comprises a feature map, and the feature optimization module is used to transpose the feature map to obtain a transposed feature map, and to perform matrix multiplication on the feature map and the transposed feature map, the result of the matrix multiplication being used as the optimized image feature.
The electronic device according to claim 12, wherein, before determining the target image that needs to be image classified, the processor is further configured to execute:

Acquiring a plurality of sample images, and acquiring fine-grained category labels and coarse-grained category labels of the sample images;

Construct a machine learning network, wherein the machine learning network includes a first branch network and a second branch network with the same structure, and a binary classification module; the first branch network includes a feature extraction module, a feature optimization module, and a fine-grained classification module, and the binary classification module is connected to the feature extraction module of the first branch network and the feature extraction module of the second branch network;

Select a first sample image from the plurality of sample images, extract the image features of the first sample image based on the feature extraction module of the first branch network, and, after optimization by the feature optimization module of the first branch network, input them into the fine-grained classification module of the first branch network to perform fine-grained classification to obtain the first predicted fine-grained category;

Select a second sample image from the plurality of sample images, extract the image features of the second sample image based on the feature extraction module of the second branch network, and, after optimization by the feature optimization module of the second branch network, input them into the fine-grained classification module of the second branch network to perform fine-grained classification to obtain the second predicted fine-grained category;

Fuse the image features of the first sample image and the image features of the second sample image to obtain a fused image feature, and predict, based on the binary classification module and the fused image feature, whether the coarse-grained categories of the first sample image and the second sample image are the same, to obtain a prediction result;

Obtain the first classification loss of the first branch network according to the first predicted fine-grained category and the fine-grained category label of the first sample image, obtain the second classification loss of the second branch network according to the second predicted fine-grained category and the fine-grained category label of the second sample image, and obtain the third classification loss according to the prediction result and the coarse-grained category labels of the first sample image and the second sample image;

Obtain the corresponding total loss according to the first classification loss, the second classification loss, and the third classification loss, and adjust the parameters of the first branch network and the second branch network according to the total loss, ending the training when a preset training stop condition is met;

A branch network is selected from the first branch network and the second branch network as the fine-grained classification model.
The electronic device according to claim 17, wherein the third classification loss is obtained according to the following formula:

Loss3=-[η*y*log(p)+(1-y)*log(1-p)];

Where Loss3 represents the third classification loss, η represents the correction coefficient, y characterizes whether the coarse-grained category label of the first sample image and the coarse-grained category label of the second sample image are the same, and p represents the prediction result.
The electronic device according to claim 17, wherein, when a branch network is selected from the first branch network and the second branch network as the fine-grained classification model, the processor is configured to execute:

Obtain the classification accuracy rate of the first branch network, and obtain the classification accuracy rate of the second branch network;

A branch network with a higher classification accuracy rate is selected from the first branch network and the second branch network as the fine-grained classification model.
The electronic device according to claim 17, wherein, when the image features of the first sample image and the image features of the second sample image are fused to obtain the fused image feature, the processor is configured to execute:

Channel merging is performed on the image feature of the first sample image and the image feature of the second sample image, and the result of the channel merging is used as the fused image feature.