CN110458233B - Mixed granularity object recognition model training and recognition method, device and storage medium


Info

Publication number: CN110458233B (application number CN201910743898.4A)
Authority: CN (China)
Prior art keywords: granularity, fine, category, image, coarse
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN110458233A
Inventors: 郭卉, 袁豪磊, 黄飞跃
Current Assignee: Tencent Cloud Computing Beijing Co Ltd
Original Assignee: Tencent Cloud Computing Beijing Co Ltd
Events: application filed by Tencent Cloud Computing Beijing Co Ltd; priority to CN201910743898.4A; publication of CN110458233A; application granted; publication of CN110458233B; anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application relates to the technical field of the Internet and discloses a mixed-granularity object recognition model training and recognition method, device and storage medium. The training method of the mixed-granularity object recognition model comprises the following steps: acquiring sample images and determining category labels of the sample images, wherein the category labels comprise fine-granularity categories and coarse-granularity categories; performing image category recognition training on an initial deep learning model based on the sample images and their category labels to obtain a pre-training model; and adjusting the fine-granularity branch classification module of the pre-training model with the goal of enlarging the feature differences among fine-granularity categories, to obtain the mixed-granularity object recognition model. With the method and device, coarse-granularity category recognition and fine-granularity category recognition can be carried out in the same network structure, and the accuracy of fine-granularity category recognition is improved.

Description

Mixed granularity object recognition model training and recognition method, device and storage medium
Technical Field
The application relates to the technical field of the Internet, and in particular to a mixed-granularity object recognition model training and recognition method, device and storage medium.
Background
In products that implement object recognition, a frequently encountered task is recognition at both coarse and fine granularity. For example, pets such as cats, dogs and birds are popular nowadays, and people care about the fine classification of a specific animal, because animals of different fine classifications differ greatly in habits, preferences and intelligence, such as Border Collies, Poodles and Huskies among dogs. This requires the user to first know which fine classification the animal belongs to; however, users are often unfamiliar with pet fine classifications, so such pet recognition must identify not only the coarse-granularity categories (cat, dog, bird, etc.) but also the subdivided categories under each coarse granularity. Currently, the mixed fine-grained recognition task is generally handled with a single fine-grained classification model. Such a brute-force recognition approach that does not distinguish coarse granularity from fine granularity easily misidentifies the coarse-granularity category of a fine-grained object (for example, recognizing a cat as some fine-grained category of dog), and applying a fine-grained feature computation method to the target easily yields poor fine-grained recognition results for certain categories.
Disclosure of Invention
The embodiments of the application provide a mixed-granularity object recognition model training and recognition method, device and storage medium, which can improve the distinguishability among coarse-granularity categories and achieve a better fine-granularity recognition effect.
In one aspect, an embodiment of the present application provides a method for training a hybrid granularity object recognition model, where the method includes:
acquiring sample images, and determining class labels of the sample images, wherein the class labels comprise fine granularity classes and coarse granularity classes;
performing image category identification training on the initial deep learning model based on the sample image and the category label of the sample image to obtain a pre-training model;
and adjusting the fine-granularity branch classification module of the pre-training model with the goal of enlarging the feature differences among fine-granularity categories, to obtain the mixed-granularity object recognition model.
Another aspect provides a method of mixed granularity object identification, the method comprising:
acquiring an image to be identified;
inputting the image to be identified into a mixed granularity object identification model to carry out category identification processing, so as to obtain the probability that the image to be identified belongs to each coarse granularity category and the probability that the image to be identified belongs to each fine granularity category under the coarse granularity category;
determining a category recognition result of the image to be recognized based on the probability of the coarse-granularity category and the probability of the fine-granularity category;
the mixed-granularity object recognition model is obtained by performing machine learning training on the basis of sample images and corresponding category labels to obtain a pre-training model, and adjusting the fine-granularity branch classification module of the pre-training model with the goal of enlarging the feature differences among fine-granularity categories.
In another aspect, a training apparatus for a hybrid granularity object recognition model is provided, the apparatus comprising:
the sample image acquisition module is used for acquiring sample images and determining class labels of the sample images, wherein the class labels comprise fine granularity classes and coarse granularity classes;
the model training module is used for carrying out image category identification training on the initial deep learning model based on the sample image and the category label of the sample image to obtain a pre-training model;
and the model adjustment module is used for adjusting the fine-granularity branch classification module of the pre-training model with the goal of enlarging the feature differences among fine-granularity categories, to obtain the mixed-granularity object recognition model.
Wherein the sample image acquisition module comprises:
the image acquisition unit is used for acquiring images;
the fine granularity category labeling unit is used for labeling the fine granularity category to which the image belongs;
the clustering processing unit is used for carrying out clustering processing on the images according to the fine granularity categories of the images and the characteristics of the fine granularity categories to obtain a plurality of image sets, wherein the coarse granularity categories of the images in each image set are the same;
the target coarse granularity category determining unit is used for determining the target coarse granularity category learned by the mixed granularity object identification model according to the distribution of the fine granularity categories to which each image belongs in the image set;
And the sample image determining unit is used for taking all images in the image set corresponding to the target coarse-granularity category as sample images and adding category labels for each sample image, wherein the category labels comprise a fine-granularity category and a coarse-granularity category.
The model training module may be configured to: input the sample images and their category labels into a convolutional neural network model; perform forward calculation on a sample image to obtain the predicted probability that the sample image belongs to a coarse-granularity category and the predicted probability that it belongs to a fine-granularity category under that coarse-granularity category; determine a category prediction result of the sample image based on the predicted probability of the coarse-granularity category and the predicted probability of the fine-granularity category; compare the category prediction result with the category label, and calculate a coarse-granularity loss value and a fine-granularity loss value; calculate a weighted sum of the coarse-granularity loss value and the fine-granularity loss value as an overall loss value; propagate the overall loss value back to the convolutional neural network model, and adjust the weight parameters of the convolutional neural network model by stochastic gradient descent; input the sample images and their category labels into the convolutional neural network model with updated weight parameters, and repeat the weight-parameter adjustment step until the number of executions of the weight-parameter adjustment step reaches a preset number; and take the convolutional neural network model with the currently adjusted weight parameters as the pre-training model.
The model adjustment module may be configured to: perform forward calculation with the pre-training model to obtain the fine-granularity category features of each sample image under the same coarse-granularity category; calculate a fine-granularity branch classification loss value according to the fine-granularity category features of each sample image and the fine-granularity category features of the other sample images under the same coarse-granularity category; determine a positive sample image and a negative sample image corresponding to each sample image, and calculate a triplet loss metric according to the sample image, the positive sample image and the negative sample image, where the positive sample image is a sample image belonging to the same fine-granularity category as the sample image, and the negative sample image is a sample image belonging to the same coarse-granularity category as the sample image but a different fine-granularity category; calculate a total loss value according to the fine-granularity branch classification loss value and the triplet loss metric; and adjust the parameters of the fine-granularity branch classification module according to the total loss value to obtain the mixed-granularity object recognition model.
Another aspect provides a mixed-granularity object recognition apparatus, the apparatus comprising:
The image acquisition module to be identified is used for acquiring the image to be identified;
the class identification processing module is used for inputting the image to be identified into a mixed granularity object identification model to carry out class identification processing, so as to obtain the probability that the image to be identified belongs to each coarse granularity class and the probability that the image to be identified belongs to each fine granularity class under the coarse granularity class;
the category identification result determining module is used for determining a category identification result of the image to be identified based on the probability of the coarse-granularity category and the probability of the fine-granularity category;
the mixed-granularity object recognition model is obtained by performing machine learning training based on sample images and corresponding category labels to obtain a pre-training model, and adjusting the fine-granularity branch classification module of the pre-training model with the goal of enlarging the feature differences among fine-granularity categories.
The category recognition result determination module may be configured to: determine the coarse-granularity category with the highest probability among the coarse-granularity categories to which the image to be recognized may belong as the target coarse-granularity category; and sort all the fine-granularity categories under the target coarse-granularity category by probability, and select the top preset number of fine-granularity categories as the category recognition result of the image to be recognized.
In another aspect, an electronic device is provided that includes a processor and a memory having stored therein at least one instruction, at least one program, code set, or instruction set that is loaded and executed by the processor to implement a training method or a mixed-granularity object recognition method of a mixed-granularity object recognition model as described above.
In another aspect, a computer readable storage medium is provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by a processor to implement a method of training a mixed-granularity object recognition model or a method of mixed-granularity object recognition as described above.
The method, the device and the storage medium for training and identifying the hybrid granularity object identification model have the following technical effects:
in the training stage of the mixed-granularity object recognition model, both coarse-granularity category learning and fine-granularity category learning are performed on the sample images, so that coarse-granularity category recognition and fine-granularity category recognition can be carried out in the same network structure; in addition, the fine-granularity branch classification module of the pre-training model is adjusted so that the gaps between fine-granularity categories are enlarged and the fine-granularity category features become discriminative for the fine-granularity categories they belong to, which improves the accuracy of fine-granularity category recognition in mixed-granularity recognition.
Drawings
In order to more clearly illustrate the technical solutions and advantages of embodiments of the present application or of the prior art, the following description will briefly introduce the drawings that are required to be used in the embodiments or the prior art descriptions, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a prior art HyperFace framework;
fig. 2 is a schematic diagram of a prior art CTF framework;
FIG. 3 is a schematic diagram of a prior art LSFG framework;
FIG. 4 is a schematic diagram of an application environment provided by an embodiment of the present application;
FIG. 5 is a framework diagram of a hybrid granularity object recognition model provided by an embodiment of the present application;
fig. 6 is a schematic diagram of an application scenario of object recognition according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of a training method for a hybrid granularity object recognition model according to an embodiment of the present application;
FIG. 8 is a flowchart of a training method for a pre-training model according to an embodiment of the present application;
FIG. 9 is a flow chart of a method for tuning a fine-grained branch classification module of a pre-trained model according to an embodiment of the application;
FIG. 10 is a schematic diagram of a cross-layer connection layer of a hybrid granularity object identification model provided by an embodiment of the present application;
fig. 11 is a schematic view of an application scenario of a hybrid granularity object recognition model provided in an embodiment of the present application;
FIG. 12 is a flow chart of a method for identifying mixed granularity objects provided in an embodiment of the present application;
FIG. 13 is a process flow diagram of a method for mixed granularity object identification provided by an embodiment of the present application;
fig. 14 is an application scenario of the hybrid granularity object recognition method provided in the embodiment of the present application;
FIG. 15 is a schematic structural diagram of a training device for a hybrid granularity object recognition model according to an embodiment of the present application;
FIG. 16 is a schematic structural view of a mixed-granularity object recognition device according to an embodiment of the present application;
fig. 17 is a hardware block diagram of a server of a hybrid granularity object identification method according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present application based on the embodiments herein.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating the embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. First, description will be made of the prior art and related concepts related to the embodiments of the present invention:
Coarse-grained object recognition: category-level recognition, in which the specific instance of the object is not considered, and recognition is performed only on the category of the object (such as person, dog, cat, bird), giving the category to which the object belongs. A typical example is the recognition task on the large generic object recognition open-source dataset ImageNet: identifying which of 1000 categories an object belongs to.
Fine-grained object recognition: instance-level recognition, i.e., identifying which subdivided category the target belongs to, e.g., whether the current object is a Teddy (toy Poodle), Samoyed, Golden Retriever or Pomeranian (breeds of dog).
Mixed fine-grained recognition: among the objects to be recognized, there are instance-level category recognitions spanning two or more category levels. For example, recognition over the Tibetan Mastiff, German Shepherd, Border Collie, Akita, Shiba Inu, Persian cat, Ragdoll cat, British Shorthair cat, Scottish Fold cat, magpie, myna, sparrow, great peacock, bulbul, yellow peacock and hwamei is a mixed fine-grained recognition spanning three large classes: dog, cat and bird.
ImageNet: a large open-source dataset for generic object recognition.
ImageNet pre-training model: a deep learning network model is trained on ImageNet, and the resulting parameter weights of the model constitute the ImageNet pre-training model.
RNN: recurrent neural network, a class of network models built in a recurrent manner.
Feature map: the feature map is obtained by convolving the sample image with a filter. A feature map can itself be convolved with a filter to generate a new feature map.
Triplet-loss: a metric learning method for recognition models in machine learning. For a given input sample, the feature distance to a sample of the same category (the positive-sample distance) and the feature distance to a sample of a different category (the negative-sample distance) are first obtained, and the difference between the two distances is used as the error value for that input sample.
To facilitate explanation of the advantages of the method in the embodiments of the present invention, the prior art related to the technical solution of the embodiments is first described in detail:
at present, three technologies are relatively close to the present scheme: HyperFace (A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition), CTF (Coarse-to-fine: A RNN-based hierarchical attention model for vehicle re-identification) and LSFG (Embedding Label Structures for Fine-Grained Feature Representation). All three are deep learning based techniques.
As shown in fig. 1, HyperFace uses a multi-task framework: for input sample images it learns recognition-related features in a main network (conv1 to fc6), and the output features of the main network implement face detection, key-point localization, key-point visibility, pose estimation, gender recognition and other functions through a multi-branch structure (two fully connected layers per branch).
As shown in fig. 2, the CTF main network generates a main feature map, and this feature map is processed by two branches to recognize coarse-grained and fine-grained categories respectively. The coarse-grained recognition branch mainly extracts features with an RNN structure and finally classifies through fully connected layers; the fine-grained branch mainly learns the importance of different channel features in the main feature map by jointly activating the main feature map with the coarse-grained RNN features, then performs a second round of RNN learning of fine-grained features according to the importance differences and the coarse-grained RNN features, so that fine-grained classification can be performed through the fully connected layer.
As shown in fig. 3, the LSFG method adds a triplet-loss to the classification loss function of a conventional deep classification network so that features of the same category become similar and the feature differences between different categories become larger. Positive and negative samples are obtained within coarse-grained categories; for vehicle recognition, for example, positive samples are samples of the same target category, and negative samples are samples of different target categories under the same vehicle model.
However, the HyperFace method cannot accurately distinguish among multiple fine-grained categories within the same coarse granularity. The method directly applies end-to-end parallel learning of multi-task targets to mixed fine-grained recognition, and has the following drawbacks: the multi-task design, which takes faces as the learning material, relies on information such as target positions or features shared across the recognized tasks, whereas targets in mixed-granularity recognition may differ in form and key feature parts, so the recognition effect is poor; the coarse and fine granularities place different requirements on the features, and treating the two related tasks of coarse granularity and fine granularity equally easily makes it difficult to improve recognition capability; and the network architecture cannot use an existing ImageNet pre-training model, so a large amount of data is required to pre-train the model to obtain sufficient recognition capability.
The CTF method cannot accurately distinguish the differences among multiple fine-grained categories within the same coarse granularity. Specifically, the method separates coarse- and fine-grained recognition into branches and uses the coarse-grained features together with the main-body features for fine-grained recognition. In addition, it has two drawbacks: the coarse-grained RNN features are applied both to the input of the fine-grained attention module and to the output of that module, which easily leads to insufficient learning capacity of the attention module, so its capacity cannot be fully exploited; and the depth of the coarse-grained branch network is comparable to that of the fine-grained branch network, so feature learning at one of the granularities is easily insufficient and the recognition effect of that task is poor.
The LSFG method cannot effectively distinguish differences across coarse granularities, and if some fine-grained categories belong to different coarse granularities, the recognition effect is poor. Its drawback is that it cannot solve the problem of fine-grained categories being similar across coarse granularities: because the model only distinguishes fine-grained categories within the same coarse granularity and does not learn the discriminability between coarse granularities, the learned model suffers from inaccurate recognition between fine-grained categories that lie in different coarse-granularity categories.
In summary, in mixed-granularity object recognition the prior art suffers from insufficient coarse-grained category recognition of fine-grained objects and poor fine-grained recognition within a coarse granularity.
In view of the above, the present invention provides a mixed granularity object recognition scheme, which aims to improve the effect of object category recognition on the existing basic model structure by innovatively designing the learning flow of a recognition model, and achieve the following effects:
1) The coarse grain identification task and the fine grain identification task are performed in the same network structure. The model parameters are updated dynamically by coarse granularity and fine granularity, so that the two task identification effects reach better performance.
2) The learning of coarse granularity categories is realized to ensure the distinguishability among coarse granularity.
3) And distinguishing learning among the fine-grained categories is realized, so that accurate identification among similar fine-grained categories is ensured.
Referring to fig. 4, fig. 4 is a schematic diagram of an application environment provided in an embodiment of the present application, including a server 101 and a terminal device 102, where the server 101 may be deployed with a mixed-granularity object recognition model and provides an object category recognition service for the downstream terminal device 102.
Specifically, the server 101 may perform clustering processing on the images according to the fine granularity categories of the images to obtain an image set, and determine a sample image serving as a training sample of the initial deep learning model according to the distribution of the fine granularity categories to which each image in the image set belongs. The server 101 may train the initial deep learning model according to the sample image and the class label of the sample image, and adjust the fine granularity branch classification module of the pre-trained model obtained by training to obtain the hybrid granularity object recognition model. The server 101 receives an image to be identified sent by the terminal device 102, identifies the image to be identified by using a mixed granularity object identification model to obtain the probability that the image to be identified belongs to the existing coarse granularity category of the model and the probability that the image belongs to the existing fine granularity category of the model, and performs data processing based on the probability of the coarse granularity category and the probability of the fine granularity category to determine the category identification result of the image to be identified.
In this embodiment of the present application, the execution body may be a server shown in fig. 4, or may be a server platform, where a plurality of servers may be included in the server platform, for example, a first server may perform clustering processing on images, determine a sample image that is a training sample of an initial deep learning model according to a result of the clustering processing, and then send the sample image to a second server.
The second server can train the initial deep learning model according to the sample image and the class label of the sample image, and adjust the fine granularity branch classification module of the pre-trained model obtained through training to obtain the mixed granularity object recognition model.
The third server receives the image to be identified sent by the terminal device 102, and sends the image to be identified to the second server, the second server performs category identification on the image to be identified to obtain the probability that the image to be identified belongs to each coarse-granularity category and the probability that the image to be identified belongs to each fine-granularity category, and sends the probability of the coarse-granularity category and the probability of the fine-granularity category to the third server.
The third server may perform data processing according to the probability of the coarse-granularity category and the probability of the fine-granularity category, determine a category recognition result of the image to be recognized, and send the category recognition result to the terminal device 102.
In the embodiment of the present application, the server 101 and the terminal device 102 may be connected through a wireless link.
In this embodiment of the present application, the first server, the second server and the third server may be connected through a wireless link, or may be connected through a wired link. The choice of the type of communication link may depend on the actual application and the application environment. Alternatively, the first server, the second server, and the third server may be disposed in the same space.
In the embodiment of the present application, the terminal device 102 may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, a wearable device, and so on.
Fig. 5 is a frame diagram of a hybrid-granularity object identification model provided in an embodiment of the present application, where the model frame shown in fig. 5 is designed to implement identification of a hybrid-granularity object class. Referring to fig. 5, the hybrid granularity object recognition model framework includes a training image input module 510, a main feature module 520, a coarse granularity feature module 530, a coarse granularity recognition module 540, a fine granularity feature module 550, and a fine granularity recognition module 560, wherein the coarse granularity feature module 530 and the coarse granularity recognition module 540 form a coarse granularity branch classification module, and the fine granularity feature module 550 and the fine granularity recognition module 560 form a fine granularity branch classification module. The training image input module 510 receives an image to be identified; the main body feature module 520 identifies key object parts in the image to be identified, which is input into the identification model, so as to obtain image features, and the image features are respectively transmitted to the coarse-granularity branch classification module and the fine-granularity branch classification module; the coarse granularity recognition module of the coarse granularity branch classification module compares the image features with the features corresponding to the coarse granularity categories in the coarse granularity feature module, calculates the similarity of the image features and the features corresponding to the coarse granularity categories, and takes the similarity as the probability that the image to be recognized belongs to the coarse granularity categories; and the fine grain identification module of the fine grain branch classification module compares the image features with features corresponding to all fine grain categories in the fine grain feature module, calculates the similarity of the image features and the features corresponding to the fine grain categories, and takes the similarity as the probability that the image to be identified belongs to the fine grain categories. It should be noted that, different network structures may be used as the main feature module, the coarse-granularity feature module, the fine-granularity feature module, the coarse-granularity recognition module and the fine-granularity recognition module, and the coarse-granularity feature module and the fine-granularity feature module may be in a connectionless state, or may be connected with each other by adopting any deep learning connection technology.
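By way of illustration only, the following PyTorch-style sketch shows one possible way to wire a shared body feature module to a coarse-granularity branch and a fine-granularity branch as in fig. 5. The layer sizes, the module names (body, coarse_feat, coarse_head, fine_feat, fine_head) and the torchvision weights argument are assumptions of this sketch and are not taken from the patent:

```python
import torch.nn as nn
import torchvision


class MixedGranularityNet(nn.Module):
    """Two-branch structure: a shared body feature module feeding a
    coarse-granularity branch and a fine-granularity branch (cf. Fig. 5)."""

    def __init__(self, num_coarse, num_fine):
        super().__init__()
        backbone = torchvision.models.resnet101(weights="IMAGENET1K_V1")
        # body feature module: ResNet-101 without its pooling/classifier head
        self.body = nn.Sequential(*list(backbone.children())[:-2])
        # coarse-granularity feature module + recognition head
        self.coarse_feat = nn.Sequential(
            nn.Conv2d(2048, 512, kernel_size=1), nn.BatchNorm2d(512), nn.ReLU(inplace=True))
        self.coarse_head = nn.Linear(512, num_coarse)
        # fine-granularity feature module + recognition head
        self.fine_feat = nn.Sequential(
            nn.Conv2d(2048, 1024, kernel_size=1), nn.BatchNorm2d(1024), nn.ReLU(inplace=True))
        self.fine_head = nn.Linear(1024, num_fine)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        fmap = self.body(x)                                  # shared body feature map
        coarse_emb = self.pool(self.coarse_feat(fmap)).flatten(1)
        fine_emb = self.pool(self.fine_feat(fmap)).flatten(1)
        # returns coarse logits, fine logits and the fine-granularity embedding
        return self.coarse_head(coarse_emb), self.fine_head(fine_emb), fine_emb
```

Returning the fine-granularity embedding alongside the two sets of logits makes the later triplet-loss adjustment of the fine-granularity branch straightforward.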
The mixed-granularity object recognition model framework provided by the embodiment of the application comprises a coarse-granularity branch classification module and a fine-granularity branch classification module, and can recognize coarse-granularity categories and fine-granularity categories in the same network structure. The framework can be used for common recognition tasks, such as pet species recognition and scene recognition. Pet species recognition requires mixed fine-granularity recognition across several large species, taking the large classifications as the coarse-granularity categories, such as cat, dog, snake, tortoise, bird and fish, and the subdivided classifications under each large classification as the fine-granularity categories. In scene recognition, for a location scene recognition task there may be confusable scenes; for example, a lake may be confused with a tropical rain forest, a park, a natural river channel or a marsh, so the confusable scenes can be grouped into a coarse-granularity category and each specific scene used as a fine-granularity category, which improves the similarity of features within the same coarse granularity, enhances the differences between fine granularities, and improves the recognition capability for mixed fine-granularity objects. Fig. 6 is a schematic diagram of an application scenario of object recognition provided in the embodiment of the present application, in which a mixed-granularity object recognition model performs the pet species recognition task shown in fig. 6. After the model has learned the characteristics of the three genera of cat, dog and bird, and the characteristics of the Husky, Akita, Shiba Inu, Persian cat, Ragdoll cat, British Shorthair cat, hwamei and myna, when a Husky image is input to the mixed-granularity object recognition model, the coarse-granularity classification of the object in the image as dog and its fine-granularity classification as Husky can be accurately recognized.
According to the method, deep learning is used to improve the mixed fine-granularity recognition task: without increasing the annotation workload, a better-performing mixed fine-granularity object recognition model is obtained through a multi-task learning framework, coarse-granularity recognition is introduced into the multi-task learning framework, and by sharing part of the network parameters the model can handle the mixed fine-granularity recognition tasks simultaneously. Moreover, the recognition framework performs discriminative learning on the problem of confusion between fine granularities, so that fine-granularity categories can be effectively distinguished.
A specific embodiment of a training method for a mixed-granularity object recognition model according to the present application is described below. Fig. 7 is a schematic flow chart of a training method for a mixed-granularity object recognition model according to an embodiment of the present application. The present specification provides the method operation steps as in the embodiment or the flowchart, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many execution orders and does not represent the only execution order. When implemented in a real system or server product, the methods illustrated in the embodiments or figures may be executed sequentially or in parallel (e.g., in a parallel processor or multithreaded environment). As shown in fig. 7, the method may include:
S701: and acquiring sample images, and determining class labels of the sample images, wherein the class labels comprise fine granularity classes and coarse granularity classes.
In the embodiment of the application, the sample images are screened from a massive number of images so as to facilitate machine learning, and the sample images need to meet the following requirement: each coarse-granularity category must contain multiple sample images belonging to different fine-granularity categories, so that the model can fully contrast and learn the characteristics of different fine-granularity categories within the same coarse-granularity category; the model then has better ability to discriminate fine-granularity features, and the fine-granularity category recognition effect is improved.
In an alternative embodiment, the sample image may be obtained by:
s7011, an image is acquired.
Specifically, the images should belong to the categories to be recognized; for example, for the mixed fine-granularity recognition problem of dogs, cats and birds, images containing mynas, Akitas, Persian cats and the like are collected. The images can be obtained by manual shooting, network searching and other means.
S7013, labeling the fine granularity category to which the image belongs.
And labeling the acquired images with fine granularity categories. Since the subsequent coarse-grained categories can be directly obtained from the fine-grained category labels through the affiliation, the coarse-grained categories do not need to be marked, and marking work is not increased.
S7015, clustering the images according to the fine granularity category of the images and the characteristics of the fine granularity category to obtain a plurality of image sets, wherein the coarse granularity category of each image in each image set is the same.
In an alternative embodiment, according to the fine-granularity category of each image and the features of that fine-granularity category, a clustering rule is used to aggregate the fine-granularity categories of all images into a smaller number of coarse-granularity categories (i.e., large classification categories), obtaining a plurality of image sets, where each image set corresponds to one coarse-granularity category. The features of a fine-granularity category may be features inherent to the category, or abstract features obtained by machine learning or similar means. In this embodiment, any clustering rule and any category feature may be used to generate the coarse-granularity categories (see the sketch following step S7019 below).
S7017, determining a target coarse-granularity category for learning the mixed-granularity object recognition model according to the distribution of the fine-granularity categories to which each image belongs in the image set.
And determining the target coarse-granularity category to be learned by the recognition model according to the coarse-granularity category and the performance of the fine-granularity category under the coarse-granularity category. Because in coarse-grained categories, factors that are unfavorable for machine learning may occur, such as too few fine-grained categories, which may result in insufficient learning of the differentiation between fine-grained categories in the coarse-grained category, not all coarse-grained categories may be targeted coarse-grained categories, and the task at this stage is to design an efficient coarse-grained category. In an alternative embodiment, if the number of fine-grained categories under the coarse-grained category reaches the preset category number, the coarse-grained category may be regarded as the target coarse-grained category. In another alternative embodiment, if the number of fine-grained categories under the coarse-grained category reaches the preset number of categories, and the number of images corresponding to each fine-grained category under the coarse-grained category reaches the preset number of images, the coarse-grained category may be taken as the target coarse-grained category. The meaning of determining the target coarse-granularity category is that the fine-granularity category features are more concentrated in the coarse-granularity category space to which the fine-granularity category features belong, and the fine-granularity category features can be fully learned during subsequent model training, so that the model has better distinguishing capability on the fine-granularity category.
S7019, taking all images in the image set corresponding to the target coarse-granularity category as sample images, and adding a category label to each sample image, where the category label includes a fine-granularity category and a coarse-granularity category.
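The clustering in S7015 and the screening in S7017 could, for example, be realized as below; this is a minimal sketch assuming per-category feature vectors and k-means from scikit-learn, with illustrative threshold values (the patent allows any clustering rule, category feature and threshold):

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans


def cluster_fine_categories(category_features, n_coarse):
    """S7015 sketch: group fine-granularity categories into coarse-granularity
    clusters using per-category feature vectors (any clustering rule is allowed)."""
    names = list(category_features)                       # fine-category names
    feats = np.stack([category_features[n] for n in names])
    labels = KMeans(n_clusters=n_coarse, n_init=10, random_state=0).fit_predict(feats)
    return dict(zip(names, labels.tolist()))              # fine category -> coarse cluster id


def select_target_coarse_categories(image_records, min_fine=5, min_images=50):
    """S7017 sketch: keep a coarse category only if it contains enough fine
    categories and each of those fine categories has enough images."""
    per_fine = Counter(image_records)                     # (coarse, fine) -> image count
    fine_by_coarse = {}
    for (coarse, fine), count in per_fine.items():
        fine_by_coarse.setdefault(coarse, {})[fine] = count
    return [coarse for coarse, fines in fine_by_coarse.items()
            if len(fines) >= min_fine and min(fines.values()) >= min_images]
```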
S703: and carrying out image category identification training on the initial deep learning model based on the sample image and the category label of the sample image to obtain a pre-training model.
The present embodiment uses the model framework of fig. 5 to design a model that recognizes mixed-granularity object categories. For complex recognition tasks, a multi-layer network structure is required to learn target features more effectively; the feature modules in fig. 5 are all multi-layer deep learning neural network structures, mainly composed of stacked deep learning operations such as convolution, normalization, pooling and cross-layer connection, for example the cross-layer connection layer shown in fig. 10 and the ResNet-101 structure shown in Table 1. The recognition modules in fig. 5 are combinations of pooling and fully connected layers.
TABLE 1
Table 2 shows the layer structure of the coarse-granularity feature module of the mixed-granularity object recognition model provided in the embodiment of the present application, where M_cr is the number of layers of the residual structure; Table 3 shows the structure of the coarse-granularity recognition module, where the coarse-granularity feature module is assumed to output features of size Nfeat_cr1 x Nfeat_cr2 x Nchannel_cr, Nchannel_cr denotes the feature dimension, and N_cr is the number of coarse-granularity categories.
TABLE 2
TABLE 3
Table 4 shows the layer structure of the fine-granularity feature module of the mixed-granularity object recognition model provided in the embodiment of the application, where M_fg is the number of layers of the residual structure; Table 5 shows the structure of the fine-granularity recognition module, where the fine-granularity feature module is assumed to output features of size Nfeat_fg1 x Nfeat_fg2 x Nchannel_fg, Nchannel_fg denotes the feature dimension, and N_fg is the number of fine-granularity categories.
TABLE 4
TABLE 5
In an alternative embodiment, the pre-training model may be obtained by training the method steps shown in fig. 8, and referring to fig. 8, the training method of the pre-training model includes:
s801, inputting the sample image and the category label of the sample image into a convolutional neural network model.
In this embodiment, the initial deep learning model is preferably a convolutional neural network model, so as to effectively learn the target features through a multi-layer network structure.
S803, performing forward calculation on the sample image to obtain the prediction probability of the sample image belonging to the coarse-granularity category and the prediction probability of the sample image belonging to the fine-granularity category under the coarse-granularity category.
S805, determining a category prediction result of the sample image based on the prediction probability of the coarse-granularity category and the prediction probability of the fine-granularity category.
And S807, comparing the category prediction result with the category label, and calculating to obtain a coarse granularity loss value and a fine granularity loss value.
And S809, calculating a weighted sum of the coarse grain loss value and the fine grain loss value as a whole loss value.
And S811, reversely transmitting the integral loss value to a convolutional neural network model, and adjusting the weight parameter of the convolutional neural network model by a random gradient descent method.
S813, inputting the sample image and the class label of the sample image into a convolutional neural network model with updated weight parameters, and repeating the step S803-S811 to adjust the weight parameters until the execution times of the current weight parameter adjustment step reach preset times; and taking the convolutional neural network model after the weight parameters are currently adjusted as the pre-training model.
Illustratively, training the convolutional neural network model with the recognition model learning method to obtain the pre-training model includes:
(1) Initializing model parameters: Conv1-Conv5 use the parameters of ResNet-101 pre-trained on the ImageNet dataset as the initial parameters of the convolutional neural network model, and, as shown in Table 1, the newly added layers, such as Conv6_x, are initialized from a Gaussian distribution with variance 0.01 and mean 0 (see the sketch after step (2) below). In addition, the weights of other pre-trained category recognition models may also be used to initialize the convolutional neural network model.
(2) Model training: the convolution template parameters w and bias parameters b of the convolutional neural network model are solved by a gradient descent method based on SGD (Stochastic Gradient Descent); in each iteration, the prediction error is calculated and back-propagated to the convolutional neural network model, gradients are computed, and the parameters of the model are updated. The specific process is as follows: all parameters of the convolutional neural network model are set to the learning state. During training, the neural network performs forward calculation on an input image to obtain prediction probabilities, which include the probability that the image belongs to each coarse-granularity category and the probability that it belongs to each fine-granularity category. A predicted category result, comprising a coarse-granularity category and a fine-granularity category, is then derived from these prediction probabilities; the predicted category result is compared with the true category of the image to calculate the coarse-granularity loss value and the fine-granularity loss value of the model, and the weighted sum of the coarse-granularity loss value and the fine-granularity loss value is calculated as the overall loss value. The overall loss value is propagated back to the neural network, and the network weight parameters are updated by stochastic gradient descent, realizing one round of weight optimization; after many rounds of optimization, a well-performing pre-training model is finally obtained.
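A minimal training-loop sketch for steps (1) and (2) above (and for S801-S813) follows; it assumes the hypothetical MixedGranularityNet from the earlier sketch, a data loader yielding (image, coarse label, fine label) batches, and illustrative loss weights and learning rate:

```python
import torch
import torch.nn as nn


def init_new_layers(module):
    """Step (1): newly added layers (e.g. Conv6_x) are initialized from a zero-mean
    Gaussian with variance 0.01 (std = 0.1); pretrained layers keep their ImageNet weights."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=0.01 ** 0.5)
        if module.bias is not None:
            nn.init.zeros_(module.bias)


def pretrain(model, loader, num_steps, coarse_weight=1.0, fine_weight=1.0, lr=0.01):
    """Step (2), cf. S801-S813: both branches are learned jointly and the overall
    loss is a weighted sum of the coarse- and fine-granularity cross-entropy losses."""
    model.coarse_feat.apply(init_new_layers)   # hypothetical module names from the earlier sketch
    model.fine_feat.apply(init_new_layers)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    step = 0
    while step < num_steps:                    # repeat until the preset number of updates
        for images, coarse_labels, fine_labels in loader:
            coarse_logits, fine_logits, _ = model(images)                 # forward calculation
            loss = (coarse_weight * criterion(coarse_logits, coarse_labels)
                    + fine_weight * criterion(fine_logits, fine_labels))  # overall loss value
            optimizer.zero_grad()
            loss.backward()                    # back-propagate the overall loss
            optimizer.step()                   # stochastic gradient descent update
            step += 1
            if step >= num_steps:
                break
    return model
```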
S705: and adjusting the fine granularity branch classification module of the pre-training model by taking the characteristic difference among the enlarged fine granularity categories as a target to obtain the mixed granularity object recognition model.
The pre-training model obtained by the training in S703 already has the capability of recognizing coarse-granularity categories and fine-granularity categories; to further improve the accuracy of fine-granularity category recognition, the fine-granularity branch classification module of the pre-training model is further fine-tuned. Fig. 9 is a flowchart of a method for adjusting the fine-granularity branch classification module of the pre-training model according to an embodiment of the application; referring to fig. 9, the method includes:
s901: and performing forward calculation on the pre-training model to obtain fine granularity class characteristics of each sample image under the same coarse granularity class.
The forward calculation uses the forward propagation algorithm. Regardless of the dimensionality, the forward propagation process can be expressed by formula (1):

h_t = σ(z_t) = σ(U * x_t + W * h_(t-1) + b)   (1)

where the subscript t denotes the layer index, * denotes the convolution operation, b denotes the bias term, σ denotes the activation function, x denotes the sample input, h denotes the hidden state of the model, and U and W denote the weight parameters of the layer.
For example, suppose nodes i, j, k, ... of the previous layer are connected to node w of the present layer. A weighted sum is computed over the outputs of nodes i, j, k with their corresponding connection weights, a bias term is added to the result, and finally a nonlinear function (i.e., an activation function) such as ReLU or Sigmoid is applied to obtain the final result, namely the output of node w of the present layer. In this way, the output-layer result is obtained through layer-by-layer computation.
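A toy NumPy example of this weighted-sum-plus-activation step (illustrative only; the values and the ReLU choice are assumptions of this sketch):

```python
import numpy as np


def layer_forward(x, W, b):
    """One forward-propagation step: weighted sum of the previous layer's outputs
    plus a bias, followed by a nonlinear activation (ReLU here)."""
    z = W @ x + b            # weighted sum over nodes i, j, k, ... of the previous layer
    return np.maximum(z, 0)  # activation gives the output of this layer's nodes


# x = np.array([0.5, -1.2, 2.0]); W = np.ones((2, 3)); b = np.zeros(2)
# layer_forward(x, W, b) -> array([1.3, 1.3])
```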
S903: and calculating according to the fine granularity category characteristics of each sample image and the fine granularity category characteristics of other sample images under the same coarse granularity category to obtain a fine granularity branch classification loss value.
Specifically, the fine-granularity branch classification loss value can be calculated by the loss function of the fine-granularity branch shown in formula (2):

L_class = -[y·log(ŷ) + (1 - y)·log(1 - ŷ)]   (2)

where ŷ denotes the predicted output for the sample and y denotes the actual label of the sample.

When y = 1, L = -log(ŷ). The logarithmic function is monotonically increasing, so L is a monotonically decreasing function of the predicted output value. That is, the larger the predicted output value (approaching 1), the smaller the loss value L (approaching 0), and the smaller the predicted output value, the larger the loss value L, which matches the actual need.

When y = 0, L = -log(1 - ŷ), so L is a monotonically increasing function of the predicted output value. That is, the smaller the predicted output value (approaching 0), the smaller the loss value L, and the larger the predicted output value (approaching 1), the larger the loss value L, which also matches the actual need.

Whether the true sample label y is 0 or 1, L characterizes the difference between the predicted output and y. Due to the nature of the logarithmic function, the more the predicted output differs from y, the larger the value of L, i.e., the larger the "penalty" on the current model, and the increase is nonlinear (similar to exponential growth), so the model tends to make the predicted output closer to the true sample label y.
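The behaviour described above can be checked numerically with the following sketch of the per-sample log loss in the spirit of formula (2) (the clipping constant is an implementation detail added here for numerical stability):

```python
import numpy as np


def fine_branch_class_loss(y_pred, y_true, eps=1e-7):
    """Per-sample log loss: near zero when the prediction matches the label,
    growing rapidly (log scale) as the prediction moves away from it."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))


# fine_branch_class_loss(0.9, 1) ~= 0.105, fine_branch_class_loss(0.1, 1) ~= 2.303
```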
S905: determining positive sample images and negative sample images corresponding to the sample images, and calculating to obtain a triplet loss measurement according to the sample images, the positive sample images and the negative sample images; the positive sample image is a sample image belonging to the same fine granularity category as the sample image, and the negative sample image is a sample image belonging to the same coarse granularity category as the sample image and different fine granularity categories.
The specific method for acquiring the positive sample image and the negative sample image corresponding to a sample image is as follows: for each sample image (a), another sample image of the same fine-granularity category is selected as the positive sample image (p), and a sample image of a different fine-granularity category within the same coarse-granularity category is selected as the negative sample image (n), forming a triple of samples (a, p, n). The triplet loss metric of the three samples is then calculated, i.e. the objective function of formula (3) below is optimized, thereby enlarging the feature difference between fine-granularity categories. In formula (3), fa refers to the feature expression obtained for the sample image a in the forward calculation of the model, fp refers to the feature expression obtained for the positive sample image p, fn refers to the feature expression obtained for the negative sample image n, the dist function is the Euclidean distance function, and margin is the preset category spacing.
L_metric = max(dist(fa, fp) - dist(fa, fn) + margin, 0) (3)
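A minimal sketch of the triplet loss metric of formula (3) follows; the feature dimension and margin value are assumptions:

```python
import torch

def triplet_loss(fa: torch.Tensor, fp: torch.Tensor, fn: torch.Tensor,
                 margin: float = 0.2) -> torch.Tensor:
    """Formula (3): L_metric = max(dist(fa, fp) - dist(fa, fn) + margin, 0),
    with dist the Euclidean distance between feature expressions."""
    d_ap = torch.norm(fa - fp, p=2, dim=-1)  # distance to positive sample (same fine-granularity class)
    d_an = torch.norm(fa - fn, p=2, dim=-1)  # distance to negative sample (same coarse, different fine class)
    return torch.clamp(d_ap - d_an + margin, min=0.0)

# toy usage with 64-dimensional feature expressions (dimension is an assumption)
fa, fp, fn = torch.randn(64), torch.randn(64), torch.randn(64)
l_metric = triplet_loss(fa, fp, fn, margin=0.2)
```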
S907: and calculating to obtain a total loss value according to the fine-granularity branch classification loss value and the triplet loss measurement.
Specifically, the total loss value can be calculated by formula (4), wherein L_metric is the triplet loss metric, L_class is the fine-grained branch classification loss value, and a is used to adjust the weight between the two losses.

L = a·L_metric + L_class (4)
S909: and adjusting parameters of the fine granularity branch classification module according to the total loss value to obtain a mixed granularity object recognition model.
Specifically, S705 only adjusts the fine-granularity branch classification module (i.e., the fine-granularity feature module and the fine-granularity recognition module) of the pre-training model, and parameters of other modules (such as the main feature module, the coarse-granularity feature module and the coarse-granularity recognition module) are not updated in the learning process.
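By way of illustration, only the parameters of the fine-granularity branch are updated while the other modules stay fixed; the following sketch uses a toy network whose module names, layer sizes, optimizer and loss weight are assumptions, with the triplet loss metric left as a placeholder:

```python
import torch
import torch.nn as nn

class MixedGranularityNet(nn.Module):
    """Toy stand-in for the pre-training model; module names and sizes are assumptions."""
    def __init__(self, n_coarse=5, n_fine=20):
        super().__init__()
        self.main_feature_module = nn.Linear(256, 128)     # main body feature module
        self.coarse_recog_module = nn.Linear(128, n_coarse)
        self.fine_feature_module = nn.Linear(128, 64)      # fine-granularity feature module
        self.fine_recog_module = nn.Linear(64, n_fine)     # fine-granularity recognition module

    def forward(self, x):
        feat = torch.relu(self.main_feature_module(x))
        fine_feat = torch.relu(self.fine_feature_module(feat))
        return self.coarse_recog_module(feat), fine_feat, self.fine_recog_module(fine_feat)

model = MixedGranularityNet()

# freeze everything except the fine-granularity branch classification module
for name, p in model.named_parameters():
    p.requires_grad = name.startswith(("fine_feature_module", "fine_recog_module"))

optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                            lr=1e-3, momentum=0.9)

images = torch.randn(8, 256)                 # 8 sample inputs (pre-extracted, assumed)
fine_labels = torch.randint(0, 20, (8,))
_, fine_feat, fine_logits = model(images)
l_class = nn.functional.cross_entropy(fine_logits, fine_labels)  # fine-grained branch classification loss
l_metric = torch.tensor(0.0)                 # placeholder for the triplet loss metric of formula (3)
loss = 0.5 * l_metric + l_class              # total loss of formula (4), weight a = 0.5 assumed
optimizer.zero_grad()
loss.backward()
optimizer.step()                             # only the fine-granularity branch parameters are updated
```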
In the training stage of the mixed granularity object recognition model, this embodiment learns not only the coarse granularity category of the sample image but also the fine granularity category of the sample image, so that coarse granularity category recognition and fine granularity category recognition can be realized in the same network structure. In addition, the fine granularity branch classification module of the pre-training model is adjusted so that the gap between fine granularity categories is enlarged and the fine granularity category characteristics become discriminative among the fine granularity categories to which they belong, which improves the accuracy of fine granularity category identification in mixed granularity identification.
The mixed granularity object recognition model obtained by the above training can be used to perform category recognition on a newly input image. Fig. 11 is a schematic view of an application scenario of the mixed granularity object recognition model provided in an embodiment of the present application. Referring to fig. 11, training data is used to train an initial deep learning model to obtain the mixed granularity object recognition model. When a new image needs to be subjected to category recognition, the image may be input into the mixed granularity object recognition model, which performs recognition processing on the image and outputs the probabilities that the image belongs to each coarse granularity category and each fine granularity category. The output of the mixed granularity object recognition model may be further analyzed. One possible processing, when the first N fine granularity categories to which the image most likely belongs need to be output, is to select the category with the highest probability among the coarse granularity categories as the coarse granularity category to which the image belongs, and to select the N categories with the largest probabilities among all fine granularity categories under that coarse granularity category as the N fine granularity categories to which the image most likely belongs.
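One possible post-processing of this kind can be sketched as follows; the tensor layout (a coarse-probability vector, a fine-probability vector and a known fine-to-coarse mapping) and the example values are assumptions for illustration:

```python
import torch

def top_n_fine_classes(coarse_probs, fine_probs, fine_to_coarse, n=3):
    """Pick the most probable coarse-granularity category, then the N most
    probable fine-granularity categories that belong to that coarse category."""
    target_coarse = int(torch.argmax(coarse_probs))             # coarse category with highest probability
    mask = torch.tensor([c == target_coarse for c in fine_to_coarse])
    masked = torch.where(mask, fine_probs, torch.zeros_like(fine_probs))
    top_p, top_idx = torch.topk(masked, k=n)
    return target_coarse, list(zip(top_idx.tolist(), top_p.tolist()))

# toy example: 3 coarse categories, 6 fine categories (assumed mapping and values)
coarse_probs = torch.tensor([0.99, 0.40, 0.01])
fine_probs   = torch.tensor([0.10, 0.20, 0.99, 0.05, 0.30, 0.15])
fine_to_coarse = [0, 0, 0, 0, 1, 2]          # which coarse category each fine category belongs to
print(top_n_fine_classes(coarse_probs, fine_probs, fine_to_coarse, n=2))
```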
By selecting different training data, the mixed granularity object recognition model learns the category characteristics of various objects, so that it can be applied to any mixed granularity recognition task, any separate coarse granularity or fine granularity recognition task, and any recognition task in which a coarse granularity category can be separated or clustered from the target categories, such as pet mixed granularity recognition, scene mixed granularity recognition, pedestrian mixed granularity recognition, face mixed granularity recognition, coarse granularity recognition of natural animals, fine granularity recognition of commodities, fine granularity recognition of clothing, and the like.
The embodiment of the application also provides a method for identifying a mixed granularity object, fig. 12 is a schematic flow chart of the method for identifying a mixed granularity object provided in the embodiment of the application, please refer to fig. 12, and the method includes:
S1201: And acquiring an image to be identified.
S1203: inputting the image to be identified into a mixed granularity object identification model to carry out category identification processing, and obtaining the probability that the image to be identified belongs to each coarse granularity category and the probability that the image to be identified belongs to each fine granularity category under the coarse granularity category.
The mixed granularity object recognition model is obtained by performing machine learning training on the basis of sample images and corresponding class labels to obtain a pre-training model, and then adjusting the fine granularity branch classification module of the pre-training model with enlarging the characteristic difference among fine granularity categories as the target. For the training method of the mixed granularity object recognition model, reference is made to the above embodiment, and details are not repeated here.
In a possible embodiment, the mixed granularity object recognition model used for recognizing the image category should be a model that has learned in advance the relevant features of the category to which the image belongs. For example, if the image to be recognized is a Husky, images of various dogs are selected in advance to train the mixed granularity object recognition model so that the model learns the features of various dogs; when the Husky image is input into the model, the coarse granularity category and the fine granularity category of the image can then be recognized according to its features. If the image to be identified is a Manhattan ball toy, the mixed granularity object recognition model handling this recognition task should learn the features of various toys in advance. In practical application, the mixed granularity object recognition model may be trained with sample images of a single object type, or with sample images spanning multiple object types so that the range of objects the model can recognize is larger; for example, training the same mixed granularity object recognition model with images of toys, lakes, pets, daily necessities and the like gives the model the capability of recognizing object categories across object types at the same time and improves its capability of handling complex recognition tasks.
In this embodiment, performing the category recognition processing on the image to be recognized by using the mixed-granularity object recognition model includes:
(1) Extracting the main body characteristics of the image to be identified. The main body characteristics include features of key locations in the image to be identified; for dogs, for example, these include the hair, head, neck, limbs, body and tail, and the head features further include the head shape, ears, nose, eyes and mouth.
(2) Comparing the main body characteristics of the image to be identified with the characteristics of each coarse-granularity category stored in the mixed-granularity object identification model, calculating the matching degree of the main body characteristics of the image to be identified and the characteristics of each coarse-granularity category stored in the mixed-granularity object identification model, and taking the matching degree as the probability that the image to be identified belongs to the coarse-granularity category.
(3) Comparing the main body characteristics of the image to be identified with the characteristics of each fine granularity category stored in the mixed granularity object identification model, and determining the probability that the image to be identified belongs to the fine granularity category. The probability that the image to be identified belongs to the fine granularity category can be determined by any one of the following three methods (a sketch of the third method is given after the list).
The method comprises the following steps: comparing the main body characteristics of the image to be identified with the characteristics of each fine granularity category stored in the mixed granularity object identification model, calculating a first matching degree of the main body characteristics of the image to be identified and the characteristics of each fine granularity category stored in the mixed granularity object identification model, and taking the first matching degree as the probability that the image to be identified belongs to the fine granularity category;
The second method is as follows: sorting the probability that the image to be identified belongs to coarse-grained categories from large to small, selecting a preset number of coarse-grained categories with the front sorting as candidate coarse-grained categories, comparing the main body characteristics of the image to be identified with the characteristics of each fine-grained category under the candidate coarse-grained categories, calculating the second matching degree of the main body characteristics of the image to be identified and the characteristics of each fine-grained category under the candidate coarse-grained categories, and taking the second matching degree as the probability that the image to be identified belongs to the fine-grained categories;
and a third method: comparing the main body characteristics of the image to be identified with the characteristics of each fine granularity category stored in the mixed granularity object identification model, and calculating the first matching degree of the main body characteristics of the image to be identified and the characteristics of each fine granularity category stored in the mixed granularity object identification model; sorting the probability that the image to be identified belongs to coarse-grained categories from large to small, selecting a preset number of coarse-grained categories with the front sorting as candidate coarse-grained categories, comparing the main body characteristics of the image to be identified with the characteristics of each fine-grained category under the candidate coarse-grained categories, and calculating the second matching degree of the main body characteristics of the image to be identified and the characteristics of each fine-grained category under the candidate coarse-grained categories; and taking the average value of the second matching degree and the first matching degree as the probability that the image to be identified belongs to the fine granularity category.
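As mentioned above, a sketch of the third method might look as follows; cosine similarity is used as the matching degree purely as an assumption, and the function name, inputs and candidate count are illustrative:

```python
import torch
import torch.nn.functional as F

def fine_probability_method_three(subject_feat, fine_feats, coarse_probs,
                                  fine_to_coarse, k_candidates=2):
    """Third method: average the first matching degree (against every stored
    fine-granularity category feature) and the second matching degree (against
    the fine-granularity categories under the candidate coarse categories)."""
    first = F.cosine_similarity(subject_feat.unsqueeze(0), fine_feats, dim=1)  # first matching degree
    candidates = torch.topk(coarse_probs, k_candidates).indices.tolist()       # candidate coarse categories
    in_candidate = torch.tensor([c in candidates for c in fine_to_coarse])
    second = torch.where(in_candidate, first, torch.zeros_like(first))         # second matching degree
    return (first + second) / 2                                                # probability per fine category

# toy usage: 6 stored fine-category features of dimension 64 (assumed)
subject_feat = torch.randn(64)
fine_feats = torch.randn(6, 64)
coarse_probs = torch.tensor([0.99, 0.40, 0.01])
fine_to_coarse = [0, 0, 0, 0, 1, 2]
probs = fine_probability_method_three(subject_feat, fine_feats, coarse_probs, fine_to_coarse)
```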
S1205: and determining a category identification result of the image to be identified based on the probability of the coarse-granularity category and the probability of the fine-granularity category.
In one possible embodiment, the coarse-granularity category with the highest probability among the coarse-granularity categories to which the image to be identified belongs may be determined as the target coarse-granularity category; all the fine-granularity categories under the target coarse-granularity category are then sorted by probability, and a preset number of the top-ranked fine-granularity categories are selected as the category identification result of the image to be identified. For example, an image of a Husky is input into the mixed-granularity object recognition model, which outputs a probability of 99% that the image belongs to the dog category, 40% for the cat category and 1% for the bird category, and, under the dog category, 10% for the Corgi category, 20% and 5% for two other breed categories, and 99% for the Husky category; the dog category is therefore determined as the target coarse-granularity category, and the Husky category ranks first among the fine-granularity categories output as the identification result.
The mixed granularity object identification method of this embodiment can be realized through interaction between a client and a server. Fig. 13 is a processing flow chart of the mixed granularity object recognition method provided in an embodiment of the present application. Referring to fig. 13, a user inputs an image to be recognized through the client, and the server performs recognition processing on the input image to obtain a recognition result and returns the recognition result to the client. In one possible implementation, the server may include a front end A, a back end and a front end B: the front end A receives the image sent by the client and transmits it to the back end; the back end performs recognition processing on the image by using the mixed granularity object recognition model and outputs the probabilities that the image belongs to the coarse granularity categories and the fine granularity categories to the front end B; and the front end B performs post-processing according to these probabilities to obtain the recognition result corresponding to the image and feeds the recognition result back to the client. Fig. 14 shows an application scenario of the mixed granularity object recognition method provided in an embodiment of the present application; as shown in fig. 14, a user inputs an image of a Corgi through the client and, after processing by the server, receives the recognition result fed back by the server, which is displayed on the client.
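For illustration, the interaction described above might be sketched with a single HTTP endpoint; the use of Flask, the route name and the recognize/post_process helpers are assumptions rather than part of the described system:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def recognize(image_bytes):
    """Placeholder for the back end: run the mixed granularity object recognition
    model and return coarse/fine probabilities (assumed interface and values)."""
    return {"coarse": {"dog": 0.99, "cat": 0.40}, "fine": {"Husky": 0.99, "Corgi": 0.10}}

def post_process(probs, n=3):
    """Front end B: pick the most probable coarse category and the top-N fine categories."""
    coarse = max(probs["coarse"], key=probs["coarse"].get)
    fine = sorted(probs["fine"].items(), key=lambda kv: kv[1], reverse=True)[:n]
    return {"coarse_category": coarse, "top_fine_categories": fine}

@app.route("/recognize", methods=["POST"])
def handle():                                   # front end A: receives the image from the client
    image_bytes = request.files["image"].read()
    probs = recognize(image_bytes)              # back end: model inference
    return jsonify(post_process(probs))         # front end B: post-processing and feedback
```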
The mixed granularity object identification method described above can identify the coarse granularity category and the fine granularity category of an object at the same time, improving the accuracy of object category identification.
The embodiment of the application also provides a training device of the mixed granularity object recognition model. Fig. 15 is a schematic structural diagram of a training device for a hybrid granularity object recognition model according to an embodiment of the present application, please refer to fig. 15, wherein the device includes:
a sample image acquisition module 1510, configured to acquire sample images, and determine class labels of each sample image, where the class labels include a fine granularity class and a coarse granularity class;
model training module 1520, configured to perform image class recognition training on the initial deep learning model based on the sample image and the class label of the sample image, to obtain a pre-training model;
the model adjustment module 1530 is configured to adjust the fine-granularity branch classification module of the pre-training model with the feature difference between the enlarged fine-granularity classes as a target, so as to obtain a hybrid-granularity object recognition model.
In an alternative embodiment, the sample image acquisition module 1510 may include:
an image acquisition unit 2011 for acquiring an image;
A fine granularity category labeling unit 2012 configured to label a fine granularity category to which the image belongs;
a clustering processing unit 2013, configured to perform clustering processing on the images according to the fine granularity category of the images and the features of the fine granularity category, so as to obtain a plurality of image sets, where each image in the image sets has the same coarse granularity category;
a target coarse granularity category determining unit 2014, configured to determine a target coarse granularity category learned by the mixed granularity object recognition model according to a distribution of fine granularity categories to which each image belongs in the image set;
a sample image determining unit 2015, configured to take all images in the image set corresponding to the target coarse granularity category as sample images, and add a category label to each sample image, where the category label includes a fine granularity category and a coarse granularity category.
In an alternative embodiment, the model training module 1520 may be configured to: inputting the sample image and the class label of the sample image into a convolutional neural network model; forward calculation is carried out on the sample image, so that the prediction probability of the sample image belonging to a coarse-granularity category and the prediction probability of the sample image belonging to a fine-granularity category under the coarse-granularity category are obtained; determining a category prediction result of the sample image based on the prediction probability of the coarse-granularity category and the prediction probability of the fine-granularity category; comparing the category prediction result with the category label, and calculating to obtain a coarse granularity loss value and a fine granularity loss value; calculating a weighted sum of the coarse grain loss value and the fine grain loss value as an overall loss value; the integral loss value is reversely transmitted to a convolutional neural network model, and weight parameters of the convolutional neural network model are adjusted through a random gradient descent method; inputting the sample image and the class label of the sample image into a convolutional neural network model with updated weight parameters, and repeating the step of adjusting the weight parameters until the execution times of the current step of adjusting the weight parameters reach preset times; and taking the convolutional neural network model after the weight parameters are currently adjusted as the pre-training model.
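By way of illustration, the training procedure described for the model training module 1520 can be sketched as the following loop; the loader format, the model's output signature, the loss weights, the learning rate and the preset step count are assumptions:

```python
import torch
import torch.nn as nn

def pretrain(model, loader, coarse_weight=1.0, fine_weight=1.0,
             preset_steps=1000, lr=1e-2):
    """Repeat forward calculation, weighted coarse + fine loss, and stochastic
    gradient descent updates until the preset number of weight adjustments is reached."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    step = 0
    while step < preset_steps:
        for images, coarse_labels, fine_labels in loader:
            coarse_logits, fine_logits = model(images)             # forward calculation
            coarse_loss = nn.functional.cross_entropy(coarse_logits, coarse_labels)
            fine_loss = nn.functional.cross_entropy(fine_logits, fine_labels)
            overall_loss = coarse_weight * coarse_loss + fine_weight * fine_loss
            optimizer.zero_grad()
            overall_loss.backward()                                # back-propagate the overall loss value
            optimizer.step()                                       # adjust the weight parameters
            step += 1
            if step >= preset_steps:
                break
    return model                                                   # the pre-training model
```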
In an alternative embodiment, the model adjustment module 1530 may be configured to: perform forward calculation on the pre-training model to obtain the fine-granularity category characteristics of each sample image under the same coarse-granularity category; calculate a fine-granularity branch classification loss value according to the fine-granularity category characteristics of each sample image and the fine-granularity category characteristics of other sample images under the same coarse-granularity category; determine positive sample images and negative sample images corresponding to the sample images, and calculate a triplet loss metric according to the sample images, the positive sample images and the negative sample images, wherein the positive sample image is a sample image belonging to the same fine-granularity category as the sample image, and the negative sample image is a sample image belonging to the same coarse-granularity category as the sample image but a different fine-granularity category; calculate a total loss value according to the fine-granularity branch classification loss value and the triplet loss metric; and adjust the parameters of the fine-granularity branch classification module according to the total loss value to obtain the mixed granularity object recognition model.
In the training stage of the mixed granularity object recognition model, this embodiment learns not only the coarse granularity category of the sample image but also the fine granularity category of the sample image, so that coarse granularity category recognition and fine granularity category recognition can be realized in the same network structure. In addition, the fine granularity branch classification module of the pre-training model is adjusted so that the gap between fine granularity categories is enlarged and the fine granularity category characteristics become discriminative among the fine granularity categories to which they belong, which improves the accuracy of fine granularity category identification in mixed granularity identification.
This embodiment also provides a mixed granularity object identification device. Fig. 16 is a schematic structural diagram of the mixed granularity object identification device provided in an embodiment of the present application; referring to fig. 16, the device includes:
a to-be-identified image acquisition module 1610, configured to acquire an image to be identified;
the class identification processing module 1620 is configured to input the image to be identified into a mixed granularity object identification model to perform class identification processing, so as to obtain probability that the image to be identified belongs to each coarse granularity class and probability that the image to be identified belongs to each fine granularity class under the coarse granularity class;
a category recognition result determining module 1630, configured to determine a category recognition result of the image to be recognized based on the probability of the coarse-granularity category and the probability of the fine-granularity category;
the mixed granularity object recognition model is obtained by performing machine learning training based on a sample image and a corresponding class label to obtain a pre-training model, and adjusting a fine granularity branch classification module of the pre-training model with enlarging the characteristic difference among fine granularity categories as a target.
In an alternative embodiment, the category identification result determining module 1630 may be configured to: determining a class of coarse granularity categories with the highest probability among the coarse granularity categories to which the image to be identified belongs as a target coarse granularity category; and sequencing all the fine granularity categories under the target coarse granularity category according to the probability, and selecting the preset number of fine granularity categories sequenced in front as the category identification result of the image to be identified.
The mixed granularity object identification device described above can identify the coarse granularity category and the fine granularity category of an object at the same time, improving the accuracy of object category identification.
The apparatus and method embodiments in the embodiments of the present application are based on the same application concept.
The method embodiments provided in the embodiments of the present application may be performed in a computer terminal, a server, or a similar computing device. Taking operation on a server as an example, fig. 17 is a hardware structure block diagram of a server for the mixed granularity object identification method provided in an embodiment of the present application. As shown in fig. 17, the server 1700 may vary considerably in configuration or performance and may include one or more central processing units (CPU) 1710 (the processor 1710 may include, but is not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA), a memory 1730 for storing data, and one or more storage media 1720 (e.g., one or more mass storage devices) for storing applications 1723 or data 1722. The memory 1730 and the storage media 1720 may be transitory or persistent storage. The program stored on the storage medium 1720 may include one or more modules, each of which may include a series of instruction operations on the server. Furthermore, the central processor 1710 may be configured to communicate with the storage medium 1720 and execute the series of instruction operations in the storage medium 1720 on the server 1700. The server 1700 may also include one or more power supplies 1760, one or more wired or wireless network interfaces 1750, one or more input/output interfaces 1740, and/or one or more operating systems 1721, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
Input-output interface 1740 may be used to receive or transmit data via a network. The specific example of the network described above may include a wireless network provided by a communication provider of the server 1700. In one example, the input output interface 1740 includes a network adapter (Network Interface Controller, NIC) that may connect to other network devices through a base station to communicate with the internet. In one example, the input output interface 1740 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 17 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, server 1700 may also include more or fewer components than shown in fig. 17, or have a different configuration than shown in fig. 17.
Embodiments of the present application also provide a storage medium that may be disposed in a server to store at least one instruction, at least one program, a set of codes, or a set of instructions related to implementing a hybrid granularity object identification method in a method embodiment, where the at least one instruction, the at least one program, the set of codes, or the set of instructions are loaded and executed by the processor to implement the hybrid granularity object identification method described above.
Alternatively, in this embodiment, the storage medium may be located in at least one network server among a plurality of network servers of a computer network. Alternatively, in this embodiment, the storage medium may include, but is not limited to: a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program codes.
It should be noted that the foregoing sequence of the embodiments of the present application is only for description and does not represent the relative merits of the embodiments. The foregoing description has been directed to specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the present application is illustrative only and is not intended to limit the scope of the present application.

Claims (9)

1. A method for training a hybrid granularity object recognition model, the method comprising:
acquiring sample images, and determining class labels of the sample images, wherein the class labels comprise fine granularity classes and coarse granularity classes;
performing image category identification training on the initial deep learning model based on the sample image and the category label of the sample image to obtain a pre-training model;
adjusting the fine granularity branch classification module of the pre-training model by taking the characteristic difference among the enlarged fine granularity categories as a target to obtain a mixed granularity object recognition model;
The fine granularity branch classification module of the pre-training model is adjusted by taking the characteristic difference among the enlarged fine granularity categories as a target to obtain a mixed granularity object recognition model, which comprises the following steps:
forward calculation is carried out on the pre-training model, and fine granularity class characteristics of each sample image under the same coarse granularity class are obtained;
calculating to obtain a fine granularity branch classification loss value according to the fine granularity class characteristics of each sample image and the fine granularity class characteristics of other sample images under the same coarse granularity class;
determining positive sample images and negative sample images corresponding to the sample images, and calculating to obtain a triplet loss measurement according to the sample images, the positive sample images and the negative sample images; the positive sample image is a sample image belonging to the same fine granularity category as the sample image, and the negative sample image is a sample image belonging to the same coarse granularity category as the sample image and different fine granularity categories;
calculating to obtain a total loss value according to the fine-granularity branch classification loss value and the triplet loss measurement;
and adjusting parameters of the fine granularity branch classification module according to the total loss value to obtain a mixed granularity object recognition model.
2. The method of claim 1, wherein the acquiring the sample images and determining class labels for each sample image comprises:
collecting an image;
labeling the fine granularity category to which the image belongs;
clustering the images according to the fine granularity category of the images and the characteristics of the fine granularity category to obtain a plurality of image sets, wherein the coarse granularity category of each image in each image set is the same;
determining a target coarse-granularity category learned by a mixed-granularity object identification model according to the distribution of fine-granularity categories to which each image belongs in the image set;
and taking all images in the image set corresponding to the target coarse-granularity category as sample images, and adding a category label for each sample image, wherein the category label comprises a fine-granularity category and a coarse-granularity category.
3. The method of claim 1, wherein the training the initial deep learning model for image class identification based on the sample image and the class label of the sample image to obtain a pre-training model comprises:
inputting the sample image and the class label of the sample image into a convolutional neural network model;
Forward calculation is carried out on the sample image, so that the prediction probability of the sample image belonging to a coarse-granularity category and the prediction probability of the sample image belonging to a fine-granularity category under the coarse-granularity category are obtained;
determining a category prediction result of the sample image based on the prediction probability of the coarse-granularity category and the prediction probability of the fine-granularity category;
comparing the category prediction result with the category label, and calculating to obtain a coarse granularity loss value and a fine granularity loss value;
calculating a weighted sum of the coarse grain loss value and the fine grain loss value as an overall loss value;
the integral loss value is reversely transmitted to a convolutional neural network model, and weight parameters of the convolutional neural network model are adjusted through a random gradient descent method;
inputting the sample image and the class label of the sample image into a convolutional neural network model with updated weight parameters, and repeating the step of adjusting the weight parameters until the execution times of the current step of adjusting the weight parameters reach preset times;
and taking the convolutional neural network model after the weight parameters are currently adjusted as the pre-training model.
4. A method of mixed-granularity object identification, the method comprising:
Acquiring an image to be identified;
inputting the image to be identified into a mixed granularity object identification model to carry out category identification processing, so as to obtain the probability that the image to be identified belongs to each coarse granularity category and the probability that the image to be identified belongs to each fine granularity category under the coarse granularity category;
determining a category recognition result of the image to be recognized based on the probability of the coarse-granularity category and the probability of the fine-granularity category;
the mixed granularity object recognition model is obtained by performing machine learning training on the basis of a sample image and a corresponding class label to obtain a pre-training model, and adjusting a fine granularity branch classification module of the pre-training model with enlarging the characteristic difference among fine granularity categories as a target;

the step of adjusting the fine-grained branch classification module with enlarging the characteristic difference among fine-grained categories as a target comprises the following steps:
forward calculation is carried out on the pre-training model, and fine granularity class characteristics of each sample image under the same coarse granularity class are obtained;
calculating to obtain a fine granularity branch classification loss value according to the fine granularity class characteristics of each sample image and the fine granularity class characteristics of other sample images under the same coarse granularity class;
Determining positive sample images and negative sample images corresponding to the sample images, and calculating to obtain a triplet loss measurement according to the sample images, the positive sample images and the negative sample images; the positive sample image is a sample image belonging to the same fine granularity category as the sample image, and the negative sample image is a sample image belonging to the same coarse granularity category as the sample image and different fine granularity categories;
calculating to obtain a total loss value according to the fine-granularity branch classification loss value and the triplet loss measurement;
and adjusting parameters of the fine-granularity branch classification module according to the total loss value.
5. The method of claim 4, wherein the determining the category recognition result of the image to be recognized based on the probability of the coarse-grained category and the probability of the fine-grained category comprises:
determining a class of coarse granularity categories with the highest probability among the coarse granularity categories to which the image to be identified belongs as a target coarse granularity category;
and sequencing all the fine granularity categories under the target coarse granularity category according to the probability, and selecting the preset number of fine granularity categories sequenced in front as the category identification result of the image to be identified.
6. A training device for a hybrid granularity object recognition model, the device comprising:
the sample image acquisition module is used for acquiring sample images and determining class labels of the sample images, wherein the class labels comprise fine granularity classes and coarse granularity classes;
the model training module is used for carrying out image category identification training on the initial deep learning model based on the sample image and the category label of the sample image to obtain a pre-training model;
the model adjustment module is used for adjusting the fine granularity branch classification module of the pre-training model by taking the characteristic difference among the enlarged fine granularity categories as a target to obtain a mixed granularity object recognition model;
the model adjustment module is specifically configured to:
forward calculation is carried out on the pre-training model, and fine granularity class characteristics of each sample image under the same coarse granularity class are obtained;
calculating to obtain a fine granularity branch classification loss value according to the fine granularity class characteristics of each sample image and the fine granularity class characteristics of other sample images under the same coarse granularity class;
determining positive sample images and negative sample images corresponding to the sample images, and calculating to obtain a triplet loss measurement according to the sample images, the positive sample images and the negative sample images; the positive sample image is a sample image belonging to the same fine granularity category as the sample image, and the negative sample image is a sample image belonging to the same coarse granularity category as the sample image and different fine granularity categories;
Calculating to obtain a total loss value according to the fine-granularity branch classification loss value and the triplet loss measurement;
and adjusting parameters of the fine granularity branch classification module according to the total loss value to obtain a mixed granularity object recognition model.
7. A mixed granularity object recognition device, the device comprising:
the image acquisition module to be identified is used for acquiring the image to be identified;
the class identification processing module is used for inputting the image to be identified into a mixed granularity object identification model to carry out class identification processing, so as to obtain the probability that the image to be identified belongs to each coarse granularity class and the probability that the image to be identified belongs to each fine granularity class under the coarse granularity class;
the category identification result determining module is used for determining a category identification result of the image to be identified based on the probability of the coarse-granularity category and the probability of the fine-granularity category;
the mixed granularity object recognition model is obtained by performing machine learning training on the basis of a sample image and a corresponding class label to obtain a pre-training model, and adjusting a fine granularity branch classification module of the pre-training model with enlarging the characteristic difference among fine granularity categories as a target;

the step of adjusting the fine-grained branch classification module with enlarging the characteristic difference among fine-grained categories as a target comprises the following steps:
forward calculation is carried out on the pre-training model, and fine granularity class characteristics of each sample image under the same coarse granularity class are obtained;
calculating to obtain a fine granularity branch classification loss value according to the fine granularity class characteristics of each sample image and the fine granularity class characteristics of other sample images under the same coarse granularity class;
determining positive sample images and negative sample images corresponding to the sample images, and calculating to obtain a triplet loss measurement according to the sample images, the positive sample images and the negative sample images; the positive sample image is a sample image belonging to the same fine granularity category as the sample image, and the negative sample image is a sample image belonging to the same coarse granularity category as the sample image and different fine granularity categories;
calculating to obtain a total loss value according to the fine-granularity branch classification loss value and the triplet loss measurement;
and adjusting parameters of the fine-granularity branch classification module according to the total loss value.
8. An electronic device comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the mixed-granularity object recognition model training method of any one of claims 1-3 or the mixed-granularity object recognition method of any one of claims 4-5.
9. A computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the code set, or instruction set being loaded and executed by a processor to implement the mixed-granularity object recognition model training method of any one of claims 1-3 or the mixed-granularity object recognition method of any one of claims 4-5.
CN201910743898.4A 2019-08-13 2019-08-13 Mixed granularity object recognition model training and recognition method, device and storage medium Active CN110458233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910743898.4A CN110458233B (en) 2019-08-13 2019-08-13 Mixed granularity object recognition model training and recognition method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910743898.4A CN110458233B (en) 2019-08-13 2019-08-13 Mixed granularity object recognition model training and recognition method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110458233A CN110458233A (en) 2019-11-15
CN110458233B true CN110458233B (en) 2024-02-13

Family

ID=68486158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910743898.4A Active CN110458233B (en) 2019-08-13 2019-08-13 Mixed granularity object recognition model training and recognition method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110458233B (en)

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027707B (en) * 2019-11-22 2023-08-22 北京金山云网络技术有限公司 Model optimization method and device and electronic equipment
CN111144216A (en) * 2019-11-27 2020-05-12 北京三快在线科技有限公司 Picture label generation method and device, electronic equipment and readable storage medium
CN110929099B (en) * 2019-11-28 2023-07-21 杭州小影创新科技股份有限公司 Short video frame semantic extraction method and system based on multi-task learning
CN111061898A (en) * 2019-12-13 2020-04-24 Oppo(重庆)智能科技有限公司 Image processing method, image processing device, computer equipment and storage medium
CN111125422B (en) * 2019-12-13 2024-04-02 北京达佳互联信息技术有限公司 Image classification method, device, electronic equipment and storage medium
CN111159158B (en) * 2019-12-31 2024-03-29 北京懿医云科技有限公司 Data normalization method and device, computer readable storage medium and electronic equipment
CN111309200B (en) * 2020-01-17 2021-11-12 百度在线网络技术(北京)有限公司 Method, device, equipment and storage medium for determining extended reading content
CN111382781B (en) * 2020-02-21 2023-09-12 华为云计算技术有限公司 Method for acquiring image tag, method and device for training image recognition model
CN111340097B (en) * 2020-02-24 2024-03-12 Oppo广东移动通信有限公司 Image fine granularity classification method, device, storage medium and equipment
CN111612034B (en) * 2020-04-15 2024-04-12 中国科学院上海微系统与信息技术研究所 Method and device for determining object recognition model, electronic equipment and storage medium
CN111400522B (en) * 2020-04-29 2021-06-11 广州紫为云科技有限公司 Traffic sign recognition method, training method and equipment
CN111695604A (en) * 2020-05-20 2020-09-22 平安科技(深圳)有限公司 Image reliability determination method and device, electronic equipment and storage medium
CN111814554B (en) * 2020-06-09 2022-06-21 同济大学 Object type recognition model construction method based on granularity and associated information and application
CN111681318B (en) * 2020-06-10 2021-06-15 上海城市地理信息系统发展有限公司 Point cloud data modeling method and device and electronic equipment
CN111767946B (en) * 2020-06-19 2024-03-22 北京康夫子健康技术有限公司 Medical image hierarchical model training and predicting method, device, equipment and medium
CN111950344B (en) * 2020-06-28 2023-06-27 北京百度网讯科技有限公司 Biological category identification method and device, storage medium and electronic equipment
CN111898577B (en) * 2020-08-10 2022-08-26 腾讯科技(深圳)有限公司 Image detection method, device, equipment and computer readable storage medium
US11741722B2 (en) * 2020-09-04 2023-08-29 International Business Machines Corporation Coarse-to-fine attention networks for light signal detection and recognition
CN112115994A (en) * 2020-09-11 2020-12-22 北京达佳互联信息技术有限公司 Training method and device of image recognition model, server and storage medium
CN112257746A (en) * 2020-09-15 2021-01-22 深圳数联天下智能科技有限公司 Pox type recognition model training method, recognition method and related device
CN111931153B (en) * 2020-10-16 2021-02-19 腾讯科技(深圳)有限公司 Identity verification method and device based on artificial intelligence and computer equipment
CN112396106B (en) * 2020-11-18 2024-04-16 腾讯科技(深圳)有限公司 Content recognition method, content recognition model training method, and storage medium
CN112509041B (en) * 2020-11-25 2024-05-03 杭州自动桌信息技术有限公司 Parking-lot-based vehicle positioning method, system and storage medium
CN112668627A (en) * 2020-12-24 2021-04-16 四川大学 Large-scale image online clustering system and method based on contrast learning
CN112633276A (en) * 2020-12-25 2021-04-09 北京百度网讯科技有限公司 Training method, recognition method, device, equipment and medium
CN112818805B (en) * 2021-01-26 2023-08-01 四川天翼网络股份有限公司 Fine-grained vehicle attribute analysis system and method based on feature fusion
CN112906810B (en) * 2021-03-08 2024-04-16 共达地创新技术(深圳)有限公司 Target detection method, electronic device, and storage medium
CN117173008A (en) * 2021-03-17 2023-12-05 福建库克智能科技有限公司 Method for producing mixture, and method for producing picture of face mask
CN113205085B (en) * 2021-07-05 2021-11-19 武汉华信数据系统有限公司 Image identification method and device
CN113822171A (en) * 2021-08-31 2021-12-21 苏州中科先进技术研究院有限公司 Pet color value scoring method, device, storage medium and equipment
CN114329051B (en) * 2021-12-31 2024-03-05 腾讯科技(深圳)有限公司 Data information identification method, device, apparatus, storage medium and program product
CN114821203B (en) * 2022-06-29 2022-09-27 中国科学院自动化研究所 Fine-grained image model training and identifying method and device based on consistency loss
CN116343201B (en) * 2023-05-29 2023-09-19 安徽高哲信息技术有限公司 Grain class identification method and device and computer equipment


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171257B (en) * 2017-12-01 2019-11-26 百度在线网络技术(北京)有限公司 Fine granularity image recognition model training and recognition methods, device and storage medium

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017046378A1 (en) * 2015-09-16 2017-03-23 INSERM (Institut National de la Recherche Médicale) Method and computer program product for characterizing a retina of a patient from an examination record comprising at least one image of at least a part of the retina
CN106570477A (en) * 2016-10-28 2017-04-19 中国科学院自动化研究所 Vehicle model recognition model construction method based on depth learning and vehicle model recognition method based on depth learning
CN106951872A (en) * 2017-03-24 2017-07-14 江苏大学 A kind of recognition methods again of the pedestrian based on unsupervised depth model and hierarchy attributes
WO2019128367A1 (en) * 2017-12-26 2019-07-04 广州广电运通金融电子股份有限公司 Face verification method and apparatus based on triplet loss, and computer device and storage medium
CN109993187A (en) * 2017-12-29 2019-07-09 深圳市优必选科技有限公司 A kind of modeling method, robot and the storage device of object category for identification
CN108364006A (en) * 2018-01-17 2018-08-03 超凡影像科技股份有限公司 Medical Images Classification device and its construction method based on multi-mode deep learning
CN108510000A (en) * 2018-03-30 2018-09-07 北京工商大学 The detection and recognition methods of pedestrian's fine granularity attribute under complex scene
CN108776807A (en) * 2018-05-18 2018-11-09 复旦大学 It is a kind of based on can the double branch neural networks of skip floor image thickness grain-size classification method
CN108920643A (en) * 2018-06-26 2018-11-30 大连理工大学 Weight the fine granularity image retrieval algorithm of multiple features fusion
CN109002845A (en) * 2018-06-29 2018-12-14 西安交通大学 Fine granularity image classification method based on depth convolutional neural networks
CN109101108A (en) * 2018-07-25 2018-12-28 重庆邮电大学 Method and system based on three decision optimization intelligence cockpit human-computer interaction interfaces
CN109740154A (en) * 2018-12-26 2019-05-10 西安电子科技大学 A kind of online comment fine granularity sentiment analysis method based on multi-task learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-stream I3D Network for Fine-grained Action Recognition; You J, Shi P, Bao X.; 2018 IEEE 4th Information Technology and Mechatronics Engineering Conference (ITOEC); pp. 611-614 *
Offline handwritten Chinese character recognition based on coarse- and fine-grained deep learning; Chen Yue, Huang Jihong; Journal of Wuzhou University; Vol. 28, No. 3; pp. 43-50 *
Li Kefeng. Face Image Processing and Recognition Technology. Zhengzhou: Yellow River Water Conservancy Press, 2018, pp. 20-21. *

Also Published As

Publication number Publication date
CN110458233A (en) 2019-11-15

Similar Documents

Publication Publication Date Title
CN110458233B (en) Mixed granularity object recognition model training and recognition method, device and storage medium
US11620487B2 (en) Neural architecture search based on synaptic connectivity graphs
CN112183577A (en) Training method of semi-supervised learning model, image processing method and equipment
US11568201B2 (en) Predicting neuron types based on synaptic connectivity graphs
US11593617B2 (en) Reservoir computing neural networks based on synaptic connectivity graphs
CN111523621A (en) Image recognition method and device, computer equipment and storage medium
US11625611B2 (en) Training artificial neural networks based on synaptic connectivity graphs
US11593627B2 (en) Artificial neural network architectures based on synaptic connectivity graphs
Mohsin et al. Optimization driven adam-cuckoo search-based deep belief network classifier for data classification
US11631000B2 (en) Training artificial neural networks based on synaptic connectivity graphs
Khan et al. Machine learning facilitated business intelligence (Part II) Neural networks optimization techniques and applications
CN110705600A (en) Cross-correlation entropy based multi-depth learning model fusion method, terminal device and readable storage medium
Rosales et al. Faster r-cnn based fish detector for smart aquaculture system
WO2022125181A1 (en) Recurrent neural network architectures based on synaptic connectivity graphs
CN115240843A (en) Fairness prediction system based on structure causal model
CN112115996B (en) Image data processing method, device, equipment and storage medium
CN114548297A (en) Data classification method, device, equipment and medium based on domain self-adaption
CN109993191B (en) Information processing method and device, electronic device and storage medium
Rewicki et al. Estimating Uncertainty of Deep Learning Multi-Label Classifications Using Laplace Approximation
Molapo et al. Management and monitoring of livestock in the farm using deep learning
Hajar et al. Autonomous UAV-based cattle detection and counting using YOLOv3 and deep sort
US20240177000A1 (en) System and Method for Training Machine-Learning Models with Probabilistic Confidence Labels
CN112699909B (en) Information identification method, information identification device, electronic equipment and computer readable storage medium
CN115482419B (en) Data acquisition and analysis method and system for marine fishery products
Ruis Independent prototype propagation graph for compositional zero-shot recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant