CN110929624B - Construction method of multi-task classification network based on orthogonal loss function - Google Patents

Construction method of multi-task classification network based on orthogonal loss function

Info

Publication number
CN110929624B
CN110929624B
Authority
CN
China
Prior art keywords
classification
classifier
loss function
network
fine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911124037.4A
Other languages
Chinese (zh)
Other versions
CN110929624A (en)
Inventor
何贵青
敖振
霍胤丞
纪佳琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201911124037.4A priority Critical patent/CN110929624B/en
Publication of CN110929624A publication Critical patent/CN110929624A/en
Application granted granted Critical
Publication of CN110929624B publication Critical patent/CN110929624B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention provides a method for constructing a multi-task classification network based on an orthogonal loss function. The constructed multi-task classification network simulates the human learning process: a deep convolutional neural network is used as the hidden layer to imitate the human brain in extracting deep features, a tree classifier is used as the task-related output layer to perform progressive classification, and the recognition processes form different learning tasks. The invention makes the features obtained by different tasks better meet their respective requirements, so that the deep features of the same coarse class are more aggregated when the classifier performs the coarse classification task and the deep features of different fine classes are more separated when it performs the fine classification task. The task output-layer features of different classification tasks are thereby distinguished, classifiers at different levels obtain features better matched to their classification tasks, useless features are removed, and classification accuracy is improved.

Description

Construction method of multi-task classification network based on orthogonal loss function
Technical Field
The invention relates to the field of image classification, and in particular to a method for constructing a multi-task classification network.
Background
In recent years, photography and image recognition have been applied more and more widely in field exploration and daily life, which benefits from the development of deep learning. The deep convolutional neural network is currently the best-performing tool for feature extraction: it not only extracts edge information in its shallow layers, but also, as the layers deepen and the semantic information of the features becomes more abstract, obtains features that are closer to human cognition.
Subsequently, multi-task classification networks gradually came into view. In a multi-task classification network the different tasks assist one another: the different tasks of the same network are trained simultaneously, and each task has an independent loss function. According to human learning experience, recognizing the thousands of targets in the world is a progressive process from easy to difficult. In childhood, humans can only recognize rough categories of objects, e.g., birds, cars, plants. As the brain matures, increasingly fine species are encountered during learning or daily life; for example, birds are refined into parrots, sparrows, and so on, and vehicles into cars and buses. The multi-task network treats the recognition of coarse classes as a first-level task, under which there are several subtasks for recognizing fine classes. Thus, knowledge learned earlier in the task of recognizing coarse classes can also be used in the new task of recognizing fine classes.
However, the classification structure in the human brain is not as simple as expected. When a multi-task classification network performs multi-task classification, the hidden-layer parameters are learned jointly under the loss functions of all tasks, so the extracted features are shared across the classification tasks. Yet the features required by different classification tasks are not completely consistent. For example, in coarse classification, information about the steering wheel and the wheels may help the network correctly assign a target to the car category. When sub-categorizing, however, these common features may push the SUV and the coach into the same category, whereas the task of recognizing the fine classes is precisely to separate them, so the network should pay more attention to their distinctive characteristics, such as appearance and shape. Because the hidden-layer parameters are shared, the extracted features become entangled, and the multi-task classification network fails to distinguish the extracted common features from the extracted task-specific features when executing different classification tasks.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a method for constructing a multi-task classification network based on an orthogonal loss function. The invention proposes an orthogonal loss function to distinguish common features from task-specific features under a multi-task classification network model. The loss function measures loss using the spatial similarity of the features, so that the features obtained by each classification task better meet the requirements of that task, the separability of the features is improved, and the interference of useless features with the classification tasks is effectively suppressed. The multi-task classification network constructed by the invention simulates the human learning process: a deep convolutional neural network is used as the hidden layer to imitate the human brain in extracting deep features, a tree classifier is used as the task-related output layer to perform progressive classification, and the mutually related recognition processes, from easy to difficult, form different learning tasks. During back propagation the orthogonal loss function guides the multi-task classification network to distinguish the features of the different levels of the tree classifier, so that while updating its parameters the network learns features that better fit the requirements of the respective tasks.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step one: building a hierarchical label tree
The hierarchical label tree is divided into two layers, wherein the first layer of labels are rough labels of the images and are divided according to the species to which the images belong, and the second layer of labels are fine labels of the images and are defined according to the subclass of the species to which the images belong;
step two: constructing a deep convolutional neural network as a feature extraction module;
the deep convolutional neural network is selected to extract image depth features for Resnet-18 with a residual structure, and the image depth features comprise 18 layer network structures which comprise 17 convolutional layers and a full-connection layer. Except that the first convolution layer uses convolution kernels of 7 × 7, the other convolution layers all use convolution kernels of 3 × 3, wherein every two convolution layers form a residual block, identity mapping is added, the network requires an image RGB three-dimensional pixel value with an input dimension of 3 × 224, a feature vector output dimension of 512 × 7 is obtained after operation of 17 convolution layers, the output dimension of the feature vector input to the full-connection layer is set to be 1024, and therefore the image obtains a one-dimensional vector containing 1024 neurons through Resnet-18, namely the depth feature extracted by the depth convolution neural network;
step three: building a tree classifier for classification;
constructing a corresponding tree classifier according to the hierarchical label tree of each database in step one, wherein the tree classifier has a two-layer structure: the first layer contains one coarse classifier, used for the coarse classification task; the second layer contains N fine classifiers, used for the fine classification tasks. The sub-classifiers, namely the one coarse classifier and the N fine classifiers, have the same network structure and are independent of one another; each sub-classifier consists of two fully connected layers and a softmax classifier, and each sub-classifier produces a classification result. The depth features obtained in step two are input into each sub-classifier for classification; if the coarse classifier determines that the image belongs to the nth coarse class, then, according to the coarse classification result and the membership defined by the hierarchical label tree, the nth fine classifier is selected to perform fine classification and determine which fine class the image belongs to. The multi-task network comprising the deep convolutional neural network and the tree classifier is thus built; when an image is input into the multi-task network, it can be classified and an image classification result obtained;
step four: constructing an orthogonal loss function
Combining the deep convolutional neural network with the tree classifier, the multi-task classification network is built; training-set images are input to train the multi-task network, and a loss function needs to be constructed to update the parameters;
firstly, an orthogonal loss function is constructed to update the parameters. The result expected from the orthogonal loss is that the coarse-classifier feature vectors and the fine-classifier feature vectors are orthogonal in space, so that in the ideal case the cross features are 0. The objective that feature selection must achieve is added into the loss function as an orthogonal loss; the orthogonal loss function is constructed as follows:
L_{orth}(x) = \alpha \sum_{i=1}^{k} \mathrm{Tr}\left( f_g(x) \, f_i(x)^{T} \right)
where x denotes the pixel values of the N input images, k represents the number of coarse classes, f_1, f_2, ..., f_k represent the k sub-classification tasks, Tr represents the trace of a matrix, T represents matrix transposition, f_g(x) represents the coarse-classifier features obtained for the N images, and f_s(x) represents the sub-classifier features of the N images. The trace Tr(f_g(x) f_s(x)^T) is the sum of the corresponding inner products between the coarse-classifier features and the fine-classifier features of the N images; when Tr(f_g(x) f_s(x)^T) equals 0, every row vector of the coarse-classifier features is orthogonal to the corresponding row vector of the fine-classifier features. α is a hyperparameter;
when training-set images are input to train the multi-task network, back propagation is carried out with the orthogonal loss function, and the value of the orthogonal loss function is reduced by updating the parameters; as the orthogonal loss function approaches 0, f_g(x) and f_s(x) tend to become orthogonal;
step five: constructing a classification loss function;
in the process of passing f_g(x) and f_s(x) to the next fully connected layer and the softmax classifier for classification, the outputs of the neurons are mapped by the softmax classifier into the interval (0, 1), yielding a coarse-classification predicted value and a fine-classification predicted value respectively; the error between the predicted values and the true label values is then measured with a cross-entropy loss function, whose formula is as follows:
L_{cls} = -\log \frac{e^{W_g^{T} X + b_g}}{\sum_{j} e^{W_{g_j}^{T} X + b_{g_j}}} - \log \frac{e^{W_s^{T} X + b_s}}{\sum_{j} e^{W_{s_j}^{T} X + b_{s_j}}}
where g represents the coarse class and s the fine class, X represents the depth feature of the input image obtained in step two, W_g and W_s represent the weights in the coarse classifier and the fine classifier respectively, and b_g and b_s represent the biases in the coarse classifier and the fine classifier respectively; when the cross-entropy loss function approaches 0, the predicted values approach the true values;
step six: the orthogonal loss function of step four and the cross-entropy loss function of step five are added together, and the network parameters are updated through back propagation; during back propagation an SGD optimizer continuously updates the parameters with the training-set images by stochastic gradient descent, and the test set is evaluated after each round of training.
The invention has the advantage that, by proposing an orthogonal loss function that distinguishes common features from task-specific features under the multi-task classification network model, the features obtained by different tasks better meet their respective requirements: the depth features of the same coarse class are more aggregated when the classifier performs the coarse classification task, and the depth features of different fine classes are more separated when it performs the fine classification task. In addition, different threshold values are selected as the balance point between the orthogonal loss and the cross-entropy loss, and the optimal threshold is found through experimental comparison. After the optimal threshold is obtained, the orthogonal loss function is added to update the network parameters, so that the task output-layer features of different classification tasks are distinguished, classifiers at different levels obtain features better matched to their classification tasks, useless features are removed, and the classification accuracy is improved. The invention achieves a better classification effect on two different databases.
Drawings
FIG. 1 is a structural diagram of the Fashion-60 database hierarchical label tree in the present invention.
FIG. 2 is a structural diagram of the hierarchical label tree of the Caltech-UCSD Birds-200-2011 database in the present invention.
FIG. 3 is a schematic diagram of the selection of features in three-dimensional space by orthogonal loss functions in the present invention.
Fig. 4 is a schematic diagram of the overall structure of the multitask classification network according to the present invention.
Fig. 5 is a schematic diagram of a method for implementing a multi-task classification network with an orthogonal loss function added in the present invention.
FIG. 6 is a bar graph illustrating network identification accuracy for different Fashion-60 thresholds in accordance with the present invention.
FIG. 7 is a bar chart showing the network identification accuracy under different thresholds for Caltech-UCSD Birds-200-2011 in the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
Step one: constructing a hierarchical label tree;
because there is visual similarity between species belonging to the same ethnicity, hierarchical label trees are constructed for different databases using the dependencies between species in nature. The hierarchical label tree is divided into two layers, wherein the first layer of labels are rough labels of the images and are divided according to the species to which the images belong, and the second layer of labels are fine labels of the images and are defined according to the subclass of the species to which the images belong. Branches of the label tree represent dependencies. Caltech-UCSD libraries-200-. For example, hummingbirds are identified as a rough genus of birds, which are the primary labels of the samples. Under this genus, there are numerous types of hummingbirds as thin classes, which are the secondary labels of the sample, i.e., the original database labels. And the dependency relationship between the two levels of samples is used as a branch of the label tree to define the association between the two levels of samples, and the specific structure of the structure is shown in fig. 1. Similarly, the Fashinon-60 database contains 60 garment accessory categories. The construction method is to classify each of 5 kinds of clothes as a rough type according to its function with reference to common sense. For example, consider a shoe as a thick genus of the database as a first level tag of a hierarchical tag tree. The specific structure result of the slippers, leather boots and the like contained under the shoes as the thin classes is shown in figure 2, and the subordinate relations are also used as branches of the label tree;
step two: building deep convolutional neural network as feature extraction module
Resnet-18 with a residual structure is selected as the deep convolutional neural network for extracting image depth features; it is an 18-layer network comprising 17 convolutional layers and one fully connected layer. Except for the first convolutional layer, which uses 7 × 7 convolution kernels, all other convolutional layers use 3 × 3 kernels; every two convolutional layers form a residual block to which an identity mapping is added. The network takes as input the RGB pixel values of an image with dimensions 3 × 224 × 224; after the 17 convolutional layers a feature map of dimensions 512 × 7 × 7 is obtained and fed to the fully connected layer, whose output dimension is set to 1024. The image therefore yields, through Resnet-18, a one-dimensional vector containing 1024 neurons, which is the depth feature extracted by the deep convolutional neural network.
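A possible PyTorch realisation of this feature-extraction module is sketched below. Using torchvision's ResNet-18 and replacing its final fully connected layer is an assumption made for illustration; the text only fixes the 17-convolution-plus-one-fully-connected architecture and the 1024-dimensional output.

```python
import torch
import torch.nn as nn
from torchvision import models

class FeatureExtractor(nn.Module):
    """ResNet-18 backbone whose last fully connected layer outputs a 1024-dim depth feature."""
    def __init__(self, feature_dim=1024):
        super().__init__()
        backbone = models.resnet18(weights=None)               # 17 conv layers + avgpool + fc
        backbone.fc = nn.Linear(backbone.fc.in_features, feature_dim)
        self.backbone = backbone

    def forward(self, x):                                      # x: (N, 3, 224, 224) RGB images
        return self.backbone(x)                                # (N, 1024) depth features

features = FeatureExtractor()(torch.randn(4, 3, 224, 224))
print(features.shape)                                          # torch.Size([4, 1024])
```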
Step three: building tree classifier for classification
Constructing a corresponding tree classifier according to the hierarchical label tree of each database in step one, the tree classifier has a two-layer structure: the first layer contains one coarse classifier, used for the coarse classification task; the second layer contains N fine classifiers (assuming N coarse classes), used for the fine classification tasks. The sub-classifiers (the one coarse classifier and the N fine classifiers) have the same network structure and are independent of one another; each sub-classifier consists of two fully connected layers and a softmax classifier, and each sub-classifier produces a classification result. The depth features obtained in step two are input into each sub-classifier for classification; if the coarse classifier determines that the image belongs to the nth coarse class, then, according to the coarse classification result and the membership defined by the hierarchical label tree, the nth fine classifier is selected to perform fine classification and determine which fine class the image belongs to. The construction of the multi-task network comprising the deep convolutional neural network and the tree classifier is thus completed, as shown in fig. 5; when an image is input into the multi-task network, image classification can be carried out to obtain an image classification result;
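One way to realise this tree classifier in PyTorch is sketched below; the hidden width of 512 and the use of ReLU between the two fully connected layers are assumptions not fixed by the text, and the softmax is applied by the loss during training or explicitly at prediction time.

```python
import torch
import torch.nn as nn

class SubClassifier(nn.Module):
    """Two fully connected layers; returns the task-specific feature and the class logits."""
    def __init__(self, in_dim, num_classes, hidden=512):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, x):
        h = torch.relu(self.fc1(x))           # task-specific feature (f_g or f_s)
        return h, self.fc2(h)                 # feature and class logits (softmax applied later)

class TreeClassifier(nn.Module):
    """One coarse sub-classifier plus one independent fine sub-classifier per coarse class."""
    def __init__(self, in_dim, fine_counts):
        super().__init__()
        self.coarse = SubClassifier(in_dim, len(fine_counts))
        self.fine = nn.ModuleList(SubClassifier(in_dim, c) for c in fine_counts)

    def forward(self, depth_feature):
        f_g, coarse_logits = self.coarse(depth_feature)
        fine_out = [fine(depth_feature) for fine in self.fine]   # list of (f_s, fine logits)
        return f_g, coarse_logits, fine_out   # fine_out[n] is used when the coarse result is n
```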
step four: constructing an orthogonal loss function
Combining a deep convolutional neural network with a tree classifier, building a multi-task classification network, inputting a training set image to train the multi-task network, and needing to construct a loss function to update parameters;
First, an orthogonal loss function is constructed to update the parameters and achieve feature selection. The result expected from the orthogonal loss is that the feature vectors of the coarse classifier and the feature vectors of the fine classifiers are orthogonal in space, so that in the ideal case the cross features are 0, as shown in fig. 3. The invention adds the objective to be achieved by feature selection into a loss function, yielding an orthogonal loss whose formula is constructed as follows:
L_{orth}(x) = \alpha \sum_{i=1}^{k} \mathrm{Tr}\left( f_g(x) \, f_i(x)^{T} \right)
where x denotes the pixel values of the N input images, k represents the number of coarse classes, f_1, f_2, ..., f_k represent the k sub-classification tasks (only one sub-classifier structure is shown in fig. 4), Tr represents the trace of a matrix, T represents matrix transposition, f_g(x) represents the coarse-classifier features obtained for the N images, and f_s(x) represents the sub-classifier features of the N images. The trace Tr(f_g(x) f_s(x)^T) is the sum of the corresponding inner products between the coarse-classifier features and the fine-classifier features of the N images; when this sum equals 0, every row vector of the coarse-classifier features is orthogonal to the corresponding row vector of the fine-classifier features. α is a hyperparameter (the optimal value is 2 for the bird database and 2.5 for the Fashion-60 database), and its size determines the influence of the orthogonal loss on the whole network parameters during back propagation.
When training-set images are input to train the multi-task network, back propagation is carried out with the orthogonal loss function, and the value of the orthogonal loss function is reduced by updating the parameters; as the orthogonal loss function approaches 0, f_g(x) and f_s(x) tend to become orthogonal. The position where the loss acts is shown in fig. 4.
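Read this way, the orthogonal loss can be written in a few lines of PyTorch. Treating it as α times the summed per-image inner products (the matrix trace) between the coarse features and each fine classifier's features, normalised by the batch size, is an assumption consistent with the description above rather than a verbatim transcription of the formula.

```python
import torch

def orthogonal_loss(f_g, fine_features, alpha=2.5):
    """f_g: (N, D) coarse-classifier features; fine_features: list of (N, D) fine-classifier features."""
    loss = f_g.new_zeros(())
    for f_s in fine_features:
        # Tr(f_g @ f_s.T) = sum over the N images of <f_g[i], f_s[i]>; with
        # non-negative (post-ReLU) features this is 0 only when every row pair is orthogonal.
        loss = loss + torch.trace(f_g @ f_s.t())
    return alpha * loss / f_g.shape[0]   # per-image normalisation is an assumption
```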
Step five: constructing a classification loss function
In the process of passing f_g(x) and f_s(x) to the next fully connected layer and the softmax classifier for classification, the outputs of the neurons are mapped into the interval (0, 1), yielding a coarse-classification predicted value and a fine-classification predicted value respectively; the error between the predicted values and the true label values is then measured with a cross-entropy loss function, whose formula is as follows:
L_{cls} = -\log \frac{e^{W_g^{T} X + b_g}}{\sum_{j} e^{W_{g_j}^{T} X + b_{g_j}}} - \log \frac{e^{W_s^{T} X + b_s}}{\sum_{j} e^{W_{s_j}^{T} X + b_{s_j}}}
where g represents the coarse class and s the fine class, X represents the depth feature of the input image obtained in step two, W_g and W_s represent the weights in the coarse classifier and the fine classifier respectively, and b_g and b_s represent the biases in the coarse classifier and the fine classifier respectively; when the cross-entropy loss function approaches 0, the predicted values approach the true values. The position where the loss acts is shown in fig. 4;
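A corresponding sketch of the classification loss is given below: standard cross entropy on the coarse prediction plus cross entropy on the output of the fine classifier that owns the ground-truth coarse label. Selecting the fine classifier by the true coarse label during training, and summing the two terms with equal weight, are assumptions made for illustration.

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()   # applies the softmax internally to the logits

def classification_loss(coarse_logits, fine_logits_list, coarse_labels, fine_labels):
    # fine_labels are assumed to be indices local to each coarse class's fine classifier
    loss = ce(coarse_logits, coarse_labels)
    for n, fine_logits in enumerate(fine_logits_list):
        mask = coarse_labels == n                 # images whose true coarse class is n
        if mask.any():
            loss = loss + ce(fine_logits[mask], fine_labels[mask])
    return loss
```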
step six: the orthogonal loss function of step four and the cross-entropy loss function of step five are added together, and the network parameters are updated through back propagation; during back propagation an SGD optimizer continuously updates the parameters with the training-set images by stochastic gradient descent, and the test set is evaluated after each round of training.
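The training step can then be sketched as follows. This reuses the FeatureExtractor, TreeClassifier, orthogonal_loss and classification_loss sketches above, and the data loader, the coarse/fine class counts and the SGD hyperparameters are assumptions made only to show how the two losses are summed and back-propagated.

```python
import torch

extractor = FeatureExtractor()
tree = TreeClassifier(in_dim=1024, fine_counts=[6] * 10)    # hypothetical: 10 coarse classes, 6 fine each
optimizer = torch.optim.SGD(
    list(extractor.parameters()) + list(tree.parameters()), lr=0.01, momentum=0.9
)

for images, coarse_labels, fine_labels in train_loader:     # train_loader is assumed to exist
    depth = extractor(images)                                # (N, 1024) depth features
    f_g, coarse_logits, fine_out = tree(depth)
    fine_feats = [h for h, _ in fine_out]
    fine_logits = [logits for _, logits in fine_out]

    loss = orthogonal_loss(f_g, fine_feats) + \
           classification_loss(coarse_logits, fine_logits, coarse_labels, fine_labels)

    optimizer.zero_grad()
    loss.backward()                                          # back-propagate both losses together
    optimizer.step()
```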
The present invention uses the Resnet network as the hidden layer for feature extraction and then passes the features to a tree classifier for multi-task classification. Although the multi-task classification network effectively alleviates the interference of inter-class similarity with the network, it also has shortcomings. In multi-task classification the features required by each classification task are different, but because the parameters of all tasks in the model are shared during feature extraction, the features required by the individual tasks are mixed together; redundant features required only by other tasks may therefore interfere with the recognition performance of a classifier. The orthogonal loss function is therefore proposed for the multi-task classification network: feature selection is achieved by measuring the spatial distance between the depth features of the classifiers at different levels of the tree classifier, so that the features obtained by each layer of classifier better meet the requirements of its own classification task, and the shortcoming of the existing model is overcome.
According to fig. 3, the present embodiment proposes a multi-task classification network based on an orthogonal loss function, which includes the following four parts: (1) a large number of picture labels are organised using the subordination relations between species to construct a complete hierarchical label tree; (2) a deep convolutional neural network is used to extract the depth features of the different pictures, so that the depth features extracted by the network can be used simultaneously by the multi-task classifier; (3) a tree classifier is used in place of the traditional N-way softmax classifier to realise multi-task classification; (4) an orthogonal loss function is constructed to measure the depth features of the classifiers at different levels, and the two kinds of features are distinguished by increasing the spatial distance between the common features and the specific features and deleting the cross features.
Compared with the traditional deep-learning network model, the model of this embodiment has two advantages. First, the embodiment constructs a hierarchical label tree using the subordination relations between species in order to realise multi-task classification; it takes into account that the degrees of discrimination between categories differ and classifies progressively, layer by layer, from easy to difficult, so that the tasks of the different levels assist one another and the error gradients are distributed more evenly. Second, the orthogonal loss function constructed in this embodiment distinguishes the depth features extracted by the deep convolutional neural network: the orthogonal loss measures the spatial distance between the coarse-classifier features and the fine-classifier features, and this distance is increased through the automatic updating of the network parameters, so that the proportion of cross features in the classification tasks is reduced and the accuracy of the model in image recognition is improved. In this embodiment, the influence of the orthogonal loss on the whole network parameters during back propagation is controlled by setting different threshold values, and experiments are carried out to verify the advantage of the orthogonal loss function in feature selection. Evidently, after the orthogonal loss function is added to the multi-task network, the obtained features better meet the requirements of the various classification tasks and the recognition accuracy of the network is improved.
In order to quantitatively evaluate the effect of the orthogonal loss function on the multitask classification network, the embodiment first selects Fashion-60 to evaluate the effect of the orthogonal loss function, and the database has 60 clothing subclasses and 10 clothing classes divided according to common sense of life as rough classes. The present embodiment will be evaluated from two aspects: (1) the influence of the orthogonal loss function on the whole network parameters in the process of back propagation is changed by adjusting the size of the threshold alpha, and how to reach the balance of the orthogonal loss and the cross entropy loss is evaluated. (2) An orthogonal loss function is used for updating parameters in the multi-task classification network, and whether the performance is better than that of a traditional multi-task classification network structure is evaluated; in the network structure of the embodiment, the CNN uses Resnet to perform feature extraction, and has 18 layers in total.
During the experiments, different settings of the threshold α were tried in order to select a suitable α for training. The embodiment therefore selects 14 different α values between 0.0001 and 6. The experimental results are shown in FIG. 6. As can be seen from the figure, the network performance is best when α is 2.5. When the influence factor is small, the influence of the orthogonality loss function on the network is insignificant. As the value increases further, the performance of the network gradually decreases, which means that when the value is too large the effect of the orthogonality loss function grows to the point of impairing the original performance of the network and plays a negative role. Therefore α = 2.5 is selected to train the network and compare it with the reference network.
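The threshold sweep itself is a simple loop. The α grid below is illustrative (the text only states that 14 values between 0.0001 and 6 were tried), and train_and_evaluate is a hypothetical helper standing in for one full training and testing run of the multi-task network with the given α.

```python
# Illustrative alpha grid and sweep; the exact values and the helper are assumptions.
alphas = [0.0001, 0.001, 0.01, 0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0]

results = {}
for alpha in alphas:
    fine_acc, coarse_acc = train_and_evaluate(alpha)        # hypothetical helper
    results[alpha] = (fine_acc, coarse_acc)

best_alpha = max(results, key=lambda a: results[a][0])      # alpha with the best fine-class accuracy
```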
Table one: classification accuracy of networks
As can be seen from Table one, the classification accuracy of the multi-task network based on the orthogonal loss is clearly better than that of the conventional deep convolutional neural network and of the multi-task network without the orthogonal loss. This result demonstrates that the orthogonality loss function proposed by the invention effectively accomplishes feature selection, so that the features obtained in the multi-task network better meet the task requirements.
Similarly, 14 different α values between 0.0001 and 6 are selected in this embodiment to evaluate the results on the Caltech-UCSD Birds-200-2011 database. The experimental results are shown in FIG. 7. As can be seen from the figure, the network performance is best when α is 2. When the influence factor α is small, the orthogonality loss function is unstable for the network, and as the value gradually increases beyond a certain range, the network performance gradually decreases. The optimal value of α differs from the experimental result on Fashion-60, but the overall trend of α with respect to network performance is the same for both databases. The further experimental results show that the value of α may differ between databases, but the effect of α on network performance is regular: a value that is too small has no effect, while a value that is too large has an adverse effect. Only by taking an intermediate value can the balance between the orthogonal loss and the softmax loss be found.
Table two: classification accuracy of networks
Methods | Basic architecture | Fine-classes | Coarse-classes
CNN | Alexnet | 67.683% | --
CNN | VGG-19 | 68.816% | --
CNN | Resnet-18 | 70.094% | --
Multi-task network | Resnet-18 + tree classifier (baseline) | 72.491% | 96.303%
Multi-task network | Resnet-18 + tree classifier + Orthogonality Loss | 73.399% | 96.842%
From the data in Table two it can be seen that, on Caltech-UCSD Birds-200-2011, the multi-task network based on the orthogonal loss is also significantly better than the other two types of networks. This further demonstrates that the orthogonality loss function proposed by the invention effectively accomplishes feature selection, making the features obtained in the multi-task network more discriminative.
By proposing an orthogonal loss function that distinguishes common features from characteristic features under the multi-task classification network model, the features obtained by different tasks better meet their respective requirements: the features of the same coarse class are more aggregated when the classifier performs the coarse classification task, and the features of different fine classes are more separated when it performs the fine classification task. In addition, the method selects different threshold values as the balance point between the orthogonal loss and the cross-entropy loss and finds the optimal threshold through experimental comparison. After the optimal threshold is obtained, the orthogonal loss function is added to update the network parameters, so that the task output-layer features of different classification tasks are distinguished, classifiers at different levels obtain features better matched to their classification tasks, useless features are removed, and the classification accuracy is improved. The invention achieves a better classification effect on two different databases.
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (1)

1. A construction method of a multitask classification network based on an orthogonal loss function is characterized by comprising the following steps:
step one: constructing a hierarchical label tree;
the hierarchical label tree is divided into two layers, wherein the first layer of labels are rough labels of the images and are divided according to the species to which the images belong, and the second layer of labels are fine labels of the images and are defined according to the subclass of the species to which the images belong;
step two: constructing a deep convolutional neural network as a feature extraction module;
selecting Resnet-18 with a residual structure as the deep convolutional neural network for extracting image depth features, the network having an 18-layer structure comprising 17 convolutional layers and one fully connected layer; except for the first convolutional layer, which uses 7 × 7 convolution kernels, all other convolutional layers use 3 × 3 kernels, every two convolutional layers forming a residual block to which an identity mapping is added; the network takes as input the RGB pixel values of an image with dimensions 3 × 224 × 224, a feature map of dimensions 512 × 7 × 7 is obtained after the 17 convolutional layers and is input into the fully connected layer, whose output dimension is 1024, so that the image yields through Resnet-18 a one-dimensional vector containing 1024 neurons, namely the depth feature extracted by the deep convolutional neural network;
step three: building a tree classifier for classification;
constructing a corresponding tree classifier according to the hierarchical label tree of step one, wherein the tree classifier has a two-layer structure: the first layer comprises one coarse classifier, used for the coarse classification task; the second layer comprises N fine classifiers, used for the fine classification tasks; the sub-classifiers, namely the one coarse classifier and the N fine classifiers, have the same network structure and are independent of one another, each sub-classifier comprising two fully connected layers and a softmax classifier and producing a classification result; the depth features obtained in step two are input into each sub-classifier for classification, and if the coarse classifier determines that the image belongs to the nth coarse class, the nth fine classifier is selected, according to the classification result of the coarse classifier and the membership defined by the hierarchical label tree, to perform fine classification and determine which fine class the image belongs to; the multi-task classification network comprising the deep convolutional neural network and the tree classifier is thus built, and when an image is input into the multi-task classification network, image classification can be carried out to obtain an image classification result;
step four: constructing an orthogonal loss function
Combining a deep convolutional neural network with a tree classifier, building a multi-task classification network, inputting a training set image to train the multi-task classification network, and constructing a loss function to update parameters;
firstly, an orthogonal loss function is constructed to update the parameters, the result expected from the orthogonal loss being that the coarse-classifier feature vectors and the fine-classifier feature vectors are orthogonal in space, so that in the ideal case the cross features are 0; the objective to be achieved by feature selection is added into the loss function as an orthogonal loss, and the orthogonal loss function is constructed as follows:
L_{orth}(x) = \alpha \sum_{i=1}^{k} \mathrm{Tr}\left( f_g(x) \, f_i(x)^{T} \right)
where x denotes the pixel values of the N input images, k represents the number of coarse classes, f_1, f_2, ..., f_k represent the k sub-classification tasks, Tr represents the trace of a matrix, T represents matrix transposition, f_g(x) represents the coarse-classifier features obtained for the N images, and f_s(x) represents the sub-classifier features of the N images; the trace Tr(f_g(x) f_s(x)^T) is the sum of the corresponding inner products between the coarse-classifier features and the fine-classifier features of the N images, and when Tr(f_g(x) f_s(x)^T) equals 0, every row vector of the coarse-classifier features is orthogonal to the corresponding row vector of the fine-classifier features; α is a hyperparameter;
when training-set images are input to train the multi-task classification network, back propagation is carried out with the orthogonal loss function, and the value of the orthogonal loss function is reduced by updating the parameters; as the orthogonal loss function approaches 0, f_g(x) and f_s(x) tend to become orthogonal;
step five: constructing a classification loss function;
in the process of passing f_g(x) and f_s(x) to the next fully connected layer and the softmax classifier for classification, the outputs of the neurons are mapped by the softmax classifier into the interval (0, 1), yielding a coarse-classification predicted value and a fine-classification predicted value respectively; the error between the predicted values and the true label values is then measured with a cross-entropy loss function, whose formula is as follows:
L_{cls} = -\log \frac{e^{W_g^{T} X + b_g}}{\sum_{j} e^{W_{g_j}^{T} X + b_{g_j}}} - \log \frac{e^{W_s^{T} X + b_s}}{\sum_{j} e^{W_{s_j}^{T} X + b_{s_j}}}
where g represents the coarse class and s the fine class, X represents the depth feature of the input image obtained in step two, W_g and W_s represent the weights in the coarse classifier and the fine classifier respectively, and b_g and b_s represent the biases in the coarse classifier and the fine classifier respectively; when the cross-entropy loss function approaches 0, the predicted values approach the true label values;
step six: the orthogonal loss function of step four and the cross-entropy loss function of step five are added together, and the network parameters are updated through back propagation; during back propagation an SGD optimizer continuously updates the parameters with the training-set images by stochastic gradient descent, and the test set is evaluated after each round of training.
CN201911124037.4A 2019-11-18 2019-11-18 Construction method of multi-task classification network based on orthogonal loss function Active CN110929624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911124037.4A CN110929624B (en) 2019-11-18 2019-11-18 Construction method of multi-task classification network based on orthogonal loss function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911124037.4A CN110929624B (en) 2019-11-18 2019-11-18 Construction method of multi-task classification network based on orthogonal loss function

Publications (2)

Publication Number Publication Date
CN110929624A CN110929624A (en) 2020-03-27
CN110929624B true CN110929624B (en) 2021-09-14

Family

ID=69853358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911124037.4A Active CN110929624B (en) 2019-11-18 2019-11-18 Construction method of multi-task classification network based on orthogonal loss function

Country Status (1)

Country Link
CN (1) CN110929624B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507403A (en) * 2020-04-17 2020-08-07 腾讯科技(深圳)有限公司 Image classification method and device, computer equipment and storage medium
CN111798520B (en) * 2020-09-08 2020-12-22 平安国际智慧城市科技股份有限公司 Image processing method, device, equipment and medium based on convolutional neural network
CN112784776B (en) * 2021-01-26 2022-07-08 山西三友和智慧信息技术股份有限公司 BPD facial emotion recognition method based on improved residual error network
CN112924177B (en) * 2021-04-02 2022-07-19 哈尔滨理工大学 Rolling bearing fault diagnosis method for improved deep Q network
CN113408852B (en) * 2021-05-18 2022-04-19 江西师范大学 Meta-cognition ability evaluation model based on online learning behavior and deep neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919177A (en) * 2019-01-23 2019-06-21 西北工业大学 Feature selection approach based on stratification depth network
CN109992703A (en) * 2019-01-28 2019-07-09 西安交通大学 A kind of credibility evaluation method of the differentiation feature mining based on multi-task learning
CN110046668A (en) * 2019-04-22 2019-07-23 中国科学技术大学 A kind of high performance multiple domain image classification method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Learning Multi-Domain Adversarial Neural Networks for Text Classification; XIAO DING et al; IEEE Access; 2019-04-09; Vol. 7; pp. 40323-40332 *
Multi-Task Networks With Universe, Group, and Task Feature Learning; Shiva Pentyala et al; arXiv:1907.01791v1 [cs.CL]; 2019-07-03; pp. 1-11 *
Study on Algorithm Evaluation of Image Fusion Based on Multi-hierarchical Synthetic Analysis; Guiqing He et al; ICSPCC2016; 2016-12-31; pp. 1-6 *
Design and Implementation of a Classification System for Short-Text User Comments; 李军炜; China Master's Theses Full-text Database, Information Science and Technology; 2018-11-15 (No. 11); pp. I138-613 *

Also Published As

Publication number Publication date
CN110929624A (en) 2020-03-27

Similar Documents

Publication Publication Date Title
CN110929624B (en) Construction method of multi-task classification network based on orthogonal loss function
CN108664924B (en) Multi-label object identification method based on convolutional neural network
Mathur et al. Crosspooled FishNet: transfer learning based fish species classification model
Çinar et al. Classification of raisin grains using machine vision and artificial intelligence methods
CN110659958B (en) Clothing matching generation method based on generation of countermeasure network
CN105809672B (en) A kind of image multiple target collaboration dividing method constrained based on super-pixel and structuring
CN106446933A (en) Multi-target detection method based on context information
Schwartz et al. Automatically discovering local visual material attributes
Sathiyanarayanan et al. Identification of breast cancer using the decision tree algorithm
CN112132014B (en) Target re-identification method and system based on non-supervised pyramid similarity learning
CN112733602B (en) Relation-guided pedestrian attribute identification method
Zhong et al. A comparative study of image classification algorithms for Foraminifera identification
CN112784921A (en) Task attention guided small sample image complementary learning classification algorithm
Lu et al. Crowdsourcing evaluation of saliency-based XAI methods
Balasubramaniyan et al. Color contour texture based peanut classification using deep spread spectral features classification model for assortment identification
Su et al. A CNN-LSVM model for imbalanced images identification of wheat leaf
Lubis et al. KNN method on credit risk classification with binary particle swarm optimization based feature selection
CN111127485B (en) Method, device and equipment for extracting target area in CT image
Jodas et al. Deep Learning Semantic Segmentation Models for Detecting the Tree Crown Foliage.
Chuntama et al. Classification of astronomical objects in the galaxy m81 using machine learning techniques ii. an application of clustering in data pre-processing
Jerandu et al. Image Classification of Decapterus Macarellus Using Ridge Regression
Rame et al. CORE: Color regression for multiple colors fashion garments
Basinet et al. Performance of two multiscale texture algorithms in classifying silver gelatin paper via k-nearest neighbors
Kumar et al. Image classification in python using Keras
Anggoro et al. Classification of Solo Batik patterns using deep learning convolutional neural networks algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant