CN109978074A - Image aesthetic and emotion joint classification method and system based on deep multi-task learning - Google Patents

Image aesthetic and emotion joint classification method and system based on deep multi-task learning

Info

Publication number
CN109978074A
CN109978074A (application CN201910272826.6A)
Authority
CN
China
Prior art keywords
image
classification
aesthetic feeling
depth
neural networks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910272826.6A
Other languages
Chinese (zh)
Inventor
崔超然
余俊
杨文雅
Current Assignee
Shandong University of Finance and Economics
Original Assignee
Shandong University of Finance and Economics
Priority date
Filing date
Publication date
Application filed by Shandong University of Finance and Economics filed Critical Shandong University of Finance and Economics
Priority to CN201910272826.6A
Publication of CN109978074A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image aesthetic and emotion joint classification method and system based on deep multi-task learning. The method includes: annotating the aesthetic category and emotion category of each image to form a training dataset; constructing a deep convolutional neural network comprising cross-branch connection layers and two parallel network branches; training the deep convolutional neural network on the training dataset until a predefined loss function reaches its minimum; and using the trained deep convolutional neural network to output the probabilities that a given image belongs to each aesthetic category and each emotion category, then selecting the highest-probability aesthetic category and the highest-probability emotion category as the predicted aesthetic category and emotion category of the given image.

Description

Image aesthetic and emotion joint classification method and system based on deep multi-task learning
Technical field
The present disclosure belongs to the technical field of computer vision, and in particular relates to an image aesthetic and emotion joint classification method and system based on deep multi-task learning.
Background technique
The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.
With the rapid development of computer vision technology, people not only want computers to analyze image content at the semantic level, but also hope that computers can simulate the human visual and cognitive system and develop higher-level perceptual abilities. As two representative tasks in perceptual understanding research, image aesthetic classification and image emotion classification aim to enable computers to recognize, respectively, the aesthetic and emotional responses that humans produce when visually stimulated by an image. At present, image aesthetic classification and emotion classification techniques have been applied to image storage, editing, retrieval, and so on. For example, among multiple candidate photos of the same object or scene taken by a user, the most aesthetically pleasing work can be selected for saving and display, reasonably reducing data storage overhead. In the creation and editing of image works, analyzing and comparing the aesthetic quality of candidate schemes improves the visual appeal of the final work. In an image retrieval system, taking the emotional orientation of returned images into account provides users with semantically accurate and more evocative search results.
Due to the diversity of image content and the complexity of human perception, realizing automatic aesthetic and emotion classification of images is a challenging task. In recent years, benefiting from the emergence of large-scale image datasets with aesthetic labels and emotion labels, methods based on machine learning have been widely adopted. The core step of such methods is to extract image visual features with good discriminative ability for the classification task. Early methods relied mainly on hand-crafted features, requiring researchers to have a deep understanding of the problem itself. With the rise of deep learning in the computer vision field, recent methods mainly use convolutional neural networks to automatically extract features for image aesthetic and emotion classification, and have achieved good results.
The inventors have found that the prior art usually treats the aesthetic classification and emotion classification of images as two mutually independent tasks. Intuitively, however, human aesthetic impressions and emotional impressions do not arise in isolation; on the contrary, at the level of psychological cognition they should be interrelated and mutually influential. For example, if an image gives people aesthetic pleasure, it is also likely to arouse positive emotions in the observer. Research in the field of neuroscience also proves that human aesthetic experience is a cognitive process that continually escalates along with affective states, and vice versa.
Summary of the invention
To solve the above problems, a first aspect of the present disclosure provides an image aesthetic and emotion joint classification method based on deep multi-task learning. Through a unified deep convolutional neural network framework, information can be effectively shared between the two tasks, realizing joint recognition of the aesthetic category and emotion category of an image and improving recognition accuracy and efficiency.
To achieve the above goal, the present disclosure adopts the following technical solution:
An image aesthetic and emotion joint classification method based on deep multi-task learning, comprising:
annotating the aesthetic category and emotion category of each image to form a training dataset;
constructing a deep convolutional neural network comprising cross-branch connection layers and two parallel network branches;
wherein the two network branches are respectively responsible for aesthetic classification and emotion classification of the input image; the cross-branch connection layers connect corresponding convolutional layer groups in the two network branches, thereby associating the two tasks of aesthetic classification and emotion classification; and the output of the deep convolutional neural network represents the probabilities that the input image belongs to each aesthetic category and each emotion category;
training the deep convolutional neural network on the training dataset until a predefined loss function reaches its minimum; and
using the trained deep convolutional neural network to output the probabilities that a given image belongs to each aesthetic category and each emotion category, and selecting the highest-probability aesthetic category and the highest-probability emotion category as the predicted aesthetic category and emotion category of the given image.
To solve the above problems, a second aspect of the present disclosure provides an image aesthetic and emotion joint classification system based on deep multi-task learning. Through a unified deep convolutional neural network framework, information can be effectively shared between the two tasks, realizing joint recognition of the aesthetic category and emotion category of an image and improving recognition accuracy and efficiency.
To achieve the above goal, the present disclosure adopts the following technical solution:
An image aesthetic and emotion joint classification system based on deep multi-task learning, comprising:
a training dataset formation module, configured to annotate the aesthetic category and emotion category of each image to form a training dataset;
a deep convolutional neural network construction module, configured to construct a deep convolutional neural network comprising cross-branch connection layers and two parallel network branches;
wherein the two network branches are respectively responsible for aesthetic classification and emotion classification of the input image; the cross-branch connection layers connect corresponding convolutional layer groups in the two network branches, thereby associating the two tasks of aesthetic classification and emotion classification; and the output of the deep convolutional neural network represents the probabilities that the input image belongs to each aesthetic category and each emotion category;
a deep convolutional neural network training module, configured to train the deep convolutional neural network on the training dataset until a predefined loss function reaches its minimum; and
a prediction classification module, configured to use the trained deep convolutional neural network to output the probabilities that a given image belongs to each aesthetic category and each emotion category, and to select the highest-probability aesthetic category and the highest-probability emotion category as the predicted aesthetic category and emotion category of the given image.
To solve the above problems, a third aspect of the present disclosure provides a computer-readable storage medium. Through a unified deep convolutional neural network framework, information can be effectively shared between the two tasks, realizing joint recognition of the aesthetic category and emotion category of an image and improving recognition accuracy and efficiency.
To achieve the above goal, the present disclosure adopts the following technical solution:
A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps in the image aesthetic and emotion joint classification method based on deep multi-task learning described above.
To solve the above problems, a fourth aspect of the present disclosure provides a computer device. Through a unified deep convolutional neural network framework, information can be effectively shared between the two tasks, realizing joint recognition of the aesthetic category and emotion category of an image and improving recognition accuracy and efficiency.
To achieve the above goal, the present disclosure adopts the following technical solution:
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps in the image aesthetic and emotion joint classification method based on deep multi-task learning described above.
The beneficial effects of the present disclosure are:
The present disclosure applies the idea of multi-task learning to image aesthetic classification and emotion classification, making full use of the correlation between the two tasks, and designs a unified deep convolutional neural network framework in which cross-branch connection layers allow the network branches to share information effectively by exchanging image feature maps and to automatically learn during training which information each task needs, thereby realizing joint recognition of the aesthetic category and emotion category of an image and improving the accuracy of both classifications.
Detailed description of the invention
The accompanying drawings, which constitute a part of the present disclosure, are provided for further understanding of the disclosure. The illustrative embodiments of the disclosure and their descriptions are used to explain the disclosure and do not constitute an improper limitation of it.
Fig. 1 is a flowchart of an image aesthetic and emotion joint classification method based on deep multi-task learning provided by an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of the deep convolutional neural network provided by an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of the cross-branch connection layer provided by an embodiment of the present disclosure.
Fig. 4 is a schematic structural diagram of an image aesthetic and emotion joint classification system based on deep multi-task learning provided by an embodiment of the present disclosure.
Specific embodiment
The present disclosure is further described below with reference to the accompanying drawings and embodiments.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present disclosure. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the technical field to which the disclosure belongs.
It should also be noted that the terms used herein are merely for describing specific embodiments and are not intended to limit the exemplary embodiments of the present disclosure. As used herein, unless the context clearly indicates otherwise, singular forms are also intended to include plural forms; in addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
The image aesthetic and emotion joint classification method based on deep multi-task learning proposed by the present disclosure is described in detail below with reference to Fig. 1.
As shown in Fig. 1, the image aesthetic and emotion joint classification method based on deep multi-task learning of this embodiment comprises:
S101: annotating the aesthetic category and emotion category of each image to form a training dataset.
In a specific implementation, for the image aesthetic classification problem, images are divided into two classes: high aesthetic and low aesthetic. For the image emotion classification problem, images are divided into eight basic emotion categories: amusement, awe, contentment, excitement, anger, disgust, fear, and sadness.
Since a person's aesthetics and emotions are both highly subjective cognitive attributes, there are obvious individual differences between people. Therefore, when annotating the aesthetic category and emotion category of images, a strategy is adopted in which multiple people jointly annotate the same image, and the category with the highest degree of consensus among annotators is then taken as the final category of the image.
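The multi-annotator consensus strategy can be sketched as a simple majority vote. This is an editorial illustration only, not part of the patent; the helper name and example votes are assumptions:

```python
from collections import Counter

def consensus_label(annotations):
    """Return the annotation with the highest consensus for one image."""
    return Counter(annotations).most_common(1)[0][0]

# Five annotators label the same image for each task:
aesthetic_votes = ["high", "high", "low", "high", "low"]
emotion_votes = ["awe", "contentment", "awe", "awe", "sadness"]
print(consensus_label(aesthetic_votes))  # -> high
print(consensus_label(emotion_votes))    # -> awe
```

The same helper serves both tasks, since the strategy (take the most agreed-upon category) is identical for aesthetic and emotion labels.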
It should be understood that in other embodiments, image aesthetic categories and image emotion categories may also be divided in other ways; those skilled in the art can set them according to the specific situation, which will not be described in detail here.
S102: constructing a deep convolutional neural network comprising cross-branch connection layers and two parallel network branches.
Wherein the two network branches are respectively responsible for aesthetic classification and emotion classification of the input image; the cross-branch connection layers connect corresponding convolutional layer groups in the two network branches, thereby associating the two tasks of aesthetic classification and emotion classification; and the output of the deep convolutional neural network represents the probabilities that the input image belongs to each aesthetic category and each emotion category.
Specifically, in the deep convolutional neural network, the two network branches contain the same number n of convolutional layer groups, and there are n-1 cross-branch connection layers. The i-th cross-branch connection layer takes as input the image feature maps output by the i-th convolutional layer groups of the two network branches, stacks these input feature maps along the channel dimension, and feeds the stacked features separately into the (i+1)-th convolutional layer groups of the two branches, where 1 ≤ i ≤ n-1 and n is a positive integer greater than or equal to 2.
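Under this structure (n groups per branch, n-1 cross-branch layers), the forward pass can be sketched as a simple loop. This is an editorial illustration, not the patent's code; the group and connection callables are hypothetical placeholders:

```python
def forward(image, aes_groups, emo_groups, cross_layers):
    """Run two parallel branches of n conv groups with n-1 cross-branch layers.

    aes_groups, emo_groups: lists of n callables (the convolutional layer groups).
    cross_layers: list of n-1 callables taking (x_a, x_e) and returning (x_a, x_e).
    """
    n = len(aes_groups)
    x_a, x_e = image, image  # both branches receive the same input image
    for i in range(n):
        x_a = aes_groups[i](x_a)
        x_e = emo_groups[i](x_e)
        if i < n - 1:  # a cross-branch layer follows every group except the last
            x_a, x_e = cross_layers[i](x_a, x_e)
    return x_a, x_e

# Toy check with arithmetic stand-ins for the layers (n = 3, so 2 cross layers):
aes = [lambda x: x + 1] * 3
emo = [lambda x: x * 2] * 3
cross = [lambda a, e: (a + e, a + e)] * 2  # crude "information exchange"
print(forward(0, aes, emo, cross))  # -> (5, 8)
```

The toy stand-ins only demonstrate the wiring order; in the actual design each callable would be a stack of convolutional layers or a 1*1-convolution exchange block.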
The deep convolutional neural network in this embodiment is shown in Fig. 2. The network comprises two parallel branches, which receive the same input image and are respectively responsible for aesthetic classification and emotion classification of that image. The two branches have identical structures, both based on the VGG16 network (see Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014). Each branch consists of 5 convolutional layer groups, 3 fully connected layers, and 1 Softmax layer. Each convolutional layer group contains multiple consecutive convolutional layers and 1 max-pooling layer, whose purpose is to extract effective image feature maps. The fully connected layers apply multiple nonlinear transformations to the feature maps output by the last convolutional layer group, mapping them into a column vector whose dimension equals the number of aesthetic categories or emotion categories, with each dimension corresponding to one specific aesthetic or emotion category. The final Softmax layer converts each dimension of this vector into a probability value representing the probability that the input image belongs to the corresponding category. The specific structure and parameter settings of each layer in a branch follow the VGG16 network model.
Cross-branch connection layers are introduced to connect corresponding convolutional layer groups in the two branches; the structure of a cross-branch connection layer is shown in Fig. 3. A cross-branch connection layer takes as input the image feature maps output by two convolutional layer groups and stacks them along the channel dimension. Assuming each feature map has K channels before stacking (K is a positive integer), the stacked feature map has 2K channels. The stacked feature map is then fed separately into two convolutional layers with 1*1 kernels. Each of these layers contains K convolution kernels, with stride 1 and edge padding 0. In this way, the two convolutional layers each output a new feature map whose spatial size is unchanged and whose channel count is restored to K; each new feature map is finally fed into the subsequent convolutional layer group (or the fully connected layers) of one branch. Intuitively, a cross-branch connection layer lets the two branches share information by exchanging image feature maps, and helps the model automatically learn during training which information each of the two tasks needs.
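The channel stacking and 1*1 convolutions of a cross-branch connection layer can be sketched in NumPy, since a 1*1 convolution with stride 1 and no padding is simply a linear map over channels applied at every spatial position. The weight shapes below follow the description above but are illustrative assumptions, not the patent's actual parameters:

```python
import numpy as np

def cross_branch_connect(feat_a, feat_e, w_a, w_e):
    """Cross-branch connection layer.

    feat_a, feat_e: (K, H, W) feature maps from the two branches.
    w_a, w_e: (K, 2K) weights of the two 1*1 convolutions.
    Returns two (K, H, W) maps, one routed back into each branch.
    """
    stacked = np.concatenate([feat_a, feat_e], axis=0)  # (2K, H, W)
    # A 1*1 convolution mixes channels independently at each spatial position:
    out_a = np.einsum("ok,khw->ohw", w_a, stacked)      # (K, H, W)
    out_e = np.einsum("ok,khw->ohw", w_e, stacked)      # (K, H, W)
    return out_a, out_e

K, H, W = 4, 8, 8
a, e = np.random.rand(K, H, W), np.random.rand(K, H, W)
wa, we = np.random.rand(K, 2 * K), np.random.rand(K, 2 * K)
na, ne = cross_branch_connect(a, e, wa, we)
print(na.shape, ne.shape)  # spatial size unchanged, channels restored to K
```

Because both output maps are computed from the full 2K-channel stack, each branch receives a learned mixture of its own features and the other branch's features, which is the information-exchange mechanism the text describes.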
In traditional deep multi-task learning methods, different tasks are usually set to share the lower network layers while maintaining separate branches in the higher layers. Before multi-task training begins, the shared layers must be manually specified in advance based on experience. This practice lacks theoretical guidance, and an unreasonable choice of shared layers may cause a serious drop in method performance. Different from the above methods, this embodiment designs separate branches for the different tasks at all network layers; the cross-branch connection layers let the branches share information by exchanging image feature maps and automatically learn during training which information each task needs, thereby improving classification accuracy.
It should be noted that the order of step S101 and step S102 can be adjusted by those skilled in the art according to the specific situation.
S103: training the deep convolutional neural network on the training dataset until a predefined loss function reaches its minimum.
In a specific implementation, the process of training the deep convolutional neural network on the training dataset comprises:
unifying the size of all images in the training dataset;
initializing the weights of each layer of the deep convolutional neural network and predefining the loss function; and
training the deep convolutional neural network using the stochastic gradient descent algorithm to determine the network weights that minimize the loss function; in each training iteration, an image block of fixed size is cropped from a random position of the image, and the block is horizontally flipped with a certain probability.
During training on the training dataset, first, all training images are scaled to a uniform size; this embodiment scales images to 256*256 pixels. Then, the pixel mean of the training images is computed and subtracted from every image; this removes what the training images have in common and highlights their individual differences. Finally, in each training iteration, an image block of fixed size is cropped from a random position of the mean-subtracted image, and the block is horizontally flipped with a certain probability. In this way, the number of training samples is effectively expanded and their diversity improved. This embodiment uses image blocks of 224*224 pixels, and the probability of performing a horizontal flip each time is 0.5.
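The augmentation steps above (mean subtraction, random 224*224 crop, horizontal flip with probability 0.5) can be sketched as follows. The image is assumed to have already been scaled to 256*256, and the helper name is illustrative:

```python
import numpy as np

def augment(image, mean, crop=224, flip_p=0.5, rng=None):
    """One training-time augmentation pass over a 256*256*3 image array."""
    rng = rng if rng is not None else np.random.default_rng()
    x = image - mean                       # subtract the dataset pixel mean
    h, w = x.shape[:2]
    top = rng.integers(0, h - crop + 1)    # random crop position
    left = rng.integers(0, w - crop + 1)
    x = x[top:top + crop, left:left + crop]
    if rng.random() < flip_p:              # horizontal flip with probability 0.5
        x = x[:, ::-1]
    return x

img = np.random.rand(256, 256, 3)
mean = img.mean()
patch = augment(img, mean, rng=np.random.default_rng(0))
print(patch.shape)  # (224, 224, 3)
```

Because the crop position and flip are re-sampled every iteration, each epoch effectively sees different variants of the same image, which is how the scheme expands the training set.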
Except for the last fully connected layer and the cross-branch connection layers, the weights of each layer of each branch are initialized with the weights of a VGG16 model pre-trained on the ImageNet dataset; the weights of the last fully connected layer and the cross-branch connection layers are initialized randomly. Using the cross-entropy loss function, the loss on aesthetic classification is defined as La and the loss on emotion classification as Le, i.e.,

La = -ya log pa - (1 - ya) log(1 - pa)

Le = -Σe ye log pe

where ya denotes the true aesthetic category of the input image, taking the value 1 if the image is actually a high-aesthetic image and 0 otherwise; ye denotes the true emotion category of the input image, taking the value 1 if the image actually belongs to the e-th emotion category and 0 otherwise; pa is the probability output by the network that the image belongs to the high-aesthetic category; and pe is the probability output by the network that the image belongs to the e-th emotion category.
Further, the total loss function is L = La + λLe, where λ is a hyperparameter balancing the two types of loss. In this embodiment, considering that aesthetic classification is a binary classification problem while emotion classification is a multi-class problem, λ is set to 1/4. The network is trained using the stochastic gradient descent algorithm to determine the network weights that minimize the loss function.
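A minimal numeric sketch of the combined loss L = La + λLe with λ = 1/4, mirroring the formulas above; the example probabilities are invented for illustration:

```python
import math

def aesthetic_loss(y_a, p_a):
    """Binary cross-entropy for the aesthetic task."""
    return -y_a * math.log(p_a) - (1 - y_a) * math.log(1 - p_a)

def emotion_loss(y, p):
    """Multi-class cross-entropy for the emotion task (y is one-hot over 8 classes)."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

def total_loss(y_a, p_a, y_e, p_e, lam=0.25):
    """Combined loss L = La + lambda * Le with lambda = 1/4 as in this embodiment."""
    return aesthetic_loss(y_a, p_a) + lam * emotion_loss(y_e, p_e)

y_a, p_a = 1, 0.8                        # true high-aesthetic, predicted 0.8
y_e = [0, 1, 0, 0, 0, 0, 0, 0]           # true emotion: second of eight categories
p_e = [0.05, 0.6, 0.05, 0.05, 0.05, 0.05, 0.1, 0.05]
print(total_loss(y_a, p_a, y_e, p_e))
```

With these numbers La = -ln 0.8 ≈ 0.223 and Le = -ln 0.6 ≈ 0.511, so L ≈ 0.223 + 0.25 * 0.511 ≈ 0.351.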
S104: using the trained deep convolutional neural network to output the probabilities that a given image belongs to each aesthetic category and each emotion category, and selecting the highest-probability aesthetic category and the highest-probability emotion category as the predicted aesthetic category and emotion category of the given image.
In this embodiment, given an image, it is first scaled to 224*224 pixels and then fed into the trained network to obtain the probabilities that it belongs to each aesthetic category and each emotion category; finally, the highest-probability categories are selected as the predicted aesthetic category and emotion category of the image.
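The final prediction step reduces to an argmax over each task's probability vector, as in this sketch (the category names are illustrative):

```python
def predict(p_aesthetic, p_emotion):
    """Pick the highest-probability category for each task."""
    argmax = lambda d: max(d, key=d.get)
    return argmax(p_aesthetic), argmax(p_emotion)

p_a = {"high": 0.7, "low": 0.3}
p_e = {"amusement": 0.1, "awe": 0.5, "contentment": 0.2, "excitement": 0.05,
       "anger": 0.05, "disgust": 0.03, "fear": 0.02, "sadness": 0.05}
print(predict(p_a, p_e))  # -> ('high', 'awe')
```

Because the two probability vectors come from separate Softmax layers, the two argmax decisions are made independently even though the underlying features were learned jointly.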
This embodiment applies the idea of multi-task learning to image aesthetic classification and emotion classification, makes full use of the correlation between the two tasks, and designs a unified deep convolutional neural network framework in which cross-branch connection layers allow the network branches to share information effectively by exchanging image feature maps and to automatically learn during training which information each task needs, thereby realizing joint recognition of the aesthetic category and emotion category of an image and improving the accuracy of both classifications.
The image aesthetic and emotion joint classification system based on deep multi-task learning proposed by the present disclosure is described in detail below with reference to Fig. 4.
As shown in Fig. 4, the image aesthetic and emotion joint classification system based on deep multi-task learning of this embodiment comprises: a training dataset formation module 11, a deep convolutional neural network construction module 12, a deep convolutional neural network training module 13, and a prediction classification module 14.
Wherein:
The training dataset formation module 11 is configured to annotate the aesthetic category and emotion category of each image to form a training dataset.
In a specific implementation, for the image aesthetic classification problem, images are divided into two classes: high aesthetic and low aesthetic. For the image emotion classification problem, images are divided into eight basic emotion categories: amusement, awe, contentment, excitement, anger, disgust, fear, and sadness.
Since a person's aesthetics and emotions are both highly subjective cognitive attributes, there are obvious individual differences between people. Therefore, when annotating the aesthetic category and emotion category of images, a strategy is adopted in which multiple people jointly annotate the same image, and the category with the highest degree of consensus among annotators is then taken as the final category of the image.
It should be understood that in other embodiments, image aesthetic categories and image emotion categories may also be divided in other ways; those skilled in the art can set them according to the specific situation, which will not be described in detail here.
Depth convolutional neural networks constructing module 12 is used to construct comprising across branch articulamentum and two parallel networks point The depth convolutional neural networks of branch.
Wherein, two network branches are each responsible for carrying out aesthetic feeling classification and emotional semantic classification to input picture;Across branch connection Layer is for connecting corresponding convolutional layer group in two network branches, to be associated with aesthetic feeling classification and the two tasks of emotional semantic classification;It is deep The output representing input images of degree convolutional neural networks belong to the probability of each aesthetic feeling classification and each emotional category.
Specifically, in the depth convolutional neural networks, convolutional layer group quantity in two network branches is identical to be n;Quantity across branch's articulamentum is n-1;I-th of across branch articulamentum is by i-th of corresponding convolutional layer in two network branches The characteristics of image figure of group output is stacked as input, and by these characteristics of image figures inputted along channel direction, by heap The characteristics of image of poststack is separately input into the corresponding convolutional layer group of i+1 in two network branches;1≤i≤n-1;N be greater than Or the positive integer equal to 2.
The deep convolutional neural network of this embodiment is shown in Fig. 2. The network contains two parallel branches that receive the same input image and are respectively responsible for aesthetic classification and emotion classification. The two branches have identical structure, both based on the VGG16 architecture (see Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014). Each branch consists of 5 convolutional layer groups, 3 fully connected layers and 1 Softmax layer. A single convolutional layer group contains multiple consecutive convolutional layers followed by 1 max-pooling layer, whose purpose is to extract effective image feature maps. The fully connected layers apply multiple nonlinear transformations to the feature maps output by the last convolutional layer group and map them to a column vector whose dimension equals the number of aesthetic or emotion categories, each dimension corresponding to one specific category. The final Softmax layer converts each dimension of this vector into a probability value representing the probability that the input image belongs to the corresponding category. The specific structure and parameter settings of each layer in a branch follow the VGG16 network model.
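The final mapping from a branch's fully connected output to per-category probabilities can be sketched as follows. This is a minimal numpy illustration, not the patented implementation; the logit values are made up:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# Hypothetical 8-dimensional output of the last fully connected layer
# of the emotion branch (one dimension per emotion category).
logits = np.array([1.2, -0.3, 0.5, 2.0, 0.0, -1.1, 0.7, 0.1])
probs = softmax(logits)

# Each dimension is now the probability that the input image belongs
# to the corresponding emotion category; the probabilities sum to 1.
assert abs(probs.sum() - 1.0) < 1e-9
```

The aesthetic branch works the same way, only with a 2-dimensional output vector (low/high aesthetic).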
Cross-branch connection layers are introduced to connect corresponding convolutional layer groups in the two branches; their structure is shown in Fig. 3. A cross-branch connection layer takes the image feature maps output by the two convolutional layer groups as input and stacks them along the channel dimension. Assuming each feature map has K channels before stacking (K a positive integer), the stacked feature map has 2K channels. The stacked feature map is then fed into two separate convolutional layers, each with a 1*1 kernel size. Both convolutional layers contain K convolution kernels, with stride 1 and edge padding 0. Each of the two convolutional layers therefore outputs a new image feature map whose spatial size is unchanged and whose channel number is restored to K; the two new feature maps are finally sent into the subsequent convolutional layer group or fully connected layer of the respective branch. Intuitively, the cross-branch connection layers let the two branches share information by exchanging image feature maps, and help the model automatically learn during training which information each of the two tasks needs.
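The feature-map exchange performed by a cross-branch connection layer can be sketched as follows. This is a simplified numpy sketch under the observation that a 1*1 convolution with stride 1 and no padding reduces to a per-pixel linear map over channels; the weights here are random stand-ins for learned kernels:

```python
import numpy as np

def conv1x1(x, w):
    # x: (C_in, H, W) feature map; w: (C_out, C_in) 1*1 kernels.
    # A 1*1 convolution with stride 1 and padding 0 is a linear map
    # over the channel dimension at every spatial position.
    return np.einsum('oc,chw->ohw', w, x)

def cross_branch_layer(feat_a, feat_e, w_a, w_e):
    # Stack the two branches' K-channel feature maps along the
    # channel dimension -> 2K channels.
    stacked = np.concatenate([feat_a, feat_e], axis=0)
    # Two independent 1*1 convolutions, each with K kernels,
    # restore the channel number to K for each branch.
    return conv1x1(stacked, w_a), conv1x1(stacked, w_e)

K, H, W = 4, 8, 8
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((K, H, W))   # aesthetic-branch features
feat_e = rng.standard_normal((K, H, W))   # emotion-branch features
w_a = rng.standard_normal((K, 2 * K))     # stand-in learned kernels
w_e = rng.standard_normal((K, 2 * K))

new_a, new_e = cross_branch_layer(feat_a, feat_e, w_a, w_e)
# Spatial size unchanged, channel number back to K, as described above.
assert new_a.shape == (K, H, W) and new_e.shape == (K, H, W)
```

Because each branch's new feature map is computed from the stacked 2K-channel input, information from both tasks can flow into either branch, which is the point of the exchange.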
In traditional deep multi-task learning methods, different tasks usually share the lower network layers and keep separate branches only in the higher layers. Before multi-task training, the shared layers must be specified manually in advance, based on experience. This practice lacks theoretical guidance, and an unreasonable choice of shared layers may severely degrade the method's performance. Unlike the above methods, this embodiment designs separate branches for the different tasks at all network layers; the cross-branch connection layers enable the branches to share information by exchanging image feature maps and to automatically learn during training which information each task needs, thereby improving classification accuracy.
The deep convolutional neural network training module 13 is configured to train the deep convolutional neural network using the training dataset until a predefined loss function reaches its minimum.
The deep convolutional neural network training module 13 comprises:
a size unification module 131, configured to unify the sizes of all images in the training dataset;
an initialization module 132, configured to initialize the weights of each layer of the deep convolutional neural network and the predefined loss function;
an iterative training module 133, configured to train the deep convolutional neural network using the stochastic gradient descent algorithm to determine the network weights that minimize the loss function, wherein in each training iteration an image block of fixed size is cropped from a random position of the image and horizontally flipped with a certain probability.
During training of the deep convolutional neural network with the training dataset, first, all training images are scaled to a uniform size; this embodiment scales images to 256*256 pixels. Then, the pixel mean of the training images is computed and subtracted from every image, which removes what the training images have in common and highlights their individual differences. Finally, in each training iteration an image block of fixed size is cropped from a random position of the mean-subtracted image, and the block is horizontally flipped with a certain probability. In this way the number of training samples is effectively expanded and their diversity is increased. This embodiment uses image blocks of 224*224 pixels, and the probability of the horizontal flip operation is 0.5.
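The preprocessing and augmentation steps above (uniform scaling, mean subtraction, random cropping, random horizontal flip) can be sketched as follows. This is only an illustration of the pipeline shape: the scaling step is stubbed with a crude nearest-neighbour resize, the image is random data, and the scalar `mean` stands in for the dataset-wide pixel mean:

```python
import numpy as np

def nearest_resize(img, size):
    # Crude nearest-neighbour resize, standing in for a proper
    # image scaling routine.
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def augment(img, mean, crop=224, flip_p=0.5, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    img = nearest_resize(img, 256).astype(np.float64)  # unify size to 256*256
    img = img - mean                                   # subtract the pixel mean
    # Crop a fixed-size block from a random position.
    top = rng.integers(0, img.shape[0] - crop + 1)
    left = rng.integers(0, img.shape[1] - crop + 1)
    block = img[top:top + crop, left:left + crop]
    # Horizontal flip with probability flip_p (0.5 in the embodiment).
    if rng.random() < flip_p:
        block = block[:, ::-1]
    return block

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3))
mean = 128.0  # stand-in for the dataset-wide pixel mean
patch = augment(image, mean, rng=rng)
assert patch.shape == (224, 224, 3)
```

Because the crop position and flip decision are redrawn each iteration, repeated passes over the same image yield different training samples.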
Except for the last fully connected layer and the cross-branch connection layers, the weights of each layer of each branch are initialized with the weights of a VGG16 model pre-trained on the ImageNet dataset; the weights of the last fully connected layer and the cross-branch connection layers are initialized randomly. Using the cross-entropy loss function, the loss on aesthetic classification is defined as La and the loss on emotion classification as Le, i.e.

La = -ya log pa - (1 - ya) log(1 - pa)

Le = -Σe ye log pe

where ya indicates the true aesthetic category of the input image, taking value 1 if the image is actually a high-aesthetic image and 0 otherwise; ye indicates the true emotion category of the input image, taking value 1 if the image actually belongs to the e-th emotion category and 0 otherwise; pa is the probability output by the network that the image belongs to the high-aesthetic category, and pe is the probability output by the network that the image belongs to the e-th emotion category.
Further, the total loss function is L = La + λLe, where λ is a hyperparameter balancing the two types of model loss. In this embodiment, since aesthetic classification is a binary classification problem while emotion classification is a multi-class problem, λ is set to 1/4. The network is trained using the stochastic gradient descent algorithm to determine the network weights that minimize the loss function.
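With the definitions above, the combined loss for one training image can be computed as follows. This is a numpy sketch with made-up network outputs; the emotion loss uses the standard multi-class cross-entropy consistent with the per-class definitions of ye and pe:

```python
import numpy as np

def binary_cross_entropy(y_a, p_a):
    # La = -ya*log(pa) - (1 - ya)*log(1 - pa)
    return -(y_a * np.log(p_a) + (1 - y_a) * np.log(1 - p_a))

def multiclass_cross_entropy(y_e, p_e):
    # Le = -sum_e ye*log(pe), with ye one-hot over emotion categories.
    return -np.sum(y_e * np.log(p_e))

y_a = 1                                   # image is truly high-aesthetic
p_a = 0.8                                 # network's high-aesthetic probability
y_e = np.array([0, 0, 1, 0, 0, 0, 0, 0])  # true emotion: category 3 of 8
p_e = np.array([0.05, 0.05, 0.6, 0.05, 0.05, 0.05, 0.05, 0.1])

lam = 0.25  # lambda = 1/4: aesthetic task is binary, emotion task multi-class
loss = binary_cross_entropy(y_a, p_a) + lam * multiclass_cross_entropy(y_e, p_e)
# Total loss L = La + lambda * Le
```

In training, the gradient of this scalar with respect to the network weights is what the stochastic gradient descent updates follow.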
The prediction and classification module 14 is configured to use the trained deep convolutional neural network to output the probabilities that a given image belongs to each aesthetic category and each emotion category, and to select the category with the highest probability among the aesthetic categories and among the emotion categories, respectively, as the predicted aesthetic category and emotion category of the given image.
In this embodiment, a given image is first scaled to 224*224 pixels and then fed into the trained network to obtain the probabilities that it belongs to each aesthetic category and each emotion category; finally, the category with the highest probability in each task is selected as the predicted aesthetic category and emotion category of the image.
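The prediction step above amounts to an argmax over each task's probability vector. A minimal sketch with made-up network outputs:

```python
import numpy as np

# Hypothetical outputs of the two trained branches for one image.
aesthetic_probs = np.array([0.3, 0.7])            # [low, high] aesthetic
emotion_probs = np.array([0.1, 0.05, 0.5, 0.05,
                          0.1, 0.05, 0.1, 0.05])  # 8 emotion categories

# Pick the most probable category independently for each task.
pred_aesthetic = int(np.argmax(aesthetic_probs))  # -> 1 (high aesthetic)
pred_emotion = int(np.argmax(emotion_probs))      # -> 2
```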
This embodiment applies the idea of multi-task learning to the aesthetic classification and emotion classification of images, making full use of the correlation between the two tasks. It designs a unified deep convolutional neural network framework in which cross-branch connection layers allow the two branches to effectively share information by exchanging image feature maps and to automatically learn during training which information each task needs, realizing joint recognition of the aesthetic and emotion categories of an image and improving the accuracy of both.
In another embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the steps of the image aesthetic and emotion joint classification method based on deep multi-task learning shown in Fig. 1.
In another embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the steps of the image aesthetic and emotion joint classification method based on deep multi-task learning shown in Fig. 1.
Those skilled in the art should understand that embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which realizes the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for realizing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing related hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing are merely preferred embodiments of the present disclosure and are not intended to limit it; for those skilled in the art, the present disclosure may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure shall be included within its protection scope.

Claims (10)

1. An image aesthetic and emotion joint classification method based on deep multi-task learning, characterized by comprising:
annotating the aesthetic category and emotion category corresponding to each image to form a training dataset;
constructing a deep convolutional neural network comprising cross-branch connection layers and two parallel network branches;
wherein the two network branches are respectively responsible for aesthetic classification and emotion classification of an input image; the cross-branch connection layers are used to connect corresponding convolutional layer groups in the two network branches so as to associate the two tasks of aesthetic classification and emotion classification; and the output of the deep convolutional neural network represents the probabilities that the input image belongs to each aesthetic category and each emotion category;
training the deep convolutional neural network using the training dataset until a predefined loss function reaches its minimum; and
using the trained deep convolutional neural network to output the probabilities that a given image belongs to each aesthetic category and each emotion category, and selecting the category with the highest probability among the aesthetic categories and among the emotion categories, respectively, as the predicted aesthetic category and emotion category of the given image.
2. The image aesthetic and emotion joint classification method based on deep multi-task learning according to claim 1, characterized in that, in the deep convolutional neural network, the two branches contain the same number n of convolutional layer groups and there are n-1 cross-branch connection layers; the i-th cross-branch connection layer takes as input the image feature maps output by the i-th convolutional layer group of each branch, stacks the input feature maps along the channel dimension, and feeds the stacked image features into the (i+1)-th convolutional layer group of each branch; 1 ≤ i ≤ n-1; n is a positive integer greater than or equal to 2.
3. The image aesthetic and emotion joint classification method based on deep multi-task learning according to claim 2, characterized in that each convolutional layer group comprises one max-pooling layer and at least two consecutive convolutional layers.
4. The image aesthetic and emotion joint classification method based on deep multi-task learning according to claim 1, characterized in that the process of training the deep convolutional neural network using the training dataset comprises:
unifying the sizes of all images in the training dataset;
initializing the weights of each layer of the deep convolutional neural network and the predefined loss function; and
training the deep convolutional neural network using the stochastic gradient descent algorithm to determine the network weights that minimize the loss function, wherein in each training iteration an image block of fixed size is cropped from a random position of the image and horizontally flipped with a certain probability.
5. An image aesthetic and emotion joint classification system based on deep multi-task learning, characterized by comprising:
a training dataset formation module, configured to annotate the aesthetic category and emotion category corresponding to each image to form a training dataset;
a deep convolutional neural network construction module, configured to construct a deep convolutional neural network comprising cross-branch connection layers and two parallel network branches;
wherein the two network branches are respectively responsible for aesthetic classification and emotion classification of an input image; the cross-branch connection layers are used to connect corresponding convolutional layer groups in the two network branches so as to associate the two tasks of aesthetic classification and emotion classification; and the output of the deep convolutional neural network represents the probabilities that the input image belongs to each aesthetic category and each emotion category;
a deep convolutional neural network training module, configured to train the deep convolutional neural network using the training dataset until a predefined loss function reaches its minimum; and
a prediction and classification module, configured to use the trained deep convolutional neural network to output the probabilities that a given image belongs to each aesthetic category and each emotion category, and to select the category with the highest probability among the aesthetic categories and among the emotion categories, respectively, as the predicted aesthetic category and emotion category of the given image.
6. The image aesthetic and emotion joint classification system based on deep multi-task learning according to claim 5, characterized in that, in the deep convolutional neural network, the two branches contain the same number n of convolutional layer groups and there are n-1 cross-branch connection layers; the i-th cross-branch connection layer takes as input the image feature maps output by the i-th convolutional layer group of each branch, stacks the input feature maps along the channel dimension, and feeds the stacked image features into the (i+1)-th convolutional layer group of each branch; 1 ≤ i ≤ n-1; n is a positive integer greater than or equal to 2.
7. The image aesthetic and emotion joint classification system based on deep multi-task learning according to claim 6, characterized in that each convolutional layer group comprises one max-pooling layer and at least two consecutive convolutional layers.
8. The image aesthetic and emotion joint classification system based on deep multi-task learning according to claim 5, characterized in that the deep convolutional neural network training module comprises:
a size unification module, configured to unify the sizes of all images in the training dataset;
an initialization module, configured to initialize the weights of each layer of the deep convolutional neural network and the predefined loss function; and
an iterative training module, configured to train the deep convolutional neural network using the stochastic gradient descent algorithm to determine the network weights that minimize the loss function, wherein in each training iteration an image block of fixed size is cropped from a random position of the image and horizontally flipped with a certain probability.
9. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the steps of the image aesthetic and emotion joint classification method based on deep multi-task learning according to any one of claims 1-4.
10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when executing the program, the processor implements the steps of the image aesthetic and emotion joint classification method based on deep multi-task learning according to any one of claims 1-4.
CN201910272826.6A, filed 2019-04-04: Image aesthetic feeling and emotion joint classification method and system based on depth multi-task learning (CN109978074A, pending)

Publication: CN109978074A, published 2019-07-05



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination