CN107862322A

CN107862322A - Method, apparatus and system for classifying picture attributes by combining pictures and text

Info

Publication number: CN107862322A
Application number: CN201710832627.7A
Authority: CN
Inventors: 张智祺; 黄惠燕; 崔燕红; 徐然; 郭安琪
Original assignee: Guangzhou Vipcom Research Institute Co Ltd
Current assignee: Guangzhou Pinwei Software Co Ltd
Priority date: 2017-09-15
Filing date: 2017-09-15
Publication date: 2018-03-30
Anticipated expiration: 2037-09-15
Also published as: CN107862322B

Abstract

The invention discloses a method, apparatus and system for classifying picture attributes by combining pictures and text, and belongs to the field of computer technology. The method includes: identifying the image features of the picture and the text features of the picture with a preset neural network model, and forming a joint feature; and classifying the joint feature and outputting a picture attribute classification result. The preset neural network model includes at least a preset deep convolutional neural network model and a recurrent neural network model. Because the image features and the text features of a picture complement each other, combining them provides more comprehensive picture feature data, which better expresses the attributes of the picture, the article, or other related objects and yields more detailed and accurate object attribute classification results. The method can therefore be used for picture attribute extraction, for improving a knowledge graph, or for services such as querying and searching by picture attribute category.

Description

Method, apparatus and system for classifying picture attributes by combining pictures and text

Technical field

The present invention relates to the field of computer technology, and in particular to a method and apparatus for classifying picture attributes by combining pictures and text.

Background

At present, the global Internet has reached scale, Internet applications are diversifying, and the Internet is changing how people study, work, and live ever more profoundly. In network data analysis, accurately knowing attributes such as the habits and needs of Internet users is an important prerequisite for precisely targeted content promotion, better customer service, and advertisement placement. Existing techniques for identifying the attributes of media users on the Internet are all based on samples of users' articles or pictures; picture samples in particular carry user attribute information that has great potential use in some fields. Specifically, the full historical samples of users are first collected, the data of the sample users are organized into a sample library, and the sample library is classified by label corpus (for example, a given corpus may represent content such as "shopping", "fashion", or "apparel"); the sample library is then matched against the sample library of an Internet user to identify the user's attributes. In other words, the conventional way to identify user attributes on the Internet starts from sample data, applies machine learning, and trains a data model to judge the attributes of Internet users. Classifying attributes from the collected sample data is an important step in this process. To meet growing market demand, how to classify the attributes of pictures on the network in more detail and more comprehensively is a problem that urgently needs to be solved.

Summary of the invention

To solve the problems of the prior art, embodiments of the present invention provide a method, apparatus and system for classifying picture attributes by combining pictures and text. The technical solution is as follows:

In a first aspect, a method for classifying picture attributes by combining pictures and text is provided, the method including:

identifying the image features of the picture and the text features of the picture with a preset neural network model, and forming a joint feature;

classifying the joint feature and outputting a picture attribute classification result;

wherein the preset neural network model includes at least a preset deep convolutional neural network model and a recurrent neural network model.

With reference to the first aspect, in a second possible implementation, identifying the image features of the picture and the text features of the picture with the preset neural network model and forming a joint feature includes:

performing image representation with the preset neural network model to obtain an image representation result;

performing text representation with the preset neural network model to obtain a text representation result;

performing joint representation with the preset neural network model according to the image representation result and the text representation result to form the joint feature.

With reference to the second possible implementation of the first aspect, in a third possible implementation, performing image representation with the preset neural network model to obtain the image representation result includes:

performing global image representation with the preset deep convolutional neural network model to obtain the image representation result.

With reference to the second possible implementation of the first aspect, in a fourth possible implementation, performing text representation with the preset neural network model to obtain the text representation result includes:

performing word vector representation with a preset recurrent neural network model to obtain a word vector representation result;

performing global text representation with the preset recurrent neural network model according to the word vector representation result to obtain the text representation result.

With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, before the word vector representation is performed with the preset recurrent neural network model, the method further includes the step of:

performing Chinese word segmentation on the text of the picture to obtain Chinese words.

With reference to the second possible implementation of the first aspect, in a sixth possible implementation, performing joint representation with the preset neural network model according to the image representation result and the text representation result to form the joint feature includes:

performing weighted concatenation of the image representation result and the text representation result to form the joint feature.

With reference to the first aspect, in a seventh possible implementation, identifying the image features of the picture and the text features of the picture with the preset neural network model and forming a joint feature includes:

performing joint representation of the image and the text of the picture with the preset neural network model to form the joint feature.

With reference to the first aspect, in an eighth possible implementation, classifying the joint feature and outputting the picture attribute classification result includes:

performing softmax classification on the joint feature with the preset neural network model and outputting the picture attribute classification result.

In a second aspect, an apparatus for classifying picture attributes by combining pictures and text is provided, the apparatus including:

a recognition computing module, configured to identify the image features of the picture and the text features of the picture with a preset neural network model and to form a joint feature, and further configured to classify the joint feature, wherein the preset neural network model includes at least a preset deep convolutional neural network model and a recurrent neural network model;

an output module, configured to output the picture attribute classification result.

In a third aspect, an apparatus for classifying picture attributes by combining pictures and text is provided, the apparatus including a memory and a processor connected to the memory,

wherein the memory is configured to store a set of program code, and the processor calls the program code stored in the memory to perform the following operations:

In a fourth aspect, a system for classifying picture attributes by combining pictures and text is provided, the system including:

a recognition computing device, configured to identify the image features of the picture and the text features of the picture with a preset neural network model and to form a joint feature, and further configured to classify the joint feature, wherein the preset neural network model includes at least a preset deep convolutional neural network model and a recurrent neural network model;

an output device, configured to output the picture attribute classification result.

The technical solution provided by the embodiments of the present invention brings the following beneficial effects:

The method, apparatus and system for classifying picture attributes by combining pictures and text provided by the embodiments of the present invention carry out the following steps: identifying the image features of the picture and the text features of the picture with a preset neural network model, and forming a joint feature; and classifying the joint feature and outputting a picture attribute classification result. Because the image features and the text features of a picture complement each other, combining them provides more comprehensive picture feature data, which better expresses the attributes of the picture, the article, or other related objects and yields more detailed and accurate object attribute classification results. The method can therefore be used for picture attribute extraction, for improving a knowledge graph, or for services such as querying and searching by picture attribute category.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the method for classifying picture attributes by combining pictures and text provided by Embodiment 1 of the invention;

Fig. 2 is a flowchart of the method for classifying picture attributes by combining pictures and text provided by Embodiment 2 of the invention;

Fig. 3 is a schematic diagram of the preset neural network model based on pictures and text provided by Embodiment 2 of the invention;

Fig. 4 is a schematic diagram of the VGG model provided by Embodiment 2 of the invention;

Fig. 5 is a flowchart of the method for classifying picture attributes by combining pictures and text provided by Embodiment 3 of the invention;

Fig. 6 is a schematic diagram of the preset neural network model based on pictures and text provided by Embodiment 3 of the invention;

Fig. 7 is a schematic structural diagram of the apparatus for classifying picture attributes by combining pictures and text provided by Embodiment 4 of the invention;

Fig. 8 is a schematic structural diagram of the system for classifying picture attributes by combining pictures and text provided by Embodiment 5 of the invention;

Fig. 9 is a schematic structural diagram of the apparatus 6 for classifying picture attributes by combining pictures and text provided by Embodiment 6 of the invention.

Detailed description

To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

The embodiments of the present invention provide a method, apparatus and system for classifying picture attributes by combining pictures and text. Because the image features and the text features of a picture complement each other, combining them provides more comprehensive picture feature data, which better expresses the attributes of the picture, the article, or other related objects and yields more detailed and accurate object attribute classification results. The method can therefore be used for picture attribute extraction, for improving a knowledge graph, or for services such as querying and searching by picture attribute category.

The method, apparatus and system for classifying picture attributes by combining pictures and text provided by the embodiments of the present invention are further described below with reference to specific embodiments and the drawings.

Embodiment 1

Fig. 1 is a flowchart of the method for classifying picture attributes by combining pictures and text provided by Embodiment 1 of the invention. As shown in Fig. 1, the method provided by this embodiment includes the following steps:

101. The image features of the picture and the text features of the picture are identified by a preset neural network model, and a joint feature is formed.

Specifically, the preset neural network model here includes at least a preset deep convolutional neural network model and a recurrent neural network model.

Specifically, identifying the image features of the picture and the text features of the picture with the preset neural network model and forming the joint feature includes:

performing image representation with the preset neural network model to obtain an image representation result;

performing text representation with the preset neural network model to obtain a text representation result;

performing joint representation with the preset neural network model according to the image representation result and the text representation result to form the joint feature.

Because image features and text features have different characteristics, the process above performs image representation and text representation separately with the preset neural network model, obtains the image representation result and the text representation result individually, and then combines the two in a joint representation through the preset neural network to form the joint feature. This allows a suitable representation method or process to be chosen for each kind of feature, so that the combined joint representation result is more accurate and is obtained more efficiently.

Specifically, the step of performing image representation with the preset neural network model to obtain the image representation result includes:

performing global image representation with the preset deep convolutional neural network model to obtain the image representation result.

Specifically, the step of performing text representation with the preset neural network model to obtain the text representation result includes:

performing word vector representation with a preset recurrent neural network model to obtain a word vector representation result;

performing global text representation with the preset recurrent neural network model according to the word vector representation result to obtain the text representation result.

Specifically, before the word vector representation is performed with the preset recurrent neural network model, the method further includes the step of:

performing Chinese word segmentation on the text of the picture to obtain Chinese words.

Specifically, the step of performing joint representation with the preset neural network model according to the image representation result and the text representation result to form the joint feature includes:

performing weighted concatenation of the image representation result and the text representation result to form the joint feature.

102. The joint feature is classified, and a picture attribute classification result is output.

This embodiment provides a method for classifying picture attributes by combining pictures and text. A preset neural network model identifies the image features of the picture and the text features of the picture and forms a joint feature; the joint feature is then classified and a picture attribute classification result is output. Because the image features and the text features complement each other, extracting and classifying them together provides more comprehensive picture feature data, which better expresses the attributes of the picture, the article, or other related objects and yields more detailed and accurate object attribute classification results. The method can therefore be used for picture attribute extraction, for improving a knowledge graph, or for services such as querying and searching by picture attribute category.

Embodiment 2

Fig. 2 is a flowchart of the method for classifying picture attributes by combining pictures and text provided by Embodiment 2 of the invention, Fig. 3 is a schematic diagram of the preset neural network model based on pictures and text provided by Embodiment 2, and Fig. 4 is a schematic diagram of the VGG model provided by Embodiment 2. As shown in Figs. 2 and 3, the method provided by this embodiment includes the following steps:

201. Image representation is performed with the preset neural network model to obtain an image representation result.

Specifically, the preset neural network model performs image representation on all elements of the picture (an element here may be, for example, one pattern block of the picture) or on some of the elements, yielding an image representation result for each element. Each representation result corresponds to an attribute label and expresses image information of the picture. Depending on whether the preset neural network model processes all elements or only some of them, the process falls into one of two cases:

1. All elements of the picture are traversed by the one or more preset neural network models in use, finally yielding the image representation result of each element;

2. When only some elements of the picture need to be represented according to the target article, those elements are determined by a predefined rule and then traversed by the one or more preset neural network models in use, finally yielding the image representation result of each element.
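The two cases above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the rectangular tiling, the function names `split_into_blocks` and `select_blocks`, and the form of the selection rule are all hypothetical, since the text only says that an element may be a pattern block and that partial elements are chosen by a predefined rule.

```python
from typing import Callable, List, Optional, Tuple

# A block is (left, top, right, bottom) in pixel coordinates (assumed layout).
Block = Tuple[int, int, int, int]


def split_into_blocks(width: int, height: int, block: int) -> List[Block]:
    """Tile a width x height picture into block x block regions.

    Tiles on the right and bottom edges may be smaller than `block`.
    """
    blocks = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            right = min(left + block, width)
            bottom = min(top + block, height)
            blocks.append((left, top, right, bottom))
    return blocks


def select_blocks(
    blocks: List[Block],
    rule: Optional[Callable[[Block], bool]] = None,
) -> List[Block]:
    """Case 1 (rule is None): keep every element.

    Case 2: keep only the elements accepted by a predefined rule.
    """
    if rule is None:
        return list(blocks)
    return [b for b in blocks if rule(b)]
```

Each selected block would then be passed to the image model; for example, `select_blocks(blocks, lambda b: b[3] <= 2)` keeps only blocks whose bottom edge lies in the top two pixel rows.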

For example, global image representation is performed by the preset deep convolutional neural network model to obtain the image representation result, for instance with a 16-layer VGG model. The preset deep convolutional neural network model uses a multi-layer neural network to characterize the picture with a series of features ranging from simple to complex: lower layers learn simple patterns such as basic shapes, colors, and textures, which are combined step by step into increasingly complex patterns that carry semantic information, such as facial features or collar features. As shown in Fig. 4, the convolutional part of the VGG model consists of five blocks of [3*3*N convolutional layers + 2*2 max-pooling + ReLU]; two fully connected layers (fc6, fc7) follow and produce a 4096-dimensional feature; a further fully connected layer (fc8) then produces the multi-class logits; finally, softmax is applied to the logits to obtain the probability of every category.
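To make the layer structure concrete, the sketch below traces the feature shapes through such a network. It is a hedged illustration: the per-block layer counts and channel widths follow the commonly published 16-layer VGG configuration and the 224-pixel input used in the usage note is likewise an assumption, because the text gives only "3*3*N" and the 4096-dimensional fc6/fc7 output. The names `VGG16_BLOCKS`, `trace_vgg16_shapes`, and `count_weight_layers` are hypothetical.

```python
from typing import List, Tuple

# (number of 3x3 conv layers, output channels N) for the five conv blocks.
# Assumed from the commonly published 16-layer VGG configuration.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_vgg16_shapes(input_size: int, num_classes: int) -> List[Tuple[str, tuple]]:
    """Return (layer name, output shape) pairs for a square RGB input."""
    shapes = []
    size = input_size
    channels = 3
    for index, (_, channels) in enumerate(VGG16_BLOCKS, start=1):
        # 3x3 convolutions with padding 1 keep the spatial size;
        # the 2x2 max-pooling with stride 2 halves it.
        size //= 2
        shapes.append((f"block{index}", (size, size, channels)))
    shapes.append(("flatten", (size * size * channels,)))
    shapes.append(("fc6", (4096,)))
    shapes.append(("fc7", (4096,)))          # global image representation
    shapes.append(("fc8", (num_classes,)))   # logits, followed by softmax
    return shapes


def count_weight_layers() -> int:
    """Convolutional layers plus the three fully connected layers."""
    return sum(n_convs for n_convs, _ in VGG16_BLOCKS) + 3
```

With a 224 x 224 input, the fifth block outputs 7 x 7 x 512 values, which flatten to 25088 inputs for fc6, and `count_weight_layers()` gives the 16 weight layers that the model is named after. In the joint model of this embodiment, the fc7 output would be the natural candidate for the image representation result, though the text does not say so explicitly.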

It should be noted that step 201, performing image representation with the preset neural network model to obtain the image representation result, may also be implemented in ways other than the one described above; the embodiments of the present invention do not limit the specific manner.

202. Text representation is performed with the preset neural network model to obtain a text representation result.

Specifically, Chinese word segmentation is performed on the text of the picture to obtain Chinese words; word vector representation is performed with a preset recurrent neural network model to obtain a word vector representation result; and global text representation is performed with the preset recurrent neural network model according to the word vector representation result to obtain the text representation result.

As shown in Fig. 3, the text comes from the product name, product description, and similar text associated with the image. First, Chinese word segmentation is performed to obtain a series of Chinese words. Second, a representation of each Chinese word is obtained: a trained dictionary of continuous word vectors gives a lower-dimensional representation of each word (the word vectors may be obtained with a method based on a recurrent neural network (RNN), or with a continuous bag-of-words (CBOW) or skip-gram method). Third, a representation of the whole sentence or paragraph is obtained: an RNN or LSTM models the sequence of vectors, and the hidden state vector output at the last word vector serves as the representation of the whole paragraph.
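The second and third stages can be sketched as below. This is a minimal, assumption-laden illustration: the words are taken as already segmented (segmentation itself is left to an external tool), the word vector dictionary and the weights are random rather than trained, a plain tanh recurrent cell stands in for the RNN or LSTM, and the names `embed`, `rnn_encode`, and `random_matrix` are hypothetical.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def embed(words: List[str], dictionary: Dict[str, Vector]) -> List[Vector]:
    """Look up each segmented word; unknown words are skipped in this sketch."""
    return [dictionary[w] for w in words if w in dictionary]


def rnn_encode(vectors: List[Vector], w_in: Matrix, w_hid: Matrix, bias: Vector) -> Vector:
    """Run a simple tanh recurrent cell over the word vectors.

    The hidden state after the last word is returned as the
    representation of the whole sentence or paragraph.
    """
    hidden = [0.0] * len(bias)
    for x in vectors:
        hidden = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[i], x))
                + sum(wh * hj for wh, hj in zip(w_hid[i], hidden))
                + bias[i]
            )
            for i in range(len(bias))
        ]
    return hidden


rng = random.Random(0)
words = ["红色", "圆领", "连衣裙"]  # pre-segmented: "red", "round collar", "dress"
dictionary = {w: [rng.uniform(-1.0, 1.0) for _ in range(4)] for w in words}
w_in = random_matrix(3, 4, rng)    # hidden size 3, word vector size 4
w_hid = random_matrix(3, 3, rng)
paragraph_vector = rnn_encode(embed(words, dictionary), w_in, w_hid, [0.0] * 3)
```

Taking only the final hidden state keeps the text representation at a fixed length regardless of how many words the product name or description contains, which is what allows it to be joined with the fixed-length image representation later.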

It should be noted that step 202, performing text representation with the preset neural network model to obtain the text representation result, may also be implemented in ways other than the one described above; the embodiments of the present invention do not limit the specific manner.

203. Joint representation is performed with the preset neural network model according to the image representation result and the text representation result to form a joint feature.

Specifically, weighted concatenation is performed on the image representation result and the text representation result to form the joint feature.
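One simple reading of weighted concatenation is sketched below. The text does not say how the weights are chosen or whether they are scalars, so the scalar weights and the function name `weighted_concat` are assumptions made for illustration.

```python
from typing import List


def weighted_concat(
    image_vec: List[float],
    text_vec: List[float],
    image_weight: float = 1.0,
    text_weight: float = 1.0,
) -> List[float]:
    """Scale each representation by its weight, then join them end to end."""
    scaled_image = [image_weight * v for v in image_vec]
    scaled_text = [text_weight * v for v in text_vec]
    return scaled_image + scaled_text
```

For instance, `weighted_concat([1.0, 2.0], [4.0], 0.5, 2.0)` yields a three-element joint feature in which the image part is halved and the text part is doubled. The weights could equally be learned parameters; that choice is outside what the text specifies.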

It should be noted that step 203, performing joint representation with the preset neural network model according to the image representation result and the text representation result to form the joint feature, may also be implemented in ways other than the one described above; the embodiments of the present invention do not limit the specific manner.

204. Softmax classification is performed on the joint feature with the preset neural network model, and a picture attribute classification result is output.

After the image representation and the text representation are obtained, the two are joined by weighted concatenation to obtain the joint representation. One or more fully connected layers are applied to the joint representation to obtain N-class logits, and softmax classification is performed on the logits. The classification loss is back-propagated with the stochastic gradient descent algorithm, and the loss propagates backward along the image branch and the text branch separately. The depth of back-propagation is controlled according to the size of the database. For example, for a smaller training set, to prevent overfitting, the gradient is propagated back only as far as the fc6 layer of the VGG model and the recurrent neural network layer of the text model; for a large dataset, it can be propagated back to the convolutional layers of the image branch and the word vector layer of the text branch.
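The classification step and the depth control can be sketched as follows. This is an illustration under assumptions: cross-entropy is assumed as the classification loss (the text names only a "classification loss"), and the layer names, the `small_set_limit` threshold, and the functions `cross_entropy_and_grad` and `trainable_layers` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the N-class logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_and_grad(logits: List[float], target: int) -> Tuple[float, List[float]]:
    """Loss for one sample and its gradient with respect to the logits.

    For softmax followed by cross-entropy the gradient is
    (probabilities - one-hot target); this is the signal that is
    propagated backward along the image and text branches.
    """
    probs = softmax(logits)
    loss = -math.log(probs[target])
    grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return loss, grad


# Layers listed from input side to output side (names are assumed).
IMAGE_LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7"]
TEXT_LAYERS = ["word_vectors", "rnn"]


def trainable_layers(num_samples: int, small_set_limit: int) -> List[str]:
    """Choose how deep the gradient is allowed to reach.

    Small training set: stop at fc6 (image) and the recurrent layer (text).
    Large training set: reach the conv layers and the word vector layer.
    """
    head = ["joint_fc"]
    if num_samples < small_set_limit:
        return ["fc6", "fc7", "rnn"] + head
    return IMAGE_LAYERS + TEXT_LAYERS + head
```

Layers not returned by `trainable_layers` would keep their pretrained weights fixed during stochastic gradient descent, which limits the number of parameters fitted to a small dataset.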

This embodiment provides a method for classifying picture attributes by combining pictures and text. Image representation is performed with a preset neural network model to obtain an image representation result; text representation is performed with the preset neural network model to obtain a text representation result; joint representation is performed with the preset neural network model according to the two results to form a joint feature; and softmax classification is performed on the joint feature to output a picture attribute classification result. Because the image features and the text features complement each other, extracting and classifying them together provides more comprehensive picture feature data, which better expresses the attributes of the picture, the article, or other related objects and yields more detailed and accurate object attribute classification results. The method can therefore be used for picture attribute extraction, for improving a knowledge graph, or for services such as querying and searching by picture attribute category.

Embodiment 3

Fig. 5 is a flowchart of the method for classifying picture attributes by combining pictures and text provided by Embodiment 3 of the invention, and Fig. 6 is a schematic diagram of the preset neural network model based on pictures and text provided by Embodiment 3. As shown in Figs. 5 and 6, the method provided by this embodiment includes the following steps:

301. Joint representation is performed on the image and the text of the picture with the preset neural network model, and a joint feature is formed.

Specifically, unlike Embodiments 1 and 2, this step represents the image and the text of the picture jointly through the preset neural network model. As shown in Fig. 6, before the joint representation, the picture may first be given a preliminary image representation by the preset deep convolutional network model and then passed into an embedding layer; it is then represented jointly, through the preset recurrent neural network model, with the text representation result obtained by word segmentation and word vector representation of the picture's attribute words, forming the joint feature.
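A rough sketch of this joint path is given below. Since Fig. 6 is not reproduced here, several details are assumptions: the embedding layer is modeled as a linear projection of the image feature into the word vector space, the projected image vector is fed to the recurrent model as the first step of the sequence, a plain tanh cell is used, and the names `linear` and `joint_rnn_encode` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(matrix: Matrix, vec: Vector) -> Vector:
    """Matrix-vector product, one output value per matrix row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def joint_rnn_encode(
    image_feature: Vector,
    word_vectors: List[Vector],
    w_embed: Matrix,
    w_in: Matrix,
    w_hid: Matrix,
) -> Vector:
    """Encode image and text in a single recurrent pass.

    The image feature is projected by the embedding layer to the word
    vector size and placed ahead of the word vectors (assumed order);
    the final hidden state is used as the joint feature.
    """
    image_token = linear(w_embed, image_feature)
    sequence = [image_token] + list(word_vectors)
    hidden = [0.0] * len(w_hid)
    for x in sequence:
        pre_activation = [a + b for a, b in zip(linear(w_in, x), linear(w_hid, hidden))]
        hidden = [math.tanh(p) for p in pre_activation]
    return hidden
```

Compared with Embodiment 2, no separate weighted concatenation is needed here, because the recurrent model sees both modalities in one sequence and its final state already mixes them.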

It should be noted that step 301, performing joint representation on the image and the text of the picture with the preset neural network model to form the joint feature, may also be implemented in ways other than the one described above; the embodiments of the present invention do not limit the specific manner.

302. Softmax classification is performed on the joint feature with the preset neural network model, and a picture attribute classification result is output.

It should be noted that step 302, performing softmax classification on the joint feature with the preset neural network model and outputting the picture attribute classification result, may also be implemented in ways other than the one described above; the embodiments of the present invention do not limit the specific manner.

This embodiment provides a method for classifying picture attributes by combining pictures and text. Joint representation is performed on the image and the text of the picture with a preset neural network model to form a joint feature, and softmax classification is performed on the joint feature to output a picture attribute classification result. Because the image features and the text features complement each other, extracting and classifying them together provides more comprehensive picture feature data, which better expresses the attributes of the picture, the article, or other related objects and yields more detailed and accurate object attribute classification results. In addition, representing the image features and the text features of the picture jointly in one pass simplifies the steps. The method can be used for picture attribute extraction, for improving a knowledge graph, or for services such as querying and searching by picture attribute category.

Embodiment 4

Fig. 7 is a schematic structural diagram of the apparatus 4 for classifying picture attributes by combining pictures and text provided by Embodiment 4 of the invention. As shown in Fig. 7, the apparatus provided by this embodiment includes:

a recognition computing module 41, configured to identify the image features of the picture and the text features of the picture with a preset neural network model and to form a joint feature, and further configured to classify the joint feature;

an output module 42, configured to output the picture attribute classification result.

Specifically, the process by which the recognition computing module 41 identifies the image features of the picture and the text features of the picture with the preset neural network model and forms the joint feature includes:

In addition, the recognition computing module 41 is further configured to classify the joint feature to obtain a classification result.
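The division of work between the two modules might be wired together as in the sketch below. It is purely illustrative: the class names `RecognitionModule` and `OutputModule`, their fields, and the choice to output the highest-probability label are assumptions, and the feature extractor and classifier are passed in as plain callables rather than real neural networks.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RecognitionModule:
    """Counterpart of module 41: joint feature extraction plus classification."""

    extract_joint_feature: Callable[[bytes, str], List[float]]
    classify: Callable[[List[float]], List[float]]

    def run(self, picture: bytes, text: str) -> List[float]:
        joint_feature = self.extract_joint_feature(picture, text)
        return self.classify(joint_feature)


@dataclass
class OutputModule:
    """Counterpart of module 42: turns class probabilities into a result."""

    labels: Sequence[str]

    def emit(self, probabilities: List[float]) -> str:
        best = max(range(len(probabilities)), key=probabilities.__getitem__)
        return self.labels[best]
```

A caller would construct `RecognitionModule` with the trained image, text, and joint models behind the two callables, and pass the result of `run` to `OutputModule.emit`.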

This embodiment provides an apparatus for classifying picture attributes by combining pictures and text. Its recognition computing module and output module use a preset neural network model to identify the image features of the picture and the text features of the picture, form a joint feature, classify the joint feature, and output a picture attribute classification result. Because the image features and the text features complement each other, extracting and classifying them together provides more comprehensive picture feature data, which better expresses the attributes of the picture, the article, or other related objects and yields more detailed and accurate object attribute classification results. The apparatus can therefore be used for picture attribute extraction, for improving a knowledge graph, or for services such as querying and searching by picture attribute category.

Embodiment 5

Fig. 8 is the system structure diagram that the combination picture that inventive embodiments 5 provide and text carry out picture attribute classification, As shown in figure 8, the system that combination picture provided in an embodiment of the present invention and text carry out picture attribute classification includes：

Computing device 51 is identified, for identifying the characteristics of image of picture and the text of picture by presetting neural network model Feature, and form union feature；It is additionally operable to carry out classification processing to union feature；

An output device 52, configured to output the picture attribute classification result.

Specifically, the process by which the identification computing device 51 identifies the image features and the text features of the picture through the preset neural network model and forms the joint feature may be as follows:

Image representation is performed by the preset neural network model to obtain an image representation result. Preferably, global image representation is performed by a preset deep convolutional neural network (DCNN) model, for example the 16-layer VGG model, to obtain the image representation result.

Text representation is performed by the preset neural network model to obtain a text representation result. Chinese word segmentation is performed on the text of the picture to obtain Chinese words; word vector representation is performed by a preset recurrent neural network model to obtain a word vector representation result; and global text representation is performed by the preset recurrent neural network model according to the word vector representation result to obtain the text representation result.

Joint representation is performed by the preset neural network model according to the image representation result and the text representation result to form the joint feature. Specifically, the image representation result and the text representation result are concatenated with weights to form the joint feature.

Softmax classification is performed on the joint feature by the preset neural network model to obtain a classification result.
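The weighted concatenation and softmax classification steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the modality weights `w_image` and `w_text`, the toy vectors, and the single linear layer in front of the softmax are all assumptions made for illustration. In practice the image vector would come from the DCNN and the text vector from the recurrent network.

```python
import math

def weighted_concat(image_vec, text_vec, w_image=0.5, w_text=0.5):
    """Scale each modality's vector by its weight, then concatenate them."""
    return [w_image * v for v in image_vec] + [w_text * v for v in text_vec]

def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(joint, weights, biases):
    """Hypothetical linear layer (one weight row per class) followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, joint)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

# Toy stand-ins for the image and text representation results (assumed values).
image_vec = [0.2, 0.8]
text_vec = [1.0, 0.0, 0.5]
joint = weighted_concat(image_vec, text_vec, w_image=0.6, w_text=0.4)

# Hand-picked classifier parameters for three hypothetical attribute classes.
weights = [[1.0, 0.0, 0.5, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0, 1.0],
           [0.0, 0.0, 0.0, 1.0, 0.0]]
biases = [0.0, 0.0, 0.0]

probs = classify(joint, weights, biases)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, probs)
```

The joint feature has as many components as the two inputs combined, and the softmax output is a probability distribution over the attribute classes, from which the most probable class is taken as the classification result.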

This embodiment of the present invention provides a system for classifying picture attributes by combining a picture and its text. Through the identification computing device and the output device it includes, the system performs image representation by the preset neural network model to obtain an image representation result; performs text representation by the preset neural network model to obtain a text representation result; performs joint representation by the preset neural network model according to the image representation result and the text representation result to form a joint feature; and performs softmax classification on the joint feature by the preset neural network model to output a picture attribute classification result. Because the image features and the text features complement each other, combining them provides more comprehensive picture feature data, so that the attributes of the picture, the article, or other related objects are better expressed and a more detailed and accurate object attribute classification result is obtained. The approach can therefore be used for picture attribute extraction, knowledge graph improvement, and services such as querying and searching by picture attribute class.
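The text representation step of this embodiment (word segmentation, word vectors, then a global representation from a recurrent network) can be sketched as follows. This is an illustrative assumption rather than the model the patent uses: the class `TinyTextEncoder`, its dimensions, the randomly initialized and untrained weights, and the simple tanh recurrence are hypothetical, and the input is assumed to be already segmented into words, with English tokens standing in for segmented Chinese words.

```python
import math
import random

def make_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; a trained model would learn these values."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    """Matrix-vector product on plain Python lists."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

class TinyTextEncoder:
    """Word-vector lookup followed by a simple recurrent network (untrained)."""

    def __init__(self, vocab, embed_dim=4, hidden_dim=3, seed=0):
        rng = random.Random(seed)
        self.index = {word: i for i, word in enumerate(vocab)}
        # One extra row at the end serves as the vector for unknown words.
        self.embeddings = make_matrix(len(vocab) + 1, embed_dim, rng)
        self.w_in = make_matrix(hidden_dim, embed_dim, rng)
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)
        self.hidden_dim = hidden_dim

    def embed(self, word):
        """Word vector representation: look up the word's embedding row."""
        return self.embeddings[self.index.get(word, len(self.index))]

    def encode(self, words):
        """Global text representation: the hidden state after the last word."""
        h = [0.0] * self.hidden_dim
        for word in words:
            x = self.embed(word)
            pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, h))]
            h = [math.tanh(p) for p in pre]
        return h

# Tokens stand in for the output of Chinese word segmentation.
vocab = ["red", "long-sleeve", "dress"]
encoder = TinyTextEncoder(vocab)
words = ["red", "long-sleeve", "dress"]
text_vec = encoder.encode(words)
print(text_vec)
```

The final hidden state plays the role of the text representation result that is later combined with the image representation result.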

Embodiment 6

Fig. 9 is a schematic structural diagram of the device 6 for classifying picture attributes by combining a picture and its text provided by Embodiment 6 of the present invention. As shown in Fig. 9, the device includes a memory 61 and a processor 62 connected to the memory, wherein the memory 61 is configured to store a set of program code, and the processor 62 calls the program code stored in the memory 61 to perform the following operations:

Identifying the image features and the text features of the picture through the preset neural network model and forming a joint feature, which specifically includes: performing joint representation on the image and the text of the picture through the preset neural network model to form the joint feature.

Classifying the joint feature and outputting a picture attribute classification result, which specifically includes: performing softmax classification on the joint feature through the preset neural network model and outputting the picture attribute classification result.

This embodiment of the present invention provides a device for classifying picture attributes by combining a picture and its text. The device performs joint representation on the image and the text of the picture through the preset neural network model to form a joint feature, and performs softmax classification on the joint feature through the preset neural network model to output a picture attribute classification result. Because the image features and the text features complement each other, combining them provides more comprehensive picture feature data, so that the attributes of the picture, the article, or other related objects are better expressed and a more detailed and accurate object attribute classification result is obtained. In addition, because the image features and the text features of the picture are represented jointly in a single step, the procedure is simplified. The approach can be used for picture attribute extraction, knowledge graph improvement, and services such as querying and searching by picture attribute class.

All of the above optional technical solutions may be combined in any manner to form optional embodiments of the present invention, which are not described one by one here.

In summary, the method, device and system for classifying picture attributes by combining a picture and its text provided by the embodiments of the present invention implement the following steps: identifying the image features and the text features of the picture through the preset neural network model and forming a joint feature; and classifying the joint feature and outputting a picture attribute classification result. Because the image features and the text features of the picture complement each other, combining them provides more comprehensive picture feature data, so that the attributes of the picture, the article, or other related objects are better expressed and a more detailed and accurate object attribute classification result is obtained. The method can therefore be used for picture attribute extraction, knowledge graph improvement, and services such as querying and searching by picture attribute class.

It should be noted that, when the device and the system provided by the above embodiments classify picture attributes by combining a picture and its text, the division into the functional modules described above is given only as an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device or system may be divided into different functional modules to perform all or part of the functions described above. In addition, the device and system embodiments belong to the same concept as the method embodiments for classifying picture attributes by combining a picture and its text; for their specific implementation, refer to the method embodiments, which is not repeated here.

A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for classifying picture attributes by combining a picture and its text, characterized in that the method comprises:

identifying the image features of the picture and the text features of the picture through a preset neural network model, and forming a joint feature;

wherein the preset neural network model comprises at least a preset deep convolutional neural network model and a recurrent neural network model.

2. The method according to claim 1, characterized in that identifying the image features of the picture and the text features of the picture through the preset neural network model and forming the joint feature comprises:

performing joint representation by the preset neural network model according to the image representation result and the text representation result to form the joint feature.

3. The method according to claim 2, characterized in that performing image representation by the preset neural network model to obtain the image representation result comprises:

4. The method according to claim 2, characterized in that performing text representation by the preset neural network model to obtain the text representation result comprises:

performing global text representation by a preset recurrent neural network model according to the word vector representation result to obtain the text representation result.

5. The method according to claim 4, characterized in that, before the word vector representation is performed by the preset recurrent neural network model, the method further comprises the step of:

6. The method according to claim 2, characterized in that performing joint representation by the preset neural network model according to the image representation result and the text representation result to form the joint feature comprises:

7. The method according to claim 1, characterized in that identifying the image features of the picture and the text features of the picture through the preset neural network model and forming the joint feature comprises:

performing joint representation on the image and the text of the picture through the preset neural network model to form the joint feature.

8. The method according to claim 1, characterized in that classifying the joint feature and outputting the picture attribute classification result comprises:

performing softmax classification on the joint feature through the preset neural network model and outputting the picture attribute classification result.

9. A device for classifying picture attributes by combining a picture and its text, characterized in that the device comprises:

an identification computing module, configured to identify the image features of the picture and the text features of the picture through a preset neural network model and form a joint feature, and further configured to classify the joint feature;

an output module, configured to output a picture attribute classification result.

10. A system for classifying picture attributes by combining a picture and its text, characterized in that the system comprises:

an identification computing device, configured to identify the image features of the picture and the text features of the picture through a preset neural network model and form a joint feature, and further configured to classify the joint feature;

an output device, configured to output a picture attribute classification result.