CN108021897A - Picture answering method and device - Google Patents
- Publication number
- CN108021897A CN108021897A CN201711308733.1A CN201711308733A CN108021897A CN 108021897 A CN108021897 A CN 108021897A CN 201711308733 A CN201711308733 A CN 201711308733A CN 108021897 A CN108021897 A CN 108021897A
- Authority
- CN
- China
- Prior art keywords
- picture
- RNN
- feature vector
- neural network
- answer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
Abstract
The present disclosure relates to a picture question-answering method and device, including: receiving a picture and a question related to the picture; extracting an image feature vector of the picture through a CNN; extracting an encoded feature vector of the question through a first RNN; concatenating the image feature vector and the encoded feature vector to obtain a joint feature vector; and decoding the joint feature vector through a decoding neural network to generate an answer to the question. The CNN, the first RNN, and the decoding neural network are neural networks trained end to end as a single overall network. The picture question-answering method and device of the embodiments of the disclosure can generate an answer to a question related to a picture, and because the CNN, the first RNN, and the decoding neural network used in the steps are trained end to end as a whole, accumulated errors are eliminated and question-answering accuracy is improved.
Description
Technical field
The present disclosure relates to the technical field of information processing, and in particular to a picture question-answering method and device.
Background
With the development of information processing technology, intelligent question answering has received increasing attention. An intelligent question-answering system can automatically answer questions posed by users in a question-and-answer format. For example, a picture question-answering system is given a picture and a question about that picture, and automatically answers the question. This can, for instance, help visually impaired users understand the world around them. In the related art, however, the accuracy of the answers such systems provide is low.
Summary of the invention
To overcome the problems in the related art, the present disclosure provides a picture question-answering method and device that can give accurate answers to questions about a picture.
According to a first aspect of the embodiments of the present disclosure, a picture question-answering method is provided, including: receiving a picture and a question related to the picture; extracting an image feature vector of the picture through a CNN; extracting an encoded feature vector of the question through a first RNN; concatenating the image feature vector and the encoded feature vector to obtain a joint feature vector; and decoding the joint feature vector through a decoding neural network to generate an answer to the question. The CNN, the first RNN, and the decoding neural network are neural networks trained end to end as a single overall network.
In a possible implementation, extracting the image feature vector of the picture through the CNN includes: scaling the received picture to a fixed size to obtain a standard picture; inputting the standard picture into the CNN; and taking the output of the CNN as the image feature vector of the picture.
In a possible implementation, extracting the encoded feature vector of the question through the first RNN includes: segmenting the question into words to obtain the word-embedding sequence corresponding to the question; inputting the word-embedding sequence into the first RNN; and taking the output of the first RNN as the encoded feature vector of the question.
In a possible implementation, the decoding neural network is a multi-layer perceptron (MLP), and decoding the joint feature vector through the decoding neural network to generate the answer includes: inputting the joint feature vector into the MLP to obtain a feature set; classifying the feature set with a classifier to obtain the class of the feature set; and taking the class of the feature set as the answer to the question.
In a possible implementation, the decoding neural network is an RNN, and decoding the joint feature vector through the decoding neural network to generate the answer includes: inputting the joint feature vector into a second RNN; and taking the output of the second RNN as the answer to the question.
According to a second aspect of the embodiments of the present disclosure, a picture question-answering device is provided, including: a receiving module for receiving a picture and a question related to the picture; a first extraction module for extracting an image feature vector of the picture through a convolutional neural network (CNN); a second extraction module for extracting an encoded feature vector of the question through a first recurrent neural network (RNN); a concatenation module for concatenating the image feature vector and the encoded feature vector to obtain a joint feature vector; and a generation module for decoding the joint feature vector through a decoding neural network to generate an answer to the question. The CNN, the first RNN, and the decoding neural network are neural networks trained end to end as a single overall network.
In a possible implementation, the first extraction module includes: a scaling submodule for scaling the received picture to a fixed size to obtain a standard picture; an image input submodule for inputting the standard picture into the CNN; and a first determination submodule for taking the output of the CNN as the image feature vector of the picture.
In a possible implementation, the second extraction module includes: a segmentation submodule for segmenting the question into words to obtain the corresponding word-embedding sequence; a sequence input submodule for inputting the word-embedding sequence into the first RNN; and a second determination submodule for taking the output of the first RNN as the encoded feature vector of the question.
In a possible implementation, the decoding neural network is an MLP, and the generation module includes: a first vector input submodule for inputting the joint feature vector into the MLP to obtain a feature set; a classification submodule for classifying the feature set with a classifier to obtain the class of the feature set; and a third determination submodule for taking the class of the feature set as the answer to the question.
In a possible implementation, the decoding neural network is an RNN, and the generation module includes: a second vector input submodule for inputting the joint feature vector into a second RNN; and a fourth determination submodule for taking the output of the second RNN as the answer to the question.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects. In the embodiments of the present disclosure, the image feature vector of the picture extracted by the CNN is concatenated with the encoded feature vector of the question extracted by the first RNN to obtain a joint feature vector, and the joint feature vector is decoded by the decoding neural network to generate an answer to the question about the picture. Because the CNN, the first RNN, and the decoding neural network used in these steps are trained end to end as a single overall network, accumulated errors between steps are eliminated and question-answering accuracy is improved.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the description, serve to explain its principles.
Fig. 1 is a flowchart of a picture question-answering method according to an exemplary embodiment.
Fig. 2a is an example of a received picture according to an exemplary embodiment.
Fig. 2b is an example of a received picture according to an exemplary embodiment.
Fig. 2c is an example of a neural network connection according to an exemplary embodiment.
Fig. 3 is a flowchart of a picture question-answering method according to an exemplary embodiment.
Fig. 4 is a flowchart of a picture question-answering method according to an exemplary embodiment.
Fig. 5 is a flowchart of a picture question-answering method according to an exemplary embodiment.
Fig. 6 is a flowchart of a picture question-answering method according to an exemplary embodiment.
Fig. 7 is a block diagram of a picture question-answering device according to an exemplary embodiment.
Fig. 8 is a block diagram of a picture question-answering device according to an exemplary embodiment.
Fig. 9 is a block diagram of a picture question-answering device according to an exemplary embodiment.
Fig. 10 is a block diagram of a picture question-answering device according to an exemplary embodiment.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. In the following description, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the disclosure as recited in the appended claims.
Fig. 1 is a flowchart of a picture question-answering method according to an exemplary embodiment. As shown in Fig. 1, the method can be used in a terminal device or a server and includes the following steps.
In step S11, a picture and a question related to the picture are received.
In step S12, an image feature vector of the picture is extracted through a CNN.
In step S13, an encoded feature vector of the question is extracted through a first RNN.
In step S14, the image feature vector and the encoded feature vector are concatenated to obtain a joint feature vector.
In step S15, the joint feature vector is decoded through a decoding neural network to generate an answer to the question.
Here, the CNN (Convolutional Neural Network), the first RNN (Recurrent Neural Network), and the decoding neural network are neural networks trained end to end as a single overall network.
By concatenating the image feature vector of the picture extracted with the CNN and the encoded feature vector of the question extracted with the first RNN to obtain a joint feature vector, and decoding the joint feature vector with the decoding neural network, the method generates an answer to the question about the picture. Because the CNN, the first RNN, and the decoding neural network used in these steps are trained end to end as a whole, accumulated errors between steps are eliminated and question-answering accuracy is improved.
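The five steps above can be sketched as a pipeline. This is a minimal pure-Python illustration, not the patent's implementation: the three stand-in functions below are invented placeholders for the trained CNN, first RNN, and decoder, and only the data flow (extract, extract, concatenate, decode) matches the method.

```python
# Minimal sketch of the picture-QA pipeline (steps S11-S15).
# The three "networks" are invented stand-in functions; in a real system
# they would be a trained CNN, RNN, and decoder sharing end-to-end gradients.

def cnn_features(image):
    # Placeholder: a real CNN would produce a learned feature vector.
    flat = [px for row in image for px in row]
    return [sum(flat) / len(flat), max(flat), min(flat)]

def rnn_encode(question_tokens):
    # Placeholder: a real RNN would consume the word-embedding sequence.
    return [len(question_tokens), float(len(set(question_tokens)))]

def decode(joint):
    # Placeholder decoder: map the joint vector to an answer label.
    return "answer_%d" % (int(sum(joint)) % 3)

def answer(image, question_tokens):
    img_vec = cnn_features(image)          # step S12
    q_vec = rnn_encode(question_tokens)    # step S13
    joint = img_vec + q_vec                # step S14: concatenation
    return decode(joint)                   # step S15

print(answer([[0.1, 0.5], [0.3, 0.9]], ["how", "many", "dogs"]))
```

The concatenation in step S14 is just vector joining; the substance of the method lies in training all three components jointly, as described below.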
The picture question-answering method of the embodiments of the disclosure is illustrated below taking a terminal device as an example. The picture the terminal device receives can be of arbitrary size; the disclosure does not limit this. The question related to the picture can be any question about that picture, for example the number of a specified object in the picture; the disclosure does not limit this either. In one example, the terminal device receives the picture shown in Fig. 2a, and the question related to the picture can be "How many dogs are in the picture?"; or the terminal device receives the picture shown in Fig. 2b, and the question related to the picture can be "What are the people in the picture doing?"
In a possible implementation, the terminal device receives one picture and one question, and generates the answer to the question for that picture through steps S12 to S15, thereby completing the picture question answering.
In another possible implementation, the terminal device receives at least two pictures and one question, and generates an answer to the question for each picture through steps S12 to S15, thereby performing content-based picture retrieval. For example, if the question is which pictures contain dogs, the terminal device can obtain the number of dogs in each picture through steps S12 to S15, and thereby determine which pictures contain dogs.
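The retrieval variant above can be sketched as a loop over pictures. In this hedged illustration, `count_dogs` is an invented stand-in for the full per-picture pipeline of steps S12 to S15, and the pre-labeled dictionaries stand in for real pictures.

```python
# Content-based retrieval sketch: run the same QA pipeline over each
# picture and keep the ones whose answer matches the query.

def count_dogs(picture):
    # Hypothetical per-picture answer; here the "picture" is pre-labeled
    # instead of being processed by a real CNN/RNN/decoder.
    return picture["dog_count"]

def pictures_with_dogs(pictures):
    return [p["name"] for p in pictures if count_dogs(p) > 0]

gallery = [
    {"name": "park.jpg", "dog_count": 2},
    {"name": "beach.jpg", "dog_count": 0},
    {"name": "yard.jpg", "dog_count": 1},
]
print(pictures_with_dogs(gallery))  # → ['park.jpg', 'yard.jpg']
```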
A CNN can be used to process pictures and extract picture features. The basic structure of a CNN includes two kinds of layers. The first is the feature-extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, from which it extracts a local feature; once a local feature is extracted, its positional relationship to other features is also determined. The second is the feature-mapping layer: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons in a plane share equal weights. A user can define the structure of the CNN (for example, the number of layers, the types of layers, and the connectivity between layers) and then determine the parameters of each layer by training. In step S12, the feature extraction of the picture can be performed by the trained CNN; the extracted features can be represented in a vector space to obtain the image feature vector of the picture, so that the picture is mapped into a low-dimensional vector space.
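The local receptive field and shared weights described above can be illustrated with a single convolution step. This pure-Python sketch is for illustration only (a real CNN layer would also have a bias, a nonlinearity, and learned kernels):

```python
# One "valid" 2-D convolution in pure Python: each output unit looks at
# a local receptive field of the input, and the same kernel (shared
# weights) is applied at every position.

def conv2d_valid(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
edge = [[1, -1]]  # horizontal-difference kernel, shared at every position
print(conv2d_valid(img, edge))  # → [[-1.0, -1.0], [-1.0, -1.0], [-1.0, -1.0]]
```

Because the kernel is shared across positions, the number of parameters is independent of the image size, which is the complexity reduction the text attributes to weight sharing.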
An RNN can be used for natural language processing and extracts features from sequential data. When predicting the next word of a sentence, the preceding words are usually needed; in an RNN, the current output of a sequence is also related to earlier outputs, so the network can memorize earlier information and apply it to the current computation, making predictions using the associations between preceding and following words. An RNN includes input units, hidden units, and output units. One one-way information flow travels from the input units to the hidden units, while another one-way information flow travels from the hidden units to the output units, and the input of a hidden layer also includes the state of the previous hidden layer. In one example, a sentence containing five words can be unrolled into a five-layer RNN, with each layer representing one word. A user can define the structure of the RNN and then determine the parameters of each layer by training. In step S13, the feature extraction of the question can be performed by the trained first RNN; the extracted features can be represented in a vector space to obtain the encoded feature vector of the question.
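The way the hidden state carries earlier words into each step can be shown with a one-dimensional vanilla RNN cell. The weights below are invented toy values; a trained first RNN would learn them (and use vector-valued states):

```python
import math

# Minimal vanilla-RNN encoder: the hidden state h carries information
# from earlier inputs into each step, so the final state summarizes the
# whole sequence. Weights are fixed toy values for illustration.

def rnn_encode(embeddings, w_in=0.5, w_rec=0.8):
    h = 0.0
    for x in embeddings:                        # one step per word embedding
        h = math.tanh(w_in * x + w_rec * h)     # current input + past state
    return h                                    # encoded feature of the question

seq = [1.0, -0.5, 0.25]
print(round(rnn_encode(seq), 4))
```

Note that the encoding is order-sensitive: reversing the sequence changes the final state, which is exactly the association between preceding and following words that the text describes.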
The decoding neural network can be any neural network that can process sequential data and whose input dimension is consistent with the joint feature vector, for example an RNN or an MLP (multi-layer perceptron). A user can define the type and structure of the decoding neural network and then determine the parameters of each layer by training. In step S15, the trained decoding neural network can decode the joint feature vector, thereby generating the answer to the question.
It should be noted that the decoding neural network can be a differentiable neural network, so every neural network used by the steps of the picture question-answering method of the disclosure is differentiable, and the steps can be trained together end to end, removing accumulated errors between steps and improving question-answering accuracy.
Fig. 2c is an example of the neural network connections. As shown in Fig. 2c, the output of the CNN and the output of the first RNN are connected to the input of the decoding neural network. The picture is fed in at input A of the CNN and the question at input B of the first RNN; the output of the CNN and the output of the first RNN are concatenated as the input of the decoding neural network, and output C of the decoding neural network gives the answer to the question. When training the CNN, the first RNN, and the decoding neural network, the picture question-answering method of the embodiments of the disclosure can treat the three as one overall network, take A and B as the inputs of the overall network and C as its output, and train the overall network end to end from the inputs A and B to the output C. The CNN, the first RNN, and the decoding neural network need not be trained separately, and the accuracy of any single network need not be considered on its own. The CNN, first RNN, and decoding neural network obtained from this end-to-end training are used as the neural networks of steps S11 to S15. In this way, accumulated errors between steps are removed, the powerful learning ability of deep learning is fully exploited, and question-answering accuracy is effectively improved.
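The benefit of end-to-end training can be seen even in a toy model. The sketch below is only an analogy under stated assumptions: two chained one-parameter "modules" replace the CNN/RNN/decoder, and both parameters are updated from the single final loss, so the error signal crosses the module boundary instead of each module being fitted in isolation.

```python
# Toy illustration of end-to-end training: two chained modules
# (scale by a, then shift by b) are updated jointly from the final loss.

def forward(x, a, b):
    return x * a + b        # module 1: x*a, module 2: +b

def train(samples, a=0.0, b=0.0, lr=0.1, steps=500):
    for _ in range(steps):
        for x, y in samples:
            err = forward(x, a, b) - y
            a -= lr * err * x   # gradient of the end loss w.r.t. module 1
            b -= lr * err       # gradient of the end loss w.r.t. module 2
        # both modules move together, driven by one end-to-end loss
    return a, b

samples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # data from y = 2x + 1
a, b = train(samples)
print(round(a, 2), round(b, 2))  # → 2.0 1.0
```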
Fig. 3 is a flowchart of a picture question-answering method according to an exemplary embodiment. As shown in Fig. 3, step S12 of extracting the image feature vector of the picture through the CNN can be implemented as follows.
In step S121, the received picture is scaled to a fixed size to obtain a standard picture.
In step S122, the standard picture is input into the CNN.
In step S123, the output of the CNN is taken as the image feature vector of the picture.
A CNN is mainly used to recognize two-dimensional patterns that are invariant to displacement, scaling, and other forms of distortion. Because the feature-extraction layers of a CNN learn from training data, explicit feature extraction is avoided when using a CNN: learning happens implicitly from the training data. Moreover, since the neurons on the same feature map share identical weights, the network can learn in parallel. The special structure of a CNN with its locally shared weights gives it a unique advantage in image processing: weight sharing reduces the complexity of the network, and in particular images with multi-dimensional input vectors can be fed directly into the network, avoiding the complexity of data reconstruction during feature extraction.
Once the structure of the CNN and the parameters of each layer are determined, it can process pictures of the fixed size. In a possible implementation, the received picture can be scaled to the fixed size using a spatial transformation matrix to obtain the standard picture. Inputting the standard picture into the CNN yields the feature vector of the standard picture; since the features of the standard picture and the original picture are similar, the feature vector of the standard picture can be used as the image feature vector of the picture.
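Step S121's scaling to a fixed size can be illustrated with the simplest resampling scheme. This nearest-neighbor sketch is a stand-in for the spatial transform mentioned above, not the patent's specific method:

```python
# Scale an arbitrary-size picture (a 2-D list of pixel values) to a
# fixed output size by nearest-neighbor sampling.

def resize_nearest(image, out_h, out_w):
    in_h, in_w = len(image), len(image[0])
    return [[image[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)]
            for i in range(out_h)]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8]]
print(resize_nearest(img, 2, 2))  # → [[1, 3], [5, 7]]
```

A production system would typically use bilinear interpolation instead, but the fixed output dimension, which is what the CNN requires, is the same.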
Fig. 4 is a flowchart of a picture question-answering method according to an exemplary embodiment. As shown in Fig. 4, step S13 of extracting the encoded feature vector of the question through the first RNN can be implemented as follows.
In step S131, the question is segmented into words to obtain the word-embedding sequence corresponding to the question.
In step S132, the word-embedding sequence is input into the first RNN.
In step S133, the output of the first RNN is taken as the encoded feature vector of the question.
Because a computer is used to process natural language, the natural language must be converted into symbols the machine can recognize and represented numerically. In a possible implementation, each word can be represented by a vector of a specified length. For example, each word of "How many dogs are in the picture?" can be represented by a vector, and together these vectors can represent the sentence.
Word embedding can map words to low-dimensional vectors (100 to 500 dimensions), and similar words can map to similar directions, so the semantic similarity between different words is encoded. In one example, if the neighbour-word distributions of two words are similar, the meanings of the two words are likely close. For instance, the neighbour words of "cat" are pet, owner, feed, rub, and meow, while the neighbour words of "dog" are pet, owner, feed, bite, and woof, so "cat" and "dog" can be considered semantically similar. Word embeddings can be based on a co-occurrence matrix built from a corpus, where each cell is an association measure between two items. Compared with one-hot encoding each word separately, as in the related art, word embeddings have low dimensionality, are convenient for machine-learning models and for unsupervised learning, and naturally cluster similar words.
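The semantic-similarity claim above is usually measured with cosine similarity between embedding vectors. In this sketch the 3-dimensional vectors are invented for illustration only; real embeddings would come from a trained model such as word2vec.

```python
import math

# Toy word vectors for the cat/dog example: words that share neighbour
# words end up pointing in similar directions, so their cosine
# similarity is high. The 3-d vectors here are invented.

vectors = {
    "cat":   [0.9, 0.8, 0.1],
    "dog":   [0.8, 0.9, 0.2],
    "stone": [0.1, 0.0, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

print(cosine(vectors["cat"], vectors["dog"]) >
      cosine(vectors["cat"], vectors["stone"]))  # → True
```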
After the terminal device segments the question, a word sequence is obtained. Applying a word-embedding method (for example, word2vec) to each word of the sequence yields the feature vector of each word, and the combination of the feature vectors of all the words in the sequence forms the word-embedding sequence corresponding to the question. After the word-embedding sequence corresponding to the question is input into the first RNN, the output of the first RNN can serve as the encoded feature vector of the question.
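Steps S131 and S132 can be sketched directly: segment the question, then look up each word's vector to build the word-embedding sequence fed into the first RNN. The tiny 2-dimensional vocabulary below is hypothetical; a real system would use a trained embedding table.

```python
# Segment the question into words, then map each word to its embedding
# vector to form the word-embedding sequence for the first RNN.

embedding = {
    "how":  [0.1, 0.7],
    "many": [0.3, 0.2],
    "dogs": [0.9, 0.4],
}
UNK = [0.0, 0.0]  # all out-of-vocabulary words share one vector

def embed_sequence(question):
    words = question.lower().split()          # trivial segmentation
    return [embedding.get(w, UNK) for w in words]

seq = embed_sequence("How many dogs")
print(seq)  # → [[0.1, 0.7], [0.3, 0.2], [0.9, 0.4]]
```

Whitespace splitting is sufficient for English; for Chinese questions a dedicated word segmenter would be needed, as the patent's "participle" (word segmentation) step implies.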
In a possible implementation, the RNN can be an LSTM (Long Short-Term Memory) network or a GRU (Gated Recurrent Unit) network.
An LSTM can use different functions to compute the state of the hidden layer. The units of an LSTM can be regarded as black boxes that preserve the state accumulated before the current input; combining the earlier state, the current memory, and the current input, these units can decide which units to suppress and which to excite.
In a GRU, words at different positions in the sequence influence the current hidden state differently: earlier words have less influence, that is, the influence of each earlier position on the current state is weighted by distance, with smaller weights at greater distances. When an error is produced, it may have been triggered by one or several words, and the weights of the corresponding words can be updated.
LSTM and GRU can retain the key character of word insertion sequence, ensure that it will not when long Sequence Spread
Lose, the effect for extracting the coding characteristic vector of problem is relatively good.
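The gating that lets important features survive long sequences can be shown with a one-dimensional GRU-style update gate. This is a heavily simplified sketch with invented toy weights, not a full GRU (it omits the reset gate and vector states):

```python
import math

# A GRU-style update gate in one dimension: the gate z decides how much
# of the old state survives each step, which is what lets important
# early information persist over a sequence. Weights are toy values.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h, x, wz=1.0, uz=1.0, wh=1.0, uh=1.0):
    z = sigmoid(wz * x + uz * h)            # update gate
    h_cand = math.tanh(wh * x + uh * h)     # candidate state
    return (1 - z) * h + z * h_cand         # blend old and new state

h = 0.9                        # state after an important early word
for x in [0.0, 0.0, 0.0]:      # three uninformative steps
    h = gru_step(h, x)
print(round(h, 3))             # the early information is still present
```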
After obtaining the image feature vector of the picture and the encoded feature vector of the question, the terminal device can concatenate the image feature vector and the encoded feature vector to obtain the joint feature vector, thereby establishing the association between the two. In a possible implementation, the terminal device can directly append the encoded feature vector to the end of the image feature vector to obtain the joint feature vector.
Fig. 5 is a flowchart of a picture question-answering method according to an exemplary embodiment, where the decoding neural network is an MLP. As shown in Fig. 5, step S15 of decoding the joint feature vector through the decoding neural network to generate the answer to the question can be implemented as follows.
In step S151, the joint feature vector is input into the MLP to obtain a feature set.
In step S152, the feature set is classified with a classifier to obtain the class of the feature set.
In step S153, the class of the feature set is taken as the answer to the question.
The terminal device can obtain a data set containing all possible answers, each of which can be regarded as a class, for example red, two, man, cap, sky, animal, dancing, and so on, for a total of hundreds or thousands of classes. The terminal device can use an MLP to further abstract the joint feature vector into a feature set (composed of probability values), and then classify the feature set with a classifier (softmax); the class with the highest probability can be taken as the class of the feature set, and thus the class of the feature set is taken as the answer to the question.
The answer obtained by decoding the joint feature vector through the MLP is a class, so its form is relatively simple. For example, for the picture shown in Fig. 2b, the question may be "What are the people in the picture doing?", and the generated answer is "dancing".
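The softmax classification of step S152 can be sketched directly. The class labels and the MLP output scores below are invented for illustration:

```python
import math

# Turn the decoder's feature scores into a probability distribution with
# softmax and take the highest-probability class as the answer.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

classes = ["red", "two", "dancing", "cap"]
feature_set = [0.5, 1.2, 3.1, 0.2]            # hypothetical MLP output

probs = softmax(feature_set)
answer = classes[probs.index(max(probs))]
print(answer)  # → dancing
```

Subtracting the maximum before exponentiating does not change the result but avoids overflow for large scores, a standard softmax implementation detail.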
Fig. 6 is a flowchart of a picture question-answering method according to an exemplary embodiment, where the decoding neural network is an RNN. As shown in Fig. 6, step S15 of decoding the joint feature vector through the decoding neural network to generate the answer to the question can also be implemented as follows.
In step S154, the joint feature vector is input into a second RNN.
In step S155, the output of the second RNN is taken as the answer to the question.
The first RNN and the second RNN are two separate RNNs: the dimension of the first RNN is consistent with that of the image feature vector, while the dimension of the second RNN is consistent with that of the joint feature vector. The terminal device can input the joint feature vector into the second RNN, and the output can be taken as the answer to the question. The second RNN can likewise be an LSTM or a GRU.
The answer obtained by decoding the joint feature vector through the second RNN is a sentence of variable length. For example, for the picture shown in Fig. 2b, the question may be "What are the people in the picture doing?", and the generated answer is "They are dancing."
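Producing a variable-length sentence from a fixed vector is typically done with a step-by-step decoding loop: each step emits one word and feeds the state forward until an end marker appears. In this sketch the word table and transition rule are invented stand-ins for a trained second RNN.

```python
# Greedy sequence decoding sketch: starting from the joint feature
# vector, emit one word per step until the <end> marker.

def step(state):
    # Hypothetical trained step: returns (next_word, next_state).
    table = {0: ("they", 1), 1: ("are", 2), 2: ("dancing", 3), 3: ("<end>", 3)}
    return table[state]

def decode_answer(joint_vector, max_len=10):
    state = int(sum(joint_vector)) % 4    # stand-in for the initial state
    words = []
    for _ in range(max_len):
        word, state = step(state)
        if word == "<end>":
            break
        words.append(word)
    return " ".join(words)

print(decode_answer([0.0]))  # → they are dancing
```

The `max_len` cap mirrors what real decoders do to guarantee termination even if the end marker is never produced.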
Fig. 7 is a block diagram of a picture question-answering device according to an exemplary embodiment. Referring to Fig. 7, the device includes a receiving module 71, a first extraction module 72, a second extraction module 73, a concatenation module 74, and a generation module 75.
The receiving module 71 is configured to receive a picture and a question related to the picture.
The first extraction module 72 is configured to extract an image feature vector of the picture through a convolutional neural network (CNN).
The second extraction module 73 is configured to extract an encoded feature vector of the question through a first recurrent neural network (RNN).
The concatenation module 74 is configured to concatenate the image feature vector and the encoded feature vector to obtain a joint feature vector.
The generation module 75 is configured to decode the joint feature vector through a decoding neural network to generate the answer to the question.
Here, the CNN, the first RNN, and the decoding neural network are neural networks trained end to end as a single overall network.
Fig. 8 is a block diagram of a picture question-answering device according to an exemplary embodiment. Referring to Fig. 8, in a possible implementation, the first extraction module 72 includes a scaling submodule 721, an image input submodule 722, and a first determination submodule 723.
The scaling submodule 721 is configured to scale the received picture to a fixed size to obtain a standard picture.
The image input submodule 722 is configured to input the standard picture into the CNN.
The first determination submodule 723 is configured to take the output of the CNN as the image feature vector of the picture.
In a possible implementation, the second extraction module 73 includes a segmentation submodule 731, a sequence input submodule 732, and a second determination submodule 733.
The segmentation submodule 731 is configured to segment the question into words to obtain the corresponding word-embedding sequence.
The sequence input submodule 732 is configured to input the word-embedding sequence into the first RNN.
The second determination submodule 733 is configured to take the output of the first RNN as the encoded feature vector of the question.
In one possible implementation, the decoding neural network is a multilayer perceptron (MLP), and the generation module 75 includes: a first vector input submodule 751, a classification submodule 752, and a third determination submodule 753.
The first vector input submodule 751 is configured to input the joint feature vector into the MLP network to obtain a feature set.
The classification submodule 752 is configured to classify the feature set using a classifier, obtaining the category of the feature set.
The third determination submodule 753 is configured to take the category of the feature set as the answer to the question.
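In this MLP variant, answering reduces to classification over a fixed answer vocabulary. The sketch below uses one hidden layer plus a softmax classifier; the layer sizes and the 1000-answer vocabulary are assumptions, and the random weights stand in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
joint_dim, hidden_dim, num_answers = 2560, 128, 1000  # assumed sizes

joint_feature = rng.standard_normal(joint_dim)        # the concatenated feature vector

# MLP hidden layer ("feature set") followed by a softmax classifier.
W1, b1 = rng.standard_normal((joint_dim, hidden_dim)) * 0.01, np.zeros(hidden_dim)
W2, b2 = rng.standard_normal((hidden_dim, num_answers)) * 0.01, np.zeros(num_answers)

hidden = np.maximum(0, joint_feature @ W1 + b1)       # ReLU activation
logits = hidden @ W2 + b2
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                  # softmax over answer categories

answer_index = int(np.argmax(probs))                  # the category is taken as the answer
print(answer_index)
```

The predicted index would then be mapped back to an answer string (e.g. "yes", "red", "two") through the answer vocabulary.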
In another possible implementation, the decoding neural network is an RNN, and the generation module 75 further includes: a second vector input submodule 754 and a fourth determination submodule 755.
The second vector input submodule 754 is configured to input the joint feature vector into the second RNN.
The fourth determination submodule 755 is configured to take the output of the second RNN as the answer to the question.
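Unlike the MLP classifier, a second-RNN decoder can emit the answer as a token sequence. The following is a minimal greedy-decoding sketch with assumed dimensions and random stand-in weights, not the patent's trained model:

```python
import numpy as np

rng = np.random.default_rng(2)
joint_dim, hidden_dim, vocab_size, max_len = 64, 32, 50, 5  # assumed sizes

joint_feature = rng.standard_normal(joint_dim)

# Random stand-in weights; trained parameters would produce a fluent answer.
W_init = rng.standard_normal((joint_dim, hidden_dim)) * 0.1  # joint vector -> initial state
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_out = rng.standard_normal((hidden_dim, vocab_size)) * 0.1

h = np.tanh(joint_feature @ W_init)          # initialise the second RNN from the joint vector
answer_tokens = []
for _ in range(max_len):                     # greedy decoding, one token per step
    h = np.tanh(h @ W_hh)
    answer_tokens.append(int(np.argmax(h @ W_out)))

print(answer_tokens)
```

A real decoder would also feed the previously emitted token back in at each step and stop at an end-of-sequence symbol; those details are omitted here for brevity.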
With regard to the devices in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the related method embodiments, and will not be elaborated here.
Fig. 9 is a block diagram of a device 800 for picture question answering according to an exemplary embodiment. For example, the device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, medical equipment, fitness equipment, a personal digital assistant, or the like.
Referring to Fig. 9, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the above method. In addition, the processing component 802 may include one or more modules to facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support the operation of the device 800. Examples of such data include instructions for any application or method operated on the device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power supply component 806 provides power to the various components of the device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen providing an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the device 800 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to: a home button, volume buttons, a start button, and a lock button.
The sensor component 814 includes one or more sensors to provide status assessments of various aspects of the device 800. For example, the sensor component 814 may detect the open/closed state of the device 800 and the relative positioning of components (for example, the display and keypad of the device 800); the sensor component 814 may also detect a change in the position of the device 800 or a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 804 including instructions, where the instructions can be executed by the processor 820 of the device 800 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Figure 10 is a block diagram of a device 1900 for picture question answering according to an exemplary embodiment. For example, the device 1900 may be provided as a server. Referring to Figure 10, the device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute instructions to perform the above method.
The device 1900 may also include a power supply component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like. In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 1932 including instructions, where the instructions can be executed by the processing component 1922 of the device 1900 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the disclosure and include common knowledge or customary technical means in the art not disclosed by the present disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (12)
- 1. A picture question answering method, characterized in that it comprises:
receiving a picture and a question related to the picture;
extracting an image feature vector of the picture through a convolutional neural network (CNN);
extracting an encoding feature vector of the question through a first recurrent neural network (RNN);
concatenating the image feature vector and the encoding feature vector to obtain a joint feature vector;
decoding the joint feature vector through a decoding neural network to generate an answer to the question;
wherein the CNN, the first RNN, and the decoding neural network are treated as a single overall network and trained in an end-to-end manner.
- 2. The picture question answering method according to claim 1, characterized in that extracting the image feature vector of the picture through the CNN includes:
scaling the received picture to a fixed size to obtain a standard picture;
inputting the standard picture into the CNN;
taking the output of the CNN as the image feature vector of the picture.
- 3. The picture question answering method according to claim 1, characterized in that extracting the encoding feature vector of the question through the first RNN includes:
segmenting the question into words to obtain a word embedding sequence corresponding to the question;
inputting the word embedding sequence into the first RNN;
taking the output of the first RNN as the encoding feature vector of the question.
- 4. The picture question answering method according to claim 1, characterized in that the decoding neural network is a multilayer perceptron (MLP), and decoding the joint feature vector through the decoding neural network to generate the answer to the question includes:
inputting the joint feature vector into the MLP network to obtain a feature set;
classifying the feature set using a classifier to obtain a category of the feature set;
taking the category of the feature set as the answer to the question.
- 5. The picture question answering method according to claim 1, characterized in that the decoding neural network is an RNN, and decoding the joint feature vector through the decoding neural network to generate the answer to the question includes:
inputting the joint feature vector into a second RNN;
taking the output of the second RNN as the answer to the question.
- 6. A picture question answering device, characterized in that it comprises:
a receiving module, configured to receive a picture and a question related to the picture;
a first extraction module, configured to extract an image feature vector of the picture through a convolutional neural network (CNN);
a second extraction module, configured to extract an encoding feature vector of the question through a first recurrent neural network (RNN);
a concatenation module, configured to concatenate the image feature vector and the encoding feature vector to obtain a joint feature vector;
a generation module, configured to decode the joint feature vector through a decoding neural network to generate an answer to the question;
wherein the CNN, the first RNN, and the decoding neural network are treated as a single overall network and trained in an end-to-end manner.
- 7. The picture question answering device according to claim 6, characterized in that the first extraction module includes:
a scaling submodule, configured to scale the received picture to a fixed size to obtain a standard picture;
an image input submodule, configured to input the standard picture into the CNN;
a first determination submodule, configured to take the output of the CNN as the image feature vector of the picture.
- 8. The picture question answering device according to claim 6, characterized in that the second extraction module includes:
a word segmentation submodule, configured to segment the question into words to obtain a word embedding sequence corresponding to the question;
a sequence input submodule, configured to input the word embedding sequence into the first RNN;
a second determination submodule, configured to take the output of the first RNN as the encoding feature vector of the question.
- 9. The picture question answering device according to claim 6, characterized in that the decoding neural network is a multilayer perceptron (MLP), and the generation module includes:
a first vector input submodule, configured to input the joint feature vector into the MLP network to obtain a feature set;
a classification submodule, configured to classify the feature set using a classifier to obtain a category of the feature set;
a third determination submodule, configured to take the category of the feature set as the answer to the question.
- 10. The picture question answering device according to claim 6, characterized in that the decoding neural network is an RNN, and the generation module includes:
a second vector input submodule, configured to input the joint feature vector into a second RNN;
a fourth determination submodule, configured to take the output of the second RNN as the answer to the question.
- 11. A picture question answering device, characterized in that it comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method according to any one of claims 1 to 5.
- 12. A non-transitory computer-readable storage medium, wherein when instructions in the storage medium are executed by a processor, the method according to any one of claims 1 to 5 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711308733.1A CN108021897B (en) | 2017-12-11 | 2017-12-11 | Picture question and answer method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108021897A true CN108021897A (en) | 2018-05-11 |
CN108021897B CN108021897B (en) | 2021-06-15 |
Family
ID=62072794
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711308733.1A Active CN108021897B (en) | 2017-12-11 | 2017-12-11 | Picture question and answer method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108021897B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105574133A (en) * | 2015-12-15 | 2016-05-11 | 苏州贝多环保技术有限公司 | Multi-mode intelligent question answering system and method |
US9424494B1 (en) * | 2016-01-28 | 2016-08-23 | International Business Machines Corporation | Pure convolutional neural network localization |
CN106649542A (en) * | 2015-11-03 | 2017-05-10 | 百度(美国)有限责任公司 | Systems and methods for visual question answering |
CN106951473A (en) * | 2017-03-06 | 2017-07-14 | 浙江大学 | Towards the construction method of the deep vision question answering system of dysopia personage |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165285A (en) * | 2018-08-24 | 2019-01-08 | 北京小米智能科技有限公司 | Handle the method, apparatus and storage medium of multi-medium data |
CN109886444A (en) * | 2018-12-03 | 2019-06-14 | 深圳市北斗智能科技有限公司 | A kind of traffic passenger flow forecasting, device, equipment and storage medium in short-term |
CN109886444B (en) * | 2018-12-03 | 2023-07-11 | 深圳市北斗智能科技有限公司 | Short-time traffic passenger flow prediction method, device, equipment and storage medium |
CN109376963A (en) * | 2018-12-10 | 2019-02-22 | 杭州世平信息科技有限公司 | A kind of criminal case charge law article unified prediction neural network based |
CN111309875A (en) * | 2018-12-10 | 2020-06-19 | 百度在线网络技术(北京)有限公司 | Question answering method, device, equipment and storage medium |
CN109376963B (en) * | 2018-12-10 | 2022-04-08 | 杭州世平信息科技有限公司 | Criminal case and criminal name and criminal law joint prediction method based on neural network |
CN111309875B (en) * | 2018-12-10 | 2023-08-04 | 百度在线网络技术(北京)有限公司 | Method, device, equipment and storage medium for answering questions |
CN110009427A (en) * | 2019-04-10 | 2019-07-12 | 国网浙江省电力有限公司 | A kind of electric power consumption sum intelligent Forecasting based on deep-cycle neural network |
CN110287295A (en) * | 2019-05-14 | 2019-09-27 | 闽江学院 | Question and answer robot construction method and system based on small routine |
Also Published As
Publication number | Publication date |
---|---|
CN108021897B (en) | 2021-06-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108021897A (en) | Picture answering method and device | |
WO2024045444A1 (en) | Processing method and apparatus for visual question answering task, and device and non-volatile readable storage medium | |
CN105426857B (en) | Human face recognition model training method and device | |
CN112200062B (en) | Target detection method and device based on neural network, machine readable medium and equipment | |
JP2022508790A (en) | Image area positioning method, model training method and related equipment | |
CN108399409A (en) | Image classification method, device and terminal | |
CN109871896A (en) | Data classification method, device, electronic equipment and storage medium | |
CN110110145A (en) | Document creation method and device are described | |
CN110446063A (en) | Generation method, device and the electronic equipment of video cover | |
CN110209784B (en) | Message interaction method, computer device and storage medium | |
CN107221330A (en) | Punctuate adding method and device, the device added for punctuate | |
CN109871843A (en) | Character identifying method and device, the device for character recognition | |
CN107845062A (en) | image generating method and device | |
TWI735112B (en) | Method, apparatus and electronic device for image generating and storage medium thereof | |
CN110298212A (en) | Model training method, Emotion identification method, expression display methods and relevant device | |
CN112906484B (en) | Video frame processing method and device, electronic equipment and storage medium | |
CN109360197A (en) | Processing method, device, electronic equipment and the storage medium of image | |
CN115203471B (en) | Attention mechanism-based multimode fusion video recommendation method | |
CN111814538B (en) | Method and device for identifying category of target object, electronic equipment and storage medium | |
CN115131604A (en) | Multi-label image classification method and device, electronic equipment and storage medium | |
CN110121719A (en) | Device, method and computer program product for deep learning | |
CN116310983A (en) | Multi-mode emotion recognition method and device | |
CN112488157B (en) | Dialogue state tracking method and device, electronic equipment and storage medium | |
CN114693905A (en) | Text recognition model construction method, text recognition method and device | |
CN112269881A (en) | Multi-label text classification method and device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |