CN108875758A - Information processing method and device and information detecting method and device - Google Patents
- Publication number
- CN108875758A CN108875758A CN201710320880.4A CN201710320880A CN108875758A CN 108875758 A CN108875758 A CN 108875758A CN 201710320880 A CN201710320880 A CN 201710320880A CN 108875758 A CN108875758 A CN 108875758A
- Authority
- CN
- China
- Prior art keywords
- feature map
- group
- textual description
- vector
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
Abstract
An information processing method and device and an information detecting method and device are disclosed. The information processing method includes: extracting, from each of a plurality of sample images, a group of feature maps having a predetermined width and a predetermined height, where the feature maps in the group correspond to different image features respectively; and training a textual description model based on the extracted group of feature maps and textual descriptions labeled for the plurality of sample images, the textual description model being used to generate a corresponding textual description from an input image. Training the textual description model includes calculating the center and the size of a focus window on the group of feature maps based on the group of feature maps and a previous state vector of a recurrent neural network model. According to the embodiments of the present disclosure, a more suitable textual description of an image can be generated.
Description
Technical field
The present disclosure relates to the field of information processing, and in particular to an information processing method and device and an information detecting method and device that consider not only the position of a focus window in an image but also the size of the focus window.
Background
Understanding image content and describing it in natural language is one of the major issues and ultimate goals in the field of artificial intelligence. Describing an image requires not only recognizing the objects in the image but also describing, in natural language, the objects and the relationships between them. Describing image content in natural language is therefore a very challenging problem. Currently, some methods attempt to address this challenge. For example, one approach first detects the objects in an image and infers the relationships between them, and then generates a natural sentence describing the image content based on templates. There are also end-to-end methods based on neural network models. In addition, an attention model may be added to the model, but such an attention model automatically learns only the position of a fixed-size focus window.
Summary of the invention
A brief summary of the present disclosure is given below in order to provide a basic understanding of some aspects of the present disclosure. It should be understood, however, that this summary is not an exhaustive overview of the present disclosure. It is not intended to identify key or critical elements of the present disclosure, nor is it intended to limit the scope of the present disclosure. Its purpose is merely to present some concepts of the present disclosure in a simplified form, as a prelude to the more detailed description given later.
In view of the above problems, an object of the present disclosure is to provide an information processing method and device and an information detecting method and device that consider not only the position of a focus window in an image but also the size of the focus window.
According to one aspect of the present disclosure, an information processing method is provided, including: extracting, from each of a plurality of sample images, a group of feature maps having a predetermined width and a predetermined height, where the feature maps in the group correspond to different image features respectively; and training a textual description model based on the extracted group of feature maps and textual descriptions labeled for the plurality of sample images, the textual description model being used to generate a corresponding textual description from an input image, where training the textual description model may include calculating the center and the size of a focus window on the group of feature maps based on the group of feature maps and a previous state vector of a recurrent neural network model.
According to another aspect of the present disclosure, an information processing device is provided, including: an extraction unit configured to extract, from each of a plurality of sample images, a group of feature maps having a predetermined width and a predetermined height, where the feature maps in the group correspond to different image features respectively; and a training unit configured to train a textual description model based on the extracted group of feature maps and textual descriptions labeled for the plurality of sample images, the textual description model being used to generate a corresponding textual description from an input image, where training the textual description model may include calculating the center and the size of a focus window on the group of feature maps based on the group of feature maps and a previous state vector of a recurrent neural network model.
According to a further aspect of the present disclosure, an information detecting method is provided, including: extracting, from an input image, a group of feature maps having a predetermined width and a predetermined height, where the feature maps in the group correspond to different image features respectively; and generating, based on the extracted group of feature maps, a textual description corresponding to the input image using a trained textual description model, where generating the textual description corresponding to the input image using the trained textual description model may include calculating the center and the size of a focus window on the group of feature maps based on the group of feature maps and a previous state vector of a recurrent neural network model.
According to other aspects of the present disclosure, computer program code and computer program products for implementing the above methods according to the present disclosure, as well as computer-readable storage media on which the computer program code for implementing the above methods is recorded, are also provided.
Other aspects of the embodiments of the present disclosure are given in the following description, in which the detailed description fully discloses preferred embodiments of the present disclosure without limiting it.
Brief description of the drawings
The present disclosure may be better understood by referring to the detailed description given below in conjunction with the accompanying drawings, in which the same or similar reference signs are used throughout to denote the same or similar components. The drawings, together with the following detailed description, are included in and form part of this specification, and serve to further illustrate preferred embodiments of the present disclosure and to explain the principles and advantages of the present disclosure. In the drawings:
Fig. 1 is a flowchart showing a flow example of an information processing method according to an embodiment of the present disclosure;
Fig. 2 is a flowchart showing a flow example of calculating the center and the size of a focus window on a group of feature maps in the information processing method according to an embodiment of the present disclosure;
Fig. 3 is a block diagram showing a functional configuration example of an information processing device according to an embodiment of the present disclosure;
Fig. 4 is a flowchart showing a flow example of an information detecting method according to an embodiment of the present disclosure;
Fig. 5 is a diagram showing examples of input images and their corresponding textual descriptions according to an embodiment of the present disclosure;
Fig. 6 is a block diagram showing a functional configuration example of an information detecting device according to an embodiment of the present disclosure; and
Fig. 7 is a block diagram showing an example structure of a personal computer that can be adopted as the information processing device in embodiments of the present disclosure.
Specific embodiment
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that in developing any such actual embodiment, many implementation-specific decisions must be made in order to achieve the developer's specific goals, such as compliance with system- and business-related constraints, which may vary from one implementation to another. Moreover, it should be understood that, although such development work may be complex and time-consuming, it is merely a routine undertaking for those skilled in the art having the benefit of this disclosure.
It should also be noted here that, to avoid obscuring the present disclosure with unnecessary details, only the device structures and/or processing steps closely related to the solution of the present disclosure are shown in the drawings, while other details of little relevance to the present disclosure are omitted.
The present application proposes an information processing method for calculating a focus window in an image. The method automatically learns, based on a previous state vector of a recurrent neural network model and the image content, the position and the size of the image region that the current state needs to attend to. The recurrent neural network model then updates its current state vector based on the previous state vector and the calculated image window to be attended to, and calculates the probability of generating each word, ultimately producing a sentence describing the image.
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
First, a flow example of an information processing method 100 according to an embodiment of the present disclosure is described with reference to Fig. 1. Fig. 1 is a flowchart showing the flow example of the information processing method according to an embodiment of the present disclosure. As shown in Fig. 1, the information processing method 100 according to the embodiment of the present disclosure includes an extraction step S102 and a training step S104.
In the extraction step S102, a group of feature maps having a predetermined width and a predetermined height may be extracted from each of a plurality of sample images, where the feature maps in the group correspond to different image features respectively.

A group of (p) feature maps fc having a predetermined width s and a predetermined height r may be extracted from each sample image using existing techniques: fc = CN(image), where image denotes the m*n*c tensor of the image, with m, n, and c denoting the length, the width, and the number of channels of the sample image, respectively, and CN() denotes a transform function. The extracted feature maps fc form an r*s*p tensor, where p denotes the number of features, i.e., each feature map in the group of (p) feature maps fc corresponds to a respective one of the p image features.
Preferably, extracting the group of feature maps from each sample image may include extracting the group of feature maps from each sample image using a convolutional neural network model.

As an example, the group of (p) feature maps fc having the predetermined width s and the predetermined height r may be extracted from each sample image with a convolutional neural network, where CN() denotes the transform function realized by the convolutional neural network.
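As a concrete illustration of the tensor shapes involved, the following sketch produces a group of p feature maps of predetermined height r and width s from an m*n*c image. The random 1x1 projection and average pooling stand in for the trained convolutional network CN() (the embodiment uses VGG-16); this is only meant to show the r*s*p layout, not a real feature extractor.

```python
import numpy as np

def extract_feature_maps(image, p=4, pool=8, rng=None):
    """Toy stand-in for CN(): p feature maps of size (r, s) from an
    m*n*c image via a random 1x1 projection plus average pooling."""
    rng = rng or np.random.default_rng(0)
    m, n, c = image.shape
    W = rng.standard_normal((c, p))              # 1x1 "conv" weights
    proj = (image.reshape(-1, c) @ W).reshape(m, n, p)
    r, s = m // pool, n // pool                  # predetermined height / width
    fc = proj[:r * pool, :s * pool].reshape(r, pool, s, pool, p).mean(axis=(1, 3))
    return fc                                    # r*s*p tensor

image = np.random.default_rng(1).random((64, 48, 3))   # m*n*c sample image
fc = extract_feature_maps(image)
print(fc.shape)  # (8, 6, 4): height r, width s, p feature maps
```

Each of the p slices fc[:, :, k] is one feature map, matching the statement that the feature maps in the group correspond to different image features respectively.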
In the training step S104, a textual description model may be trained based on the extracted group of feature maps and the textual descriptions labeled for the plurality of sample images. The textual description model may be used to generate a corresponding textual description from an input image. Training the textual description model may include calculating the center and the size of a focus window on the group of feature maps based on the group of feature maps and a previous state vector of a recurrent neural network model.
As an example, a textual description is labeled for each of the plurality of sample images. The textual description model may be trained based on the extracted group of feature maps and the labeled textual descriptions, and the trained model may be used to generate a corresponding textual description from an input image. As an example, the center and the size of the focus window on the group of feature maps may be calculated based on the group of feature maps and the previous state vector of the recurrent neural network model.
As an example, let Ht denote the current state vector of the recurrent neural network model at time t, and let Ht-1 denote its previous state vector at time t-1. The initial state vector H0 of the recurrent neural network model is initialized to zero, i.e., H0 = zeros(hd), where zeros() denotes the all-zeros function and hd is the dimension of the state vector of the recurrent neural network model.
Fig. 2 is a flowchart showing a flow example of calculating the center and the size of the focus window on the group of feature maps in the information processing method according to an embodiment of the present disclosure. Preferably, calculating the center and the size of the focus window on the group of feature maps may include the following steps.

In step S202, a first neural network model with the sigmoid function as the activation function may be applied to the group of feature maps to convert the group of feature maps into a vector. As an example, the extracted group of feature maps may be merged into one vector, and a fully connected layer of a neural network with the sigmoid function as the activation function (an example of the first neural network model with the sigmoid function as the activation function) applies a nonlinear transform to the merged vector to obtain f1(fc), so that the group of feature maps is converted into a vector, where the nonlinear transform function is f1(fc) = σ(W1*fc + b1), σ() denotes the sigmoid function, and W1 and b1 are a parameter matrix and a bias parameter vector, respectively.

In step S204, the converted vector may be merged with the previous state vector of the recurrent neural network model, and a second neural network model with the sigmoid function as the activation function may be applied to the merged vector. As an example, the converted vector may be merged with the previous state vector Ht-1 of the recurrent neural network model to obtain the vector [f1(fc), Ht-1], which is then passed through a fully connected layer of a neural network with the sigmoid function as the activation function (an example of the second neural network model with the sigmoid function as the activation function) to apply the nonlinear transform f2([f1(fc), Ht-1]), where the nonlinear transform function is f2([f1(fc), Ht-1]) = σ(W2*[f1(fc), Ht-1] + b2), and W2 and b2 are a parameter matrix and a bias parameter vector, respectively.

In step S206, a third neural network model with the tanh function as the activation function may further be applied to the vector obtained through the second neural network model. As an example, the vector obtained after the nonlinear transform f2() may be passed through the third neural network model with the tanh function as the activation function to apply the nonlinear transform tanh(f2([f1(fc), Ht-1])).

In step S208, an operation may be performed on the vector obtained through the third neural network model with a parameter used for comparison. As an example, an elementwise product may be taken between the vector obtained through the third neural network model and a vector V (an example of the parameter used for comparison), and the result is then normalized by the sigmoid function, yielding σ(tanh(f2([f1(fc), Ht-1])) ⊙ V).

In step S210, the center and the size of the focus window may be calculated from the result of the operation together with the predetermined width and the predetermined height. As an example, the learned window position and size may be normalized using the result of the operation and the predetermined width and height: (cs', cr', s', r') = (s, r, s, r) ⊙ σ(tanh(f2([f1(fc), Ht-1])) ⊙ V), where cs' and cr' denote the center of the focus window on the feature maps fc in the width and height directions, respectively, and s' and r' denote the width and the height of the focus window.
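Steps S202 through S210 can be sketched as follows. All parameter shapes (the sizes of W1, W2, and V, and the hidden dimension hd) are illustrative assumptions, since the embodiment does not fix them; the output dimension 4 follows from the four quantities (cs', cr', s', r'). Note that the final sigmoid keeps each component in (0, 1), so after elementwise multiplication with (s, r, s, r) the window center and size necessarily lie within the feature-map extent.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def focus_window(fc, H_prev, W1, b1, W2, b2, V):
    """Compute focus-window center (cs', cr') and size (s', r') on the
    group of feature maps fc from the previous RNN state vector H_prev."""
    r, s, p = fc.shape
    v1 = sigmoid(W1 @ fc.ravel() + b1)       # S202: f1(fc), feature maps -> vector
    u = np.concatenate([v1, H_prev])         # S204: merge [f1(fc), H_{t-1}]
    v2 = sigmoid(W2 @ u + b2)                #        f2([f1(fc), H_{t-1}])
    v3 = np.tanh(v2)                         # S206: tanh(f2(...))
    g = sigmoid(v3 * V)                      # S208: elementwise product with V, normalize
    return np.array([s, r, s, r]) * g        # S210: (cs', cr', s', r')

rng = np.random.default_rng(0)
r, s, p, hd = 8, 6, 4, 16                    # illustrative dimensions
fc = rng.random((r, s, p))
cs, cr, sw, rw = focus_window(
    fc, np.zeros(hd),                        # H_0 = zeros(hd)
    rng.standard_normal((4, r * s * p)), np.zeros(4),   # W1, b1 (output dim 4 assumed)
    rng.standard_normal((4, 4 + hd)), np.zeros(4),      # W2, b2
    rng.standard_normal(4))                             # V
```

Both the center (cs, cr) and the size (sw, rw) are continuous values on the feature-map grid, which is what allows the window size, and not only its position, to be learned.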
Preferably, training the textual description model may further include obtaining an attention feature vector of the group of feature maps based on the center and the size of the focus window. As an example, the attention feature vector of the group of feature maps may be obtained based on the center and the size of the focus window obtained as described above.

Preferably, obtaining the attention feature vector may include applying a fourth neural network model to the part of the group of feature maps corresponding to the focus window to convert that part into a vector, and taking this vector as the attention feature vector.
As an example, suppose att is a matrix of the same size as the feature maps fc, in which the values at positions corresponding to the focus window are 1 and the values at positions outside the focus window are 0. Thus fc ⊙ att extracts only the content of the feature maps fc within the focus window, i.e., the part of the group of feature maps corresponding to the focus window can be represented by fc ⊙ att. Furthermore, a fully connected layer of a neural network (an example of the fourth neural network model) may apply the transform Xt = f(fc ⊙ att) to fc ⊙ att, and the transformed vector Xt is taken as the attention feature vector, where f() is the transform function. The attention feature vector Xt may serve as the input of the recurrent neural network model at time t.
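The mask att and the attention feature vector Xt can be sketched as follows. The rounding of the continuous window coordinates to integer indices and the tanh used inside the fully connected layer f() are assumptions for illustration; the embodiment only specifies that att is 1 inside the focus window, 0 outside, and that Xt = f(fc ⊙ att).

```python
import numpy as np

def attention_vector(fc, center, size, Wx):
    """Mask the feature maps to the focus window, then map the masked
    maps to the attention feature vector X_t = f(fc * att)."""
    r, s, p = fc.shape
    cs, cr = center                          # window center (width, height direction)
    sw, rw = size                            # window width and height
    att = np.zeros((r, s, 1))                # same spatial size as fc
    y0, y1 = int(max(cr - rw / 2, 0)), int(np.ceil(min(cr + rw / 2, r)))
    x0, x1 = int(max(cs - sw / 2, 0)), int(np.ceil(min(cs + sw / 2, s)))
    att[y0:y1, x0:x1, :] = 1.0               # 1 inside the focus window, 0 outside
    X_t = np.tanh(Wx @ (fc * att).ravel())   # fully connected layer f() (tanh assumed)
    return X_t, att

rng = np.random.default_rng(0)
fc = rng.random((8, 6, 4))
Wx = rng.standard_normal((16, 8 * 6 * 4))    # illustrative f() weights
X_t, att = attention_vector(fc, center=(3.0, 4.0), size=(2.0, 3.0), Wx=Wx)
```

Broadcasting the (r, s, 1) mask against the (r, s, p) tensor zeros out every feature map outside the window at once, so Xt depends only on the attended region.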
Preferably, training the textual description model further includes calculating the current state vector of the recurrent neural network model based on the attention feature vector and the previous state vector of the recurrent neural network model, and obtaining the textual description corresponding to the focus window based on the current state vector.

As an example, the current state vector Ht of the recurrent neural network model at the current time t may be calculated from the attention feature vector Xt and the previous state vector Ht-1 at time t-1 as Ht = tanh(Wh*Ht-1 + Wi*Xt + B), where Wh and Wi are parameter matrices and B is a bias parameter vector. The textual description corresponding to the focus window may then be obtained based on the current state vector.
Preferably, obtaining the textual description corresponding to the focus window may include applying a fifth neural network model to the current state vector to calculate the probability of occurrence of each word in a predetermined dictionary, and determining the word with the highest probability of occurrence as the textual description corresponding to the focus window.

As an example, a neural network model with the softmax function as the activation function (an example of the fifth neural network model) may be applied to the current state vector Ht of the recurrent neural network model to calculate the probability of occurrence P(Yt) = softmax(σ(Wp*Ht + bp)) of each word Yt in the predetermined dictionary, where Wp and bp are a parameter matrix and a bias parameter vector, respectively. The word with the highest probability of occurrence is then determined as the textual description corresponding to the focus window.
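The recurrence Ht = tanh(Wh*Ht-1 + Wi*Xt + B) and the word distribution P(Yt) = softmax(σ(Wp*Ht + bp)) can be sketched as a single decoding step. The four-word dictionary and all parameter dimensions are hypothetical.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def decode_step(H_prev, X_t, Wh, Wi, B, Wp, bp, vocab):
    """One step: update the state vector, then pick the most probable word."""
    H_t = np.tanh(Wh @ H_prev + Wi @ X_t + B)   # H_t = tanh(Wh*H_{t-1} + Wi*X_t + B)
    P = softmax(sigmoid(Wp @ H_t + bp))         # P(Y_t) over the predetermined dictionary
    return H_t, vocab[int(P.argmax())]          # word with highest probability

rng = np.random.default_rng(0)
hd, xd = 16, 16                                 # illustrative state / input dimensions
vocab = ["a", "dog", "runs", "."]               # hypothetical tiny dictionary
H1, word = decode_step(np.zeros(hd),            # H_0 = zeros(hd)
                       rng.random(xd),          # attention feature vector X_t
                       rng.standard_normal((hd, hd)),          # Wh
                       rng.standard_normal((hd, xd)),          # Wi
                       np.zeros(hd),                           # B
                       rng.standard_normal((len(vocab), hd)),  # Wp
                       np.zeros(len(vocab)), vocab)            # bp, dictionary
```

Repeating this step, with each Ht feeding both the next focus-window computation and the next state update, yields one word per time step.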
Preferably, the recurrent neural network model may also be a long short-term memory (LSTM) model.

As an example, in the case where the recurrent neural network model is an LSTM model, both the state vector H0 and the cell state vector C0 of the LSTM model need to be initialized when the LSTM model is initialized, i.e., H0 = zeros(hd) and C0 = zeros(hd), where hd is the dimension of the state.

In the case where the recurrent neural network model is an LSTM model, calculating the position and the size of the focus window and calculating the input Xt of the LSTM model at time t are the same as described above for the general recurrent neural network model.
The method of calculating the current state vector Ht of the LSTM model at time t is described in detail below. The current state vector Ht of the LSTM model at time t depends on the previous state vector Ht-1 of the last time step, the cell state vector Ct-1 of the last time step, and the input Xt of the current time step. First, three gate state vectors are calculated based on the previous state vector Ht-1 and the current input vector Xt, namely the input gate state vector it = σ(Wi*[Ht-1, Xt] + bi), the output gate state vector ot = σ(Wo*[Ht-1, Xt] + bo), and the forget gate state vector ft = σ(Wf*[Ht-1, Xt] + bf), where Wi, Wo, and Wf are parameter matrices and bi, bo, and bf are bias parameter vectors. The current cell state vector Ct and the current state vector Ht are then calculated as Ct = ft ⊙ Ct-1 + it ⊙ tanh(Wc*[Ht-1, Xt] + bc) and Ht = ot ⊙ tanh(Ct), where Wc and bc are a parameter matrix and a bias parameter vector, respectively. Once the current state vector Ht of the LSTM model at time t has been calculated, the method of obtaining the textual description corresponding to the focus window based on Ht is the same as described above for the general recurrent neural network model.
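The LSTM gate equations above can be sketched directly; the dimensions hd and xd are illustrative assumptions.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(H_prev, C_prev, X_t, Wi, bi, Wo, bo, Wf, bf, Wc, bc):
    """One LSTM step following the gate equations above."""
    u = np.concatenate([H_prev, X_t])                 # [H_{t-1}, X_t]
    i_t = sigmoid(Wi @ u + bi)                        # input gate state vector
    o_t = sigmoid(Wo @ u + bo)                        # output gate state vector
    f_t = sigmoid(Wf @ u + bf)                        # forget gate state vector
    C_t = f_t * C_prev + i_t * np.tanh(Wc @ u + bc)   # current cell state vector
    H_t = o_t * np.tanh(C_t)                          # current state vector
    return H_t, C_t

rng = np.random.default_rng(0)
hd, xd = 8, 8                                         # illustrative dimensions
mk = lambda: (rng.standard_normal((hd, hd + xd)), np.zeros(hd))
(Wi, bi), (Wo, bo), (Wf, bf), (Wc, bc) = mk(), mk(), mk(), mk()
H0, C0 = np.zeros(hd), np.zeros(hd)                   # H_0 = C_0 = zeros(hd)
H1, C1 = lstm_step(H0, C0, rng.random(xd), Wi, bi, Wo, bo, Wf, bf, Wc, bc)
```

Because the output gate is in (0, 1) and tanh is bounded, every component of the new state vector stays strictly inside (-1, 1).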
Taking the current time t as an example, the above has described calculating the center and the size of the focus window on the group of feature maps based on the group of feature maps fc and the previous state vector Ht-1 of the recurrent neural network model, so as to calculate the current state vector Ht of the recurrent neural network model at the current time t and thereby obtain the textual description corresponding to the focus window. Similarly, the state vectors of the recurrent neural network model at times t+1, t+2, ... can also be calculated, so as to obtain the textual descriptions corresponding to the focus windows at times t+1, t+2, ..., respectively.
Preferably, training the textual description model may further include: for a sample image among the plurality of sample images, when the textual description corresponding to the focus window is determined to be a full stop, ending the training performed based on this sample image.

As an example, for a sample image, when the textual description corresponding to the focus window is determined to be a full stop, the training performed based on this sample image is determined to have ended.
Preferably, the parameters of the textual description model may include the parameters of the convolutional neural network model, the parameters of the first neural network model, the parameters of the second neural network model, the parameters of the third neural network model, the parameters of the fourth neural network model, the parameters of the fifth neural network model, the parameters of the recurrent neural network model, and the parameter used for comparison. As an example, training the textual description model may include training the above parameters of the textual description model.
The training of the textual description model has been described above taking one sample image as an example. How the textual description model is obtained by training on a plurality of sample images is described in detail below. For convenience of description, it is assumed that a convolutional neural network model (CNN) is used to extract the group of feature maps from each sample image, and that the center and the size of the focus window on the group of feature maps are calculated based on the group of feature maps and the previous state vector of a recurrent neural network model (RNN). Given n training data (Xi, Yi), i = 1, ..., n, where Xi denotes a sample image and Yi denotes the corresponding textual description, the process of training the textual description model is as follows.

Step 1: Initialize the parameters of the textual description model, where the CNN adopts the VGG-16 model and is initialized with the parameters of VGG-16 trained on the ImageNet data set, and the parameters of the RNN and the other parameters are initialized. The batch size of the data set is set to batch_size = 64.

Step 2: Sample batch_size data from the training data set without replacement.

Step 3: Based on the current model parameters, calculate the probability P of generating the corresponding textual description for each sample image, and update the current model parameters by gradient-based optimization with P as the objective function.

Step 4: Repeat steps 2 and 3 above until the textual description model converges.
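The four training steps can be sketched with a deliberately toy objective, single-word "descriptions" and a linear "model"; the point is the loop structure: initialize, sample batches without replacement, take a gradient step on the (log-)probability of the labeled description, and repeat until convergence. The VGG-16 initialization and the real caption likelihood are replaced by stand-ins here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
X = rng.random((20, 4))                        # n "sample images" (toy vectors)
Y = (X[:, 0] > X[:, 1]).astype(int)            # n labeled "descriptions" (toy, 1 word)
W = np.zeros((2, 4))                           # Step 1: initialize parameters
batch_size, lr = 5, 1.0
nll = lambda W: -np.mean([np.log(softmax(W @ x)[y]) for x, y in zip(X, Y)])
nll0 = nll(W)                                  # loss at initialization

for epoch in range(30):                        # Step 4: repeat until convergence
    order = rng.permutation(len(X))            # Step 2: sample without replacement
    for b in range(0, len(X), batch_size):
        for x, y in zip(X[order[b:b + batch_size]], Y[order[b:b + batch_size]]):
            p = softmax(W @ x)                 # Step 3: P under current parameters
            W -= lr * np.outer(p - np.eye(2)[y], x) / batch_size  # gradient step
```

Maximizing P is implemented, as is conventional, by descending the negative log-likelihood; after training, nll(W) is below its initial value nll0.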
In summary, the information processing method 100 according to the embodiment of the present disclosure can automatically learn the position and the size of the focus window in an image, and generate a corresponding textual description based on the content of the focus window. Since the image region that the current word needs to attend to can be discovered dynamically based on historical information, a more suitable textual description can be generated.

Corresponding to the above embodiment of the information processing method, the present disclosure further provides the following embodiment of an information processing device.
Fig. 3 is a block diagram showing a functional configuration example of an information processing device 300 according to an embodiment of the present disclosure.
As shown in Fig. 3, the information processing device 300 according to the embodiment of the present disclosure may include an extraction unit 302 and a training unit 304. Functional configuration examples of the extraction unit 302 and the training unit 304 are described below.

In the extraction unit 302, a group of feature maps having a predetermined width and a predetermined height may be extracted from each of a plurality of sample images, where the feature maps in the group correspond to different image features respectively.
For examples of the feature maps fc, reference may be made to the description at the corresponding position in the above method embodiment, which is not repeated here.

Preferably, extracting the group of feature maps from each sample image may include extracting the group of feature maps from each sample image using a convolutional neural network model.

As an example, the group of (p) feature maps fc having the predetermined width s and the predetermined height r may be extracted from each sample image with a convolutional neural network.
In the training unit 304, a textual description model may be trained based on the extracted group of feature maps and the textual descriptions labeled for the plurality of sample images. The textual description model may be used to generate a corresponding textual description from an input image. Training the textual description model may include calculating the center and the size of a focus window on the group of feature maps based on the group of feature maps and a previous state vector of a recurrent neural network model.

As an example, a textual description is labeled for each of the plurality of sample images. The textual description model may be trained based on the extracted group of feature maps and the labeled textual descriptions, and the trained model may be used to generate a corresponding textual description from an input image. As an example, the center and the size of the focus window on the group of feature maps may be calculated based on the group of feature maps and the previous state vector of the recurrent neural network model.
Preferably, calculating the center and the size of the focus window on the group of feature maps includes: applying a first neural network model with the sigmoid function as the activation function to the group of feature maps to convert the group of feature maps into a vector; merging the converted vector with the previous state vector of the recurrent neural network model and applying a second neural network model with the sigmoid function as the activation function to the merged vector; further applying a third neural network model with the tanh function as the activation function to the vector obtained through the second neural network model; performing an operation on the vector obtained through the third neural network model with a parameter used for comparison; and calculating the center and the size of the focus window from the result of the operation together with the predetermined width and the predetermined height.

For examples of calculating the center and the size of the focus window on the group of feature maps, reference may be made to the description at the corresponding position in the above method embodiment, which is not repeated here.
Preferably, training the text description model may further include: obtaining a focus feature vector of the group of feature maps based on the center and size of the focus window. As an example, the focus feature vector of the group of feature maps may be obtained based on the center and size of the focus window obtained as described above.
Preferably, obtaining the focus feature vector may include: applying a fourth neural network model to the part of the group of feature maps corresponding to the focus window, so as to convert that part into a vector, and taking the resulting vector as the focus feature vector.
For examples of obtaining the focus feature vector, refer to the description at the corresponding position in the method embodiment above; the details are not repeated here.
Preferably, training the text description model further includes: calculating a current state vector of the recurrent neural network model based on the focus feature vector and the previous state vector of the recurrent neural network model, and obtaining a text description corresponding to the focus window based on the current state vector.
For examples of calculating the current state vector of the recurrent neural network model, refer to the description at the corresponding position in the method embodiment above; the details are not repeated here.
Preferably, obtaining the text description corresponding to the focus window may include: applying a fifth neural network model to the current state vector to calculate the occurrence probability of each word in a predetermined dictionary, and determining the word with the highest occurrence probability as the text description corresponding to the focus window.
For examples of determining the text description corresponding to the focus window, refer to the description at the corresponding position in the method embodiment above; the details are not repeated here.
Preferably, the recurrent neural network model may further include a long short-term memory (LSTM) model.
For examples of the LSTM model, refer to the description at the corresponding position in the method embodiment above; the details are not repeated here.
Preferably, training the text description model may further include: for a sample image among the multiple sample images, terminating the training based on that sample image when it is determined that the text description corresponding to the focus window is a period. As an example, for a sample image, when the text description corresponding to the focus window is determined to be a period, it is determined that the training based on that sample image has ended.
Preferably, the parameters of the text description model may include the parameters of the convolutional neural network model, the parameters of the first neural network model, the parameters of the second neural network model, the parameters of the third neural network model, the parameters of the fourth neural network model, the parameters of the fifth neural network model, the parameters of the recurrent neural network model, and the parameter used for comparison. As an example, training the text description model may include training the above parameters of the text description model.
For examples of obtaining the text description model by training on the multiple sample images, refer to the description at the corresponding position in the method embodiment above; the details are not repeated here.
In conclusion information processing unit 300 according to an embodiment of the present disclosure can learn the concern window in image automatically
The position of mouth and size, and the content based on focus window generates corresponding verbal description.Since historical information dynamic can be based on
Ground discovery generates the image-region that current character needs to pay close attention to, therefore can generate more suitable verbal description.
It should be noted that although the functional configuration of the information processing apparatus according to the embodiment of the present disclosure has been described above, this is merely exemplary rather than limiting, and those skilled in the art may modify the above embodiments according to the principles of the present disclosure, for example by adding, deleting or combining the functional modules in the embodiments; all such modifications fall within the scope of the present disclosure.
It should further be noted that the apparatus embodiment here corresponds to the method embodiment above; therefore, for content not described in detail in the apparatus embodiment, refer to the description at the corresponding position in the method embodiment, which is not repeated here.
It should be understood that the machine-executable instructions in the storage medium and the program product according to the embodiments of the present disclosure may also be configured to perform the information processing method described above; for content not described in detail here, refer to the description at the previous corresponding position, which is not repeated here.
Accordingly, a storage medium for carrying the above program product including machine-executable instructions is also included in the disclosure of the present invention. The storage medium includes, but is not limited to, a floppy disk, an optical disc, a magneto-optical disk, a memory card, a memory stick, and the like.
According to another aspect of the present disclosure, an information detection method is provided, which considers not only the position of the focus window in an image but also the size of the focus window.
Next, a flow example of the information detection method 400 according to an embodiment of the present disclosure is described with reference to Fig. 4. Fig. 4 is a flowchart showing the flow example of the information detection method 400 according to an embodiment of the present disclosure. As shown in Fig. 4, the information detection method 400 according to the embodiment of the present disclosure includes an extraction step S402 and a generation step S404.
In the extraction step S402, a group of feature maps having a preset width and a preset height may be extracted from an input image, wherein the feature maps in the group correspond to different image features respectively.
The prior art may be used to extract from the input image a group of (p) feature maps fc having a preset width s and a preset height r, fc = CN(image), where image denotes the m*n*c tensor of the image, and m, n and c denote the length, width and number of channels of the input image respectively; CN() denotes a transformation function. The extracted feature maps fc form an r*s*p tensor, where p denotes the number of features, i.e., each feature map in the group of (p) feature maps fc corresponds to one of the p image features.
Preferably, extracting the group of feature maps from the input image may include extracting the group of feature maps from the input image using a convolutional neural network model.
As an example, a convolutional neural network may be used to extract from the input image the group of (p) feature maps fc having the preset width s and preset height r, where CN() denotes the transformation function implemented by the convolutional neural network.
In the generation step S404, a text description corresponding to the input image may be generated using a trained text description model based on the extracted group of feature maps, wherein generating the text description corresponding to the input image using the trained text description model may include: calculating the center and size of the focus window on the group of feature maps based on the group of feature maps and the previous state vector of the recurrent neural network model.
As an example, the trained text description model may be used to generate a corresponding text description from the input image. Based on the extracted group of feature maps, the text description corresponding to the input image may be generated using the trained text description model. As an example, the center and size of the focus window on the group of feature maps may be calculated based on the group of feature maps and the previous state vector of the recurrent neural network model.
As an example, let H_t denote the current state vector of the recurrent neural network model at time t, and H_{t-1} denote the previous state vector of the recurrent neural network model at time t-1. The initial state vector H_0 of the recurrent neural network model is initialized to 0, i.e., H_0 = zeros(hd), where zeros() denotes the all-zeros function and hd is the dimension of the state vector of the recurrent neural network model.
Preferably, calculating the center and size of the focus window on the group of feature maps includes the following. A first neural network model whose activation function is the sigmoid function may be applied to the group of feature maps to convert the group of feature maps into a vector. As an example, the extracted group of feature maps may be merged into a single vector, and a fully connected layer of a neural network with the sigmoid function as activation function (an example of the first neural network model with the sigmoid function as activation function) may apply a nonlinear transformation to the merged vector to obtain f1(fc), thereby converting the group of feature maps into a vector, where the nonlinear transformation function is f1(fc) = σ(W1*fc + b1), σ() denotes the sigmoid function, and W1 and b1 are a parameter matrix and an offset parameter vector respectively. The converted vector may be merged with the previous state vector of the recurrent neural network model, and a second neural network model with the sigmoid function as activation function may be applied to the merged vector. As an example, the converted vector may be merged with the previous state vector H_{t-1} of the recurrent neural network model to obtain the vector [f1(fc), H_{t-1}], which is then passed through a fully connected layer of a neural network with the sigmoid function as activation function (an example of the second neural network model with the sigmoid function as activation function) to perform the nonlinear transformation f2([f1(fc), H_{t-1}]), where f2([f1(fc), H_{t-1}]) = σ(W2*[f1(fc), H_{t-1}] + b2), and W2 and b2 are a parameter matrix and an offset parameter vector respectively. A third neural network model with the tanh function as activation function may further be applied to the vector obtained from the second neural network model. As an example, the vector obtained after the nonlinear transformation f2() may be passed through the third neural network model with the tanh function as activation function to perform the nonlinear transformation tanh(f2([f1(fc), H_{t-1}])). An operation may be performed between the vector obtained from the third neural network model and the parameter used for comparison. As an example, the vector obtained from the third neural network model may be element-wise multiplied with a vector V (an example of the parameter used for comparison), and the result may then be normalized by the σ function, yielding σ(tanh(f2([f1(fc), H_{t-1}])) ⊙ V). Finally, the center and size of the focus window may be calculated according to the result of the operation and the preset width and preset height. As an example, the window position and size may be scaled according to the result of the operation and the preset width and preset height: (cs', cr', s', r') = (s, r, s, r) ⊙ σ(tanh(f2([f1(fc), H_{t-1}])) ⊙ V), where cs' and cr' denote the center of the focus window on the feature maps fc in the width and height directions respectively, and s' and r' denote the width and height of the focus window respectively.
Preferably, generating the text description corresponding to the input image using the trained text description model may further include: obtaining a focus feature vector of the group of feature maps based on the center and size of the focus window. As an example, the focus feature vector of the group of feature maps may be obtained based on the center and size of the focus window obtained as described above.
Preferably, obtaining the focus feature vector may include: applying a fourth neural network model to the part of the group of feature maps corresponding to the focus window, so as to convert that part into a vector, and taking the resulting vector as the focus feature vector.
As an example, suppose att is a mask matrix of the same size as the feature maps fc, in which the values at positions corresponding to the focus window are 1 and the values at positions outside the focus window are 0. Then fc ⊙ att extracts only the content of the feature maps fc within the focus window; that is, the part of the group of feature maps corresponding to the focus window can be represented by fc ⊙ att. Furthermore, a fully connected layer of a neural network (an example of the fourth neural network model) may transform fc ⊙ att as X_t = f(fc ⊙ att), and the transformed vector X_t is taken as the focus feature vector, where f() is a transformation function. The focus feature vector X_t can serve as the input of the recurrent neural network model at time t.
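The masking step fc ⊙ att and its conversion into X_t can be sketched as follows. The 2x2 window position, the choice of a tanh layer for f(), and the weight shapes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
r, s, p, hd = 4, 4, 8, 16
fc = rng.standard_normal((r, s, p))

att = np.zeros((r, s, 1))      # mask att: 1 inside a hypothetical 2x2 focus window
att[1:3, 1:3, :] = 1.0

masked = fc * att                            # fc ⊙ att keeps only the window content
Wf = rng.standard_normal((hd, masked.size))  # hypothetical fourth-model weights
X_t = np.tanh(Wf @ masked.ravel())           # X_t = f(fc ⊙ att), f as a tanh layer here
print(X_t.shape)  # (16,)
```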
Preferably, generating the text description corresponding to the input image using the trained text description model further includes: calculating the current state vector of the recurrent neural network model based on the focus feature vector and the previous state vector of the recurrent neural network model, and obtaining the text description corresponding to the focus window based on the current state vector.
As an example, the current state vector H_t of the recurrent neural network model at the current time t may be calculated from the focus feature vector X_t and the previous state vector H_{t-1} of the recurrent neural network model at time t-1 as H_t = tanh(Wh*H_{t-1} + Wi*X_t + B), where Wh and Wi are parameter matrices and B is an offset parameter vector. Then, the text description corresponding to the focus window may be obtained based on the current state vector.
Preferably, obtaining the text description corresponding to the focus window may include: applying a fifth neural network model to the current state vector to calculate the occurrence probability of each word in a predetermined dictionary, and determining the word with the highest occurrence probability as the text description corresponding to the focus window.
As an example, a neural network model with the softmax function as activation function (an example of the fifth neural network model) may be applied to the current state vector of the recurrent neural network model to calculate the occurrence probability P(Y_t) = softmax(σ(Wp*H_t + bp)) of each word Y_t in the predetermined dictionary, where Wp and bp are a parameter matrix and an offset parameter vector respectively. The word with the highest occurrence probability is then determined as the text description corresponding to the focus window.
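The word-probability step P(Y_t) = softmax(σ(Wp*H_t + bp)) can be sketched as follows; the toy six-word dictionary, the dimension hd = 16, and the random parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())   # subtract max for numerical stability
    return e / e.sum()

vocab = ["a", "girl", "and", "horse", "stands", "."]   # toy predetermined dictionary
hd = 16
Wp = rng.standard_normal((len(vocab), hd))  # parameter matrix
bp = np.zeros(len(vocab))                   # offset parameter vector
H_t = rng.standard_normal(hd)               # current state vector

probs = softmax(sigmoid(Wp @ H_t + bp))     # P(Y_t) = softmax(σ(Wp*H_t + bp))
word = vocab[int(np.argmax(probs))]         # word with the highest probability
print(word)
```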
Preferably, the recurrent neural network model may further include a long short-term memory (LSTM) model.
As an example, in the case where the recurrent neural network model is an LSTM model, both the state vector H_0 and the cell state vector C_0 of the LSTM model need to be initialized when initializing the LSTM model, i.e., H_0 = zeros(hd) and C_0 = zeros(hd), where hd is the dimension of the state.
In the case where the recurrent neural network model is an LSTM model, calculating the position and size of the focus window and calculating the input X_t of the LSTM model at time t are the same as described above for the general recurrent neural network model.
The method of calculating the current state vector H_t of the LSTM model at time t is described in detail below. The current state vector H_t of the LSTM model at time t depends on the previous state vector H_{t-1} at the last time, the cell state vector C_{t-1} at the last time, and the input X_t at the current time. First, three gate state vectors are calculated based on the previous state vector H_{t-1} and the current input vector X_t, namely the input gate state vector i_t = σ(Wi*[H_{t-1}, X_t] + bi), the output gate state vector o_t = σ(Wo*[H_{t-1}, X_t] + bo), and the forget gate state vector f_t = σ(Wf*[H_{t-1}, X_t] + bf), where Wi, Wo and Wf are parameter matrices respectively, and bi, bo and bf are offset parameter vectors respectively. Then the current cell state vector C_t and the current state vector H_t are calculated as C_t = f_t ⊙ C_{t-1} + i_t ⊙ tanh(Wc*[H_{t-1}, X_t] + bc) and H_t = o_t ⊙ tanh(C_t), where Wc and bc are a parameter matrix and an offset parameter vector respectively. With the current state vector H_t of the LSTM model at time t calculated, the method of obtaining the text description corresponding to the focus window based on the current state vector H_t is the same as described above for the general recurrent neural network model.
Above, taking the current time t as an example, it has been described how the center and size of the focus window on the group of feature maps are calculated based on the group of feature maps fc and the previous state vector H_{t-1} of the recurrent neural network model, so that the current state vector H_t of the recurrent neural network model at the current time t is calculated and the text description corresponding to the focus window is obtained. Similarly, the state vectors of the recurrent neural network model at times t+1, t+2, ... can also be calculated, so as to obtain the text descriptions corresponding to the focus windows at times t+1, t+2, ... respectively.
Preferably, generating the text description corresponding to the input image using the trained text description model may further include: terminating the generation of the text description corresponding to the input image when it is determined that the text description corresponding to the focus window is a period.
As an example, when the text description corresponding to the focus window is determined to be a period, the generation of the text description corresponding to the input image is terminated.
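The step-by-step generation described above, including the stop condition on the period, can be sketched as a loop; `focus_feature` here is a stub standing in for the focus-window computation, and the toy dictionary, dimensions, and random parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

vocab = ["a", "girl", "and", "horse", "."]  # toy dictionary including the period
hd = 16
Wh = rng.standard_normal((hd, hd))
Wx = rng.standard_normal((hd, hd))
Wp = rng.standard_normal((len(vocab), hd))

def focus_feature(H):
    # Stub for X_t = f(fc ⊙ att); a real model would derive it from the
    # focus window computed from H and the feature maps.
    return rng.standard_normal(hd)

H = np.zeros(hd)            # H_0 = zeros(hd)
words = []
for t in range(20):         # safety cap in case "." is never predicted
    H = np.tanh(Wh @ H + Wx @ focus_feature(H))
    word = vocab[int(np.argmax(Wp @ H))]
    words.append(word)
    if word == ".":         # generation ends once the word is a period
        break
print(" ".join(words))
```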
Preferably, the parameters of the trained text description model may include the parameters of the convolutional neural network model, the parameters of the first neural network model, the parameters of the second neural network model, the parameters of the third neural network model, the parameters of the fourth neural network model, the parameters of the fifth neural network model, the parameters of the recurrent neural network model, and the parameter used for comparison. The above parameters of the trained text description model may be determined by the information processing method according to the embodiment of the present disclosure.
Fig. 5 is a diagram showing an example of an input image and its corresponding text description according to an embodiment of the present disclosure. The leftmost image in Fig. 5 is the input image. The middle images in Fig. 5 schematically show the images related to the focus windows in the input image; for example, the images related to the focus windows in the input image respectively include an image of the "girl", an image of the "horse standing beside her", and so on. The rightmost part of Fig. 5 is the text description corresponding to the input image, namely "a girl and a horse standing beside her".
In conclusion information detecting method 400 according to an embodiment of the present disclosure considers the position of the focus window in image
It sets and size, and the content based on focus window generates corresponding verbal description.Due to can dynamically be found based on historical information
The image-region that current character needs to pay close attention to is generated, therefore more suitable verbal description can be generated.
Corresponding to the above information detection method embodiment, the present disclosure further provides the following information detection apparatus embodiment.
Fig. 6 is a block diagram showing a functional configuration example of the information detection apparatus 600 according to an embodiment of the present disclosure.
As shown in Fig. 6, the information detection apparatus 600 according to the embodiment of the present disclosure may include an extraction unit 602 and a generation unit 604. Functional configuration examples of the extraction unit 602 and the generation unit 604 are described below.
In the extraction unit 602, a group of feature maps having a preset width and a preset height may be extracted from an input image, wherein the feature maps in the group correspond to different image features respectively.
For examples of the feature maps fc, refer to the description at the corresponding position in the method embodiment above; the details are not repeated here.
Preferably, extracting the group of feature maps from the input image may include extracting the group of feature maps from the input image using a convolutional neural network model.
As an example, a convolutional neural network may be used to extract from the input image the group of (p) feature maps fc having the preset width s and preset height r.
In the generation unit 604, a text description corresponding to the input image may be generated using the trained text description model based on the extracted group of feature maps, wherein generating the text description corresponding to the input image using the trained text description model may include: calculating the center and size of the focus window on the group of feature maps based on the group of feature maps and the previous state vector of the recurrent neural network model.
As an example, the trained text description model may be used to generate a corresponding text description from the input image. Based on the extracted group of feature maps, the text description corresponding to the input image may be generated using the trained text description model. As an example, the center and size of the focus window on the group of feature maps may be calculated based on the group of feature maps and the previous state vector of the recurrent neural network model.
Preferably, calculating the center and size of the focus window on the group of feature maps includes: applying to the group of feature maps a first neural network model whose activation function is the sigmoid function, so as to convert the group of feature maps into a vector; merging the converted vector with the previous state vector of the recurrent neural network model, and applying to the merged vector a second neural network model whose activation function is the sigmoid function; further applying to the vector obtained from the second neural network model a third neural network model whose activation function is the tanh function; performing an operation between the vector obtained from the third neural network model and a parameter used for comparison; and calculating the center and size of the focus window according to the result of the operation and the preset width and preset height.
For examples of calculating the center and size of the focus window on the group of feature maps, refer to the description at the corresponding position in the method embodiment above; the details are not repeated here.
Preferably, generating the text description corresponding to the input image using the trained text description model may further include: obtaining a focus feature vector of the group of feature maps based on the center and size of the focus window. As an example, the focus feature vector of the group of feature maps may be obtained based on the center and size of the focus window obtained as described above.
Preferably, obtaining the focus feature vector may include: applying a fourth neural network model to the part of the group of feature maps corresponding to the focus window, so as to convert that part into a vector, and taking the resulting vector as the focus feature vector.
For examples of obtaining the focus feature vector, refer to the description at the corresponding position in the method embodiment above; the details are not repeated here.
Preferably, generating the text description corresponding to the input image using the trained text description model further includes: calculating the current state vector of the recurrent neural network model based on the focus feature vector and the previous state vector of the recurrent neural network model, and obtaining the text description corresponding to the focus window based on the current state vector.
For examples of calculating the current state vector of the recurrent neural network model, refer to the description at the corresponding position in the method embodiment above; the details are not repeated here.
Preferably, obtaining the text description corresponding to the focus window may include: applying a fifth neural network model to the current state vector to calculate the occurrence probability of each word in a predetermined dictionary, and determining the word with the highest occurrence probability as the text description corresponding to the focus window.
For examples of determining the text description corresponding to the focus window, refer to the description at the corresponding position in the method embodiment above; the details are not repeated here.
Preferably, the recurrent neural network model may further include an LSTM model.
For examples of the LSTM model, refer to the description at the corresponding position in the method embodiment above; the details are not repeated here.
Preferably, generating the text description corresponding to the input image using the trained text description model may further include: terminating the generation of the text description corresponding to the input image when it is determined that the text description corresponding to the focus window is a period.
As an example, when the text description corresponding to the focus window is determined to be a period, the generation of the text description corresponding to the input image is terminated.
Preferably, the parameters of the trained text description model may include the parameters of the convolutional neural network model, the parameters of the first neural network model, the parameters of the second neural network model, the parameters of the third neural network model, the parameters of the fourth neural network model, the parameters of the fifth neural network model, the parameters of the recurrent neural network model, and the parameter used for comparison. The above parameters of the trained text description model may be determined by the information processing method according to the embodiment of the present disclosure.
In conclusion information detector 600 according to an embodiment of the present disclosure considers the position of the focus window in image
It sets and size, and the content based on focus window generates corresponding verbal description.Due to can dynamically be found based on historical information
The image-region that current character needs to pay close attention to is generated, therefore more suitable verbal description can be generated.
It should be noted that although the functional configuration of the information detection apparatus according to the embodiment of the present disclosure has been described above, this is merely exemplary rather than limiting, and those skilled in the art may modify the above embodiments according to the principles of the present disclosure, for example by adding, deleting or combining the functional modules in the embodiments; all such modifications fall within the scope of the present disclosure.
It should further be noted that the apparatus embodiment here corresponds to the method embodiment above; therefore, for content not described in detail in the apparatus embodiment, refer to the description at the corresponding position in the method embodiment, which is not repeated here.
It should be understood that the machine-executable instructions in the storage medium and the program product according to the embodiments of the present disclosure may also be configured to perform the information detection method described above; for content not described in detail here, refer to the description at the previous corresponding position, which is not repeated here.
Accordingly, a storage medium for carrying the above program product including machine-executable instructions is also included in the disclosure of the present invention. The storage medium includes, but is not limited to, a floppy disk, an optical disc, a magneto-optical disk, a memory card, a memory stick, and the like.
In addition, it should also be noted that the above series of processes and apparatuses may also be implemented by software and/or firmware. In the case of implementation by software and/or firmware, a program constituting the software is installed from a storage medium or a network into a computer having a dedicated hardware structure, for example the general-purpose personal computer 700 shown in Fig. 7, which, when installed with various programs, is capable of performing various functions and the like.
In Fig. 7, a central processing unit (CPU) 701 performs various processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. Data required when the CPU 701 performs the various processes and the like are also stored in the RAM 703 as needed.
The CPU 701, the ROM 702 and the RAM 703 are connected to one another via a bus 704. An input/output interface 705 is also connected to the bus 704.
The following components are connected to the input/output interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a display such as a cathode-ray tube (CRT) or a liquid crystal display (LCD), a loudspeaker and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem and the like. The communication section 709 performs communication processes via a network such as the Internet.
A drive 710 is also connected to the input/output interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disc, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 710 as needed, so that a computer program read therefrom is installed into the storage section 708 as needed.
In the case where the above series of processes is implemented by software, a program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 711.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 711 shown in Fig. 7, in which the program is stored and which is distributed separately from the device so as to provide the program to the user. Examples of the removable medium 711 include a magnetic disk (including a floppy disk (registered trademark)), an optical disc (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a mini-disc (MD) (registered trademark)) and a semiconductor memory. Alternatively, the storage medium may be the ROM 702, a hard disk included in the storage section 708, or the like, in which the program is stored and which is distributed to the user together with the device containing it.
Preferred embodiments of the present disclosure have been described above with reference to the drawings, but the present disclosure is of course not limited to the above examples. Those skilled in the art may make various alterations and modifications within the scope of the appended claims, and it should be understood that such alterations and modifications naturally fall within the technical scope of the present disclosure.
For example, a plurality of functions implemented by one unit in the above embodiments may be implemented by separate devices. Alternatively, a plurality of functions implemented by a plurality of units in the above embodiments may each be implemented by separate devices. Furthermore, one of the above functions may be implemented by a plurality of units. Needless to say, such configurations fall within the technical scope of the present disclosure.
In this specification, the steps described in the flowcharts include not only processes performed in time series in the described order, but also processes performed in parallel or individually, not necessarily in time series. Furthermore, even for steps processed in time series, needless to say, the order may be changed as appropriate.
In addition, the technology of the present disclosure may also be configured as follows.
Note 1. An information processing method, comprising:
extracting, from each sample image of a plurality of sample images, a group of feature maps having a predetermined width and a predetermined height, wherein the feature maps in the group of feature maps respectively correspond to different image features; and
training a text description model based on the extracted group of feature maps and text descriptions labeled for the plurality of sample images, the text description model being used to generate a corresponding text description from an input image, wherein training the text description model comprises: calculating a center and a size of an attention window on the group of feature maps based on the group of feature maps and a previous state vector of a recurrent neural network model.
Note 2. The information processing method according to Note 1, wherein extracting the group of feature maps from each sample image comprises extracting the group of feature maps from each sample image using a convolutional neural network model.
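The patent does not detail the convolutional extraction itself, so the following is a minimal NumPy sketch under stated assumptions: the eight random 5x5 filters and the ReLU activation are illustrative choices, not taken from the disclosure. Each filter yields one feature map (one image feature), and all maps in the group share the same predetermined width and height.

```python
import numpy as np

def conv2d(image, filters):
    """Naive valid 2-D convolution: one output map per filter."""
    K, fh, fw = filters.shape
    H, W = image.shape
    oh, ow = H - fh + 1, W - fw + 1
    maps = np.empty((K, oh, ow))
    for k in range(K):
        for i in range(oh):
            for j in range(ow):
                maps[k, i, j] = np.sum(image[i:i + fh, j:j + fw] * filters[k])
    return np.maximum(maps, 0.0)  # ReLU (illustrative activation)

rng = np.random.default_rng(0)
image = rng.random((32, 32))               # toy grayscale sample image
filters = rng.standard_normal((8, 5, 5))   # 8 filters -> 8 image features
feature_maps = conv2d(image, filters)      # group of 8 maps, each 28 x 28
```

With a 32x32 input and 5x5 filters, the predetermined width and height of every map in the group come out as 28x28.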
Note 3. The information processing method according to Note 1, wherein calculating the center and the size of the attention window on the group of feature maps comprises:
applying, to the group of feature maps, a first neural network model that uses a sigmoid function as its activation function, so as to convert the group of feature maps into a vector;
merging the converted vector with the previous state vector of the recurrent neural network model, and applying, to the merged vector, a second neural network model that uses a sigmoid function as its activation function;
further applying, to the vector obtained through the second neural network model, a third neural network model that uses a tanh function as its activation function;
performing an operation on the vector obtained through the third neural network model and parameters used for comparison; and
calculating the center and the size of the attention window according to a result of the operation and the predetermined width and the predetermined height.
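The steps above can be sketched in NumPy as follows. This is a minimal illustration under assumptions the patent does not fix: the layer widths are arbitrary, "merging" is taken to be concatenation, and the final operation with the comparison parameters is modeled as rescaling the tanh outputs from (-1, 1) onto the predetermined width and height.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

K, H, W = 8, 28, 28                  # map count and predetermined height/width
feature_maps = rng.random((K, H, W))
h_prev = rng.standard_normal(16)     # previous state vector of the RNN model

# First neural network model (sigmoid activation): map group -> one vector.
W1 = rng.standard_normal((32, K * H * W)) * 0.01
v1 = sigmoid(W1 @ feature_maps.ravel())

# Merge with the previous state vector (assumed: concatenation),
# then the second neural network model (sigmoid activation).
W2 = rng.standard_normal((16, 32 + 16)) * 0.1
v2 = sigmoid(W2 @ np.concatenate([v1, h_prev]))

# Third neural network model (tanh activation): three raw outputs in (-1, 1).
W3 = rng.standard_normal((3, 16)) * 0.1
tx, ty, ts = np.tanh(W3 @ v2)

# Assumed "operation with the parameters used for comparison":
# rescale onto the predetermined width and height.
cx = (tx + 1.0) / 2.0 * W
cy = (ty + 1.0) / 2.0 * H
size = (ts + 1.0) / 2.0 * min(W, H)
```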
Note 4. The information processing method according to Note 1, wherein training the text description model further comprises: obtaining an attention feature vector of the group of feature maps based on the center and the size of the attention window.
Note 5. The information processing method according to Note 4, wherein obtaining the attention feature vector comprises: applying, to a portion of the group of feature maps corresponding to the attention window, a fourth neural network model so as to convert the portion into a vector, and using the vector as the attention feature vector.
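The patent does not say how a window of varying size is presented to the fourth neural network model. A common workaround, assumed here purely for illustration, is to sample a fixed grid inside the window (nearest neighbour) so that a dense sigmoid layer always sees the same input size; the grid size, layer width and sigmoid choice are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
K, H, W = 8, 28, 28
feature_maps = rng.random((K, H, W))

def window_patch(maps, cx, cy, size, grid=4):
    """Sample a fixed grid x grid patch inside the attention window
    (nearest neighbour), so windows of any size yield one fixed shape."""
    half = size / 2.0
    ys = np.clip(np.linspace(cy - half, cy + half, grid).round().astype(int),
                 0, maps.shape[1] - 1)
    xs = np.clip(np.linspace(cx - half, cx + half, grid).round().astype(int),
                 0, maps.shape[2] - 1)
    return maps[:, ys][:, :, xs]     # shape (K, grid, grid)

# Hypothetical window center/size values for illustration.
patch = window_patch(feature_maps, cx=14.0, cy=10.0, size=8.0)

# Fourth neural network model (assumed: one sigmoid dense layer)
# converts the window portion into the attention feature vector.
W4 = rng.standard_normal((16, patch.size)) * 0.1
attention_vec = sigmoid(W4 @ patch.ravel())
```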
Note 6. The information processing method according to Note 4, wherein training the text description model further comprises: calculating a current state vector of the recurrent neural network model based on the attention feature vector and the previous state vector of the recurrent neural network model, and obtaining a text description corresponding to the attention window based on the current state vector.
Note 7. The information processing method according to Note 6, wherein obtaining the text description corresponding to the attention window comprises: applying a fifth neural network model to the current state vector so as to calculate an occurrence probability of each word in a predetermined dictionary, and determining the word with the highest occurrence probability as the text description corresponding to the attention window.
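A minimal sketch of the fifth neural network model as a linear layer followed by softmax normalization; the toy dictionary, the layer shape and the softmax choice are assumptions, since the note only requires per-word occurrence probabilities and selection of the most probable word.

```python
import numpy as np

rng = np.random.default_rng(3)
dictionary = ["a", "man", "rides", "bicycle", "."]   # toy predetermined dictionary
h_t = rng.standard_normal(16)                        # current state vector

# Fifth neural network model (assumed: linear layer + softmax) gives the
# occurrence probability of each word in the dictionary.
W5 = rng.standard_normal((len(dictionary), 16)) * 0.1
logits = W5 @ h_t
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# The word with the highest occurrence probability is the description.
word = dictionary[int(np.argmax(probs))]
```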
Note 8. The information processing method according to Note 6, wherein training the text description model further comprises: for a sample image of the plurality of sample images, terminating the training performed based on that sample image when it is determined that the text description corresponding to the attention window is a full stop.
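Putting the pieces together, the per-sample loop can be sketched as below. A plain tanh recurrence stands in for the recurrent model, random vectors stand in for the attention feature vectors, and the hard iteration cap is a safety net not present in the patent; the loop ends as soon as the emitted word is the full stop.

```python
import numpy as np

rng = np.random.default_rng(4)
dictionary = ["a", "dog", "runs", "."]
n_h, n_a = 16, 8                         # illustrative state/attention sizes
Wh = rng.standard_normal((n_h, n_h)) * 0.1
Wx = rng.standard_normal((n_h, n_a)) * 0.1
Wout = rng.standard_normal((len(dictionary), n_h)) * 0.5

def step(h_prev, attn_vec):
    """One recurrent step (plain tanh recurrence standing in for the LSTM)."""
    return np.tanh(Wh @ h_prev + Wx @ attn_vec)

h = np.zeros(n_h)
words = []
for _ in range(20):                      # hard cap: safety net, not in the patent
    attn = rng.random(n_a)               # stand-in attention feature vector
    h = step(h, attn)
    word = dictionary[int(np.argmax(Wout @ h))]
    words.append(word)
    if word == ".":                      # a full stop ends this sample
        break
```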
Note 9. The information processing method according to Note 7, wherein parameters of the text description model include parameters of the convolutional neural network model, parameters of the first neural network model, parameters of the second neural network model, parameters of the third neural network model, parameters of the fourth neural network model, parameters of the fifth neural network model, parameters of the recurrent neural network model, and the parameters used for comparison.
Note 10. The information processing method according to Note 1, wherein the recurrent neural network model includes a long short-term memory network.
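Since a long short-term memory network is named as the recurrent model, a minimal NumPy LSTM step is sketched below. The state sizes, the stacked-gate weight layout and the zero biases are illustrative assumptions; in the model of Note 6, the input x would be the attention feature vector and h the state vector used elsewhere in the notes.

```python
import numpy as np

rng = np.random.default_rng(5)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_h, n_x = 16, 8                                      # illustrative sizes
Wg = rng.standard_normal((4 * n_h, n_h + n_x)) * 0.1  # four gates stacked
bg = np.zeros(4 * n_h)

def lstm_step(h_prev, c_prev, x):
    """One long short-term memory step; gates computed from [h_prev, x]."""
    z = Wg @ np.concatenate([h_prev, x]) + bg
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)      # input/forget/output gates
    c = f * c_prev + i * np.tanh(g)                   # new cell state
    h = o * np.tanh(c)                                # new state vector
    return h, c

h, c = np.zeros(n_h), np.zeros(n_h)
x = rng.random(n_x)                  # stands in for the attention feature vector
h, c = lstm_step(h, c, x)
```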
Note 11. An information processing apparatus, comprising:
an extraction unit configured to extract, from each sample image of a plurality of sample images, a group of feature maps having a predetermined width and a predetermined height, wherein the feature maps in the group of feature maps respectively correspond to different image features; and
a training unit configured to train a text description model based on the extracted group of feature maps and text descriptions labeled for the plurality of sample images, the text description model being used to generate a corresponding text description from an input image, wherein training the text description model comprises: calculating a center and a size of an attention window on the group of feature maps based on the group of feature maps and a previous state vector of a recurrent neural network model.
Note 12. An information detection method, comprising:
extracting, from an input image, a group of feature maps having a predetermined width and a predetermined height, wherein the feature maps in the group of feature maps respectively correspond to different image features; and
generating, based on the extracted group of feature maps, a corresponding text description of the input image using a trained text description model, wherein generating the corresponding text description of the input image using the trained text description model comprises: calculating a center and a size of an attention window on the group of feature maps based on the group of feature maps and a previous state vector of a recurrent neural network model.
Note 13. The information detection method according to Note 12, wherein extracting the group of feature maps from the input image comprises extracting the group of feature maps from the input image using a convolutional neural network model.
Note 14. The information detection method according to Note 12, wherein calculating the center and the size of the attention window on the group of feature maps comprises:
applying, to the group of feature maps, a first neural network model that uses a sigmoid function as its activation function, so as to convert the group of feature maps into a vector;
merging the converted vector with the previous state vector of the recurrent neural network model, and applying, to the merged vector, a second neural network model that uses a sigmoid function as its activation function;
further applying, to the vector obtained through the second neural network model, a third neural network model that uses a tanh function as its activation function;
performing an operation on the vector obtained through the third neural network model and parameters used for comparison; and
calculating the center and the size of the attention window according to a result of the operation and the predetermined width and the predetermined height.
Note 15. The information detection method according to Note 12, wherein generating the corresponding text description of the input image using the trained text description model further comprises: obtaining an attention feature vector of the group of feature maps based on the center and the size of the attention window.
Note 16. The information detection method according to Note 15, wherein obtaining the attention feature vector comprises: applying, to a portion of the group of feature maps corresponding to the attention window, a fourth neural network model so as to convert the portion into a vector, and using the vector as the attention feature vector.
Note 17. The information detection method according to Note 15, wherein generating the corresponding text description of the input image using the trained text description model further comprises: calculating a current state vector of the recurrent neural network model based on the attention feature vector and the previous state vector of the recurrent neural network model, and obtaining a text description corresponding to the attention window based on the current state vector.
Note 18. The information detection method according to Note 17, wherein obtaining the text description corresponding to the attention window comprises: applying a fifth neural network model to the current state vector so as to calculate an occurrence probability of each word in a predetermined dictionary, and determining the word with the highest occurrence probability as the text description corresponding to the attention window.
Note 19. The information detection method according to Note 17, wherein generating the corresponding text description of the input image using the trained text description model further comprises: terminating the generation of the corresponding text description of the input image when it is determined that the text description corresponding to the attention window is a full stop.
Note 20. The information detection method according to Note 18, wherein parameters of the trained text description model include parameters of the convolutional neural network model, parameters of the first neural network model, parameters of the second neural network model, parameters of the third neural network model, parameters of the fourth neural network model, parameters of the fifth neural network model, parameters of the recurrent neural network model, and the parameters used for comparison.
Claims (10)
1. An information processing method, comprising:
extracting, from each sample image of a plurality of sample images, a group of feature maps having a predetermined width and a predetermined height, wherein the feature maps in the group of feature maps respectively correspond to different image features; and
training a text description model based on the extracted group of feature maps and text descriptions labeled for the plurality of sample images, the text description model being used to generate a corresponding text description from an input image, wherein training the text description model comprises: calculating a center and a size of an attention window on the group of feature maps based on the group of feature maps and a previous state vector of a recurrent neural network model.
2. The information processing method according to claim 1, wherein extracting the group of feature maps from each sample image comprises extracting the group of feature maps from each sample image using a convolutional neural network model.
3. The information processing method according to claim 1, wherein calculating the center and the size of the attention window on the group of feature maps comprises:
applying, to the group of feature maps, a first neural network model that uses a sigmoid function as its activation function, so as to convert the group of feature maps into a vector;
merging the converted vector with the previous state vector of the recurrent neural network model, and applying, to the merged vector, a second neural network model that uses a sigmoid function as its activation function;
further applying, to the vector obtained through the second neural network model, a third neural network model that uses a tanh function as its activation function;
performing an operation on the vector obtained through the third neural network model and parameters used for comparison; and
calculating the center and the size of the attention window according to a result of the operation and the predetermined width and the predetermined height.
4. The information processing method according to claim 1, wherein training the text description model further comprises: obtaining an attention feature vector of the group of feature maps based on the center and the size of the attention window.
5. The information processing method according to claim 4, wherein obtaining the attention feature vector comprises: applying, to a portion of the group of feature maps corresponding to the attention window, a fourth neural network model so as to convert the portion into a vector, and using the vector as the attention feature vector.
6. The information processing method according to claim 4, wherein training the text description model further comprises: calculating a current state vector of the recurrent neural network model based on the attention feature vector and the previous state vector of the recurrent neural network model, and obtaining a text description corresponding to the attention window based on the current state vector.
7. The information processing method according to claim 6, wherein obtaining the text description corresponding to the attention window comprises: applying a fifth neural network model to the current state vector so as to calculate an occurrence probability of each word in a predetermined dictionary, and determining the word with the highest occurrence probability as the text description corresponding to the attention window.
8. The information processing method according to claim 6, wherein training the text description model further comprises: for a sample image of the plurality of sample images, terminating the training performed based on that sample image when it is determined that the text description corresponding to the attention window is a full stop.
9. An information processing apparatus, comprising:
an extraction unit configured to extract, from each sample image of a plurality of sample images, a group of feature maps having a predetermined width and a predetermined height, wherein the feature maps in the group of feature maps respectively correspond to different image features; and
a training unit configured to train a text description model based on the extracted group of feature maps and text descriptions labeled for the plurality of sample images, the text description model being used to generate a corresponding text description from an input image, wherein training the text description model comprises: calculating a center and a size of an attention window on the group of feature maps based on the group of feature maps and a previous state vector of a recurrent neural network model.
10. An information detection method, comprising:
extracting, from an input image, a group of feature maps having a predetermined width and a predetermined height, wherein the feature maps in the group of feature maps respectively correspond to different image features; and
generating, based on the extracted group of feature maps, a corresponding text description of the input image using a trained text description model, wherein generating the corresponding text description of the input image using the trained text description model comprises: calculating a center and a size of an attention window on the group of feature maps based on the group of feature maps and a previous state vector of a recurrent neural network model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710320880.4A CN108875758B (en) | 2017-05-09 | 2017-05-09 | Information processing method and device, and information detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108875758A true CN108875758A (en) | 2018-11-23 |
CN108875758B CN108875758B (en) | 2022-01-11 |
Family
ID=64287118
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710320880.4A Active CN108875758B (en) | 2017-05-09 | 2017-05-09 | Information processing method and device, and information detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108875758B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110321918A (en) * | 2019-04-28 | 2019-10-11 | Xiamen University | Method for sentiment analysis and image labeling in a microblog-based public-opinion robot system
US11745727B2 (en) * | 2018-01-08 | 2023-09-05 | STEER-Tech, LLC | Methods and systems for mapping a parking area for autonomous parking |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120033874A1 (en) * | 2010-08-05 | 2012-02-09 | Xerox Corporation | Learning weights of fonts for typed samples in handwritten keyword spotting |
CN104765728A (en) * | 2014-01-08 | 2015-07-08 | Fujitsu Ltd. | Method and device for training a neural network and method for determining a sparse feature vector |
CN105809201A (en) * | 2016-03-11 | 2016-07-27 | Institute of Automation, Chinese Academy of Sciences | Recognition method and device for autonomously extracting image semantic concepts in a biologically inspired manner |
CN105989341A (en) * | 2015-02-17 | 2016-10-05 | Fujitsu Ltd. | Character recognition method and device |
CN106198749A (en) * | 2015-05-08 | 2016-12-07 | Institute of Acoustics, Chinese Academy of Sciences | Multi-sensor data fusion method based on metal crack monitoring |
CN106446782A (en) * | 2016-08-29 | 2017-02-22 | Beijing Xiaomi Mobile Software Co., Ltd. | Image recognition method and device |
CN106484139A (en) * | 2016-10-19 | 2017-03-08 | Beijing Xinmei Hutong Technology Co., Ltd. | Emoticon recommendation method and device |
CN106599198A (en) * | 2016-12-14 | 2017-04-26 | SYSU-CMU Shunde International Joint Research Institute, Guangdong | Image description method using a multi-stage connected recurrent neural network |
Non-Patent Citations (4)
Title |
---|
ORIOL VINYALS et al.: "Show and tell: A neural image caption generator", 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) * |
R BOWDEN et al.: "A Linguistic Feature Vector for the Visual Interpretation of Sign Language", Computer Vision * |
ZHANG Shuye: "Deep Models and Their Applications in Visual Text Analysis", China Doctoral Dissertations Full-text Database (Information Science and Technology) * |
LI Yuejie et al.: "Research and Simulation on Optimized Recognition of Specific Text Images in Natural Scenes", Computer Simulation * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||