CN108959322A - Information processing method and device based on text generation image - Google Patents


Publication number
CN108959322A
CN108959322A (application number CN201710379515A)
Authority
CN
China
Prior art keywords
text
image
sample
decoder
text feature
Prior art date
Legal status
Granted
Application number
CN201710379515.0A
Other languages
Chinese (zh)
Other versions
CN108959322B (en)
Inventor
侯翠琴
夏迎炬
杨铭
张姝
孙俊
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN201710379515.0A
Publication of CN108959322A
Application granted
Publication of CN108959322B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses an information processing method and a device for generating an image from text. The method includes: extracting, from a sample text, a text feature characterizing the associations between the words in the sample text; selectively intercepting portions of the text feature with a window of variable size to obtain local text features; and training an image generation model based on the local text features of the sample text and a sample image corresponding to the sample text. The image generation model includes an encoder module and a decoder module. After training, the decoder module of the image generation model iteratively generates an image corresponding to an input text according to the local text features of the input text, each local text feature being intercepted in a respective iteration.

Description

Information processing method and device based on text generation image
Technical field
The present invention relates to the field of information processing, in particular to the field of deep learning, and more particularly to an information processing method and a device for generating an image from text.
Background technique
Automatically generating an image from a natural-language description is an important research topic in the field of artificial intelligence and has a very wide range of applications. In this respect, deep learning methods have already made much progress. Two main classes of deep learning methods are used to generate images: variational auto-encoding methods and generative adversarial network methods.
The variational auto-encoding method proposed by Kingma & Welling can be regarded as a neural network with continuous latent variables. The encoder-side model approximates the posterior distribution of the latent variables, and the decoder-side model constructs an image based on the distribution of the latent variables. Gregor et al. proposed the Deep Recurrent Attentive Writer (DRAW) model to generate images; the DRAW model extends the variational auto-encoding method into a sequential variational auto-encoding framework.
A generative adversarial network method includes a generator model that generates data based on a probability distribution and a discriminator model that judges whether data is real or generated. Gauthier proposed a conditional adversarial network to generate images of different classes. Denton et al. trained a conditional generative adversarial network for each layer of a Laplacian pyramid, and then generated images from coarse to fine based on the per-layer conditional adversarial networks.
Although the above image generation techniques exist in the prior art, methods of generating images from text still need to be improved.
Summary of the invention
A brief summary of the invention is given below in order to provide a basic understanding of certain aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention; it is not intended to identify key or important parts of the invention, nor to limit the scope of the invention. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that follows.
The present invention provides an information processing method, comprising: extracting, from a sample text, a text feature characterizing the associations between the words in the sample text; selectively intercepting portions of the text feature with a window of variable size to obtain local text features; and training an image generation model based on the local text features of the sample text and a sample image corresponding to the sample text, wherein the image generation model includes an encoder module and a decoder module, the decoder module of the trained image generation model iteratively generates an image corresponding to an input text according to the local text features of the input text, and each local text feature is intercepted in a respective iteration.
According to another aspect of the present invention, a device for generating an image from text is provided, comprising: a text feature extraction section that extracts a text feature characterizing the associations between the words in a text; a local text feature interception section that selectively intercepts portions of the text feature with a window of variable size to obtain local text features; and an image generation model, wherein the decoder module of the image generation model iteratively generates an image corresponding to an input text according to the local text features of the input text, each local text feature being intercepted in a respective iteration.
According to another aspect of the invention, a method of generating an image from text using the trained device described above is provided, comprising: extracting, by the text feature extraction section, a text feature characterizing the associations between the words in the text; selectively intercepting, by the local text feature interception section, portions of the text feature with a window of variable size to obtain local text features; and iteratively generating, by the decoder module of the image generation model, an image corresponding to the input text according to the local text features of the input text, each local text feature being intercepted in a respective iteration.
According to a further aspect of the present invention, a storage medium is also provided. The storage medium includes machine-readable program code which, when executed on an information processing apparatus, causes the information processing apparatus to perform the above method according to the present invention.
According to a further aspect of the present invention, a program is also provided. The program includes machine-executable instructions which, when executed on an information processing apparatus, cause the information processing apparatus to perform the above method according to the present invention.
These and other advantages of the present invention will become more apparent from the following detailed description of preferred embodiments of the present invention in conjunction with the accompanying drawings.
Detailed description of the invention
Reading the embodiments of the present invention with reference to the drawings will help the other features and advantages of the present invention to be better understood. The drawings described herein are merely for the purpose of schematically illustrating embodiments of the present invention, not all possible embodiments, and are not intended to limit the scope of the present invention. In the drawings:
Fig. 1 is a schematic diagram showing the structure of the device for generating an image from text according to an embodiment of the present invention.
Fig. 2 is a schematic diagram showing the structure of the text feature extraction section in the device for generating an image from text according to an embodiment of the present invention.
Fig. 3 is a schematic diagram showing the structure of the image generation model in the device for generating an image from text according to an embodiment of the present invention.
Fig. 4 is a flowchart showing the training process of the device for generating an image from text according to an embodiment of the present invention.
Fig. 5 is a flowchart showing the training process of the image generation model in the device for generating an image from text according to an embodiment of the present invention.
Fig. 6 is a schematic diagram showing a configuration example of the device for generating an image from text according to an embodiment of the present invention in the training state.
Fig. 7 is a schematic diagram showing a configuration example of the image generation model in the device for generating an image from text according to an embodiment of the present invention.
Fig. 8 is a flowchart showing a method of generating an image using the trained device for generating an image from text according to an embodiment of the present invention.
Fig. 9 is a flowchart showing the process by which the decoder module generates an image in the use state.
Fig. 10 is a schematic diagram showing a configuration example of the device for generating an image from text according to an embodiment of the present invention in the use state.
Fig. 11 is a schematic block diagram of a computer for implementing the method and device according to embodiments of the present invention.
Specific embodiment
Embodiments of the present invention are now described in detail with reference to the drawings. It should be noted that the following description is merely exemplary and is not intended to limit the present invention. Furthermore, in the following description, the same reference numerals are used in different drawings to denote the same or similar components. The features of the different embodiments described below can be combined with each other to form further embodiments within the scope of the present invention.
Reference is first made to Fig. 1, which shows the structure of the device for generating an image from text according to an embodiment of the present invention. As shown in Fig. 1, the device 100 includes a text feature extraction section 110, a local text feature interception section 120, and an image generation model 130.
The text feature extraction section 110 extracts a text feature characterizing the associations between the words in a text. Specifically, as shown in Fig. 2, the text feature extraction section 110 includes a vectorization unit 111 and a text feature extraction unit 112. The vectorization unit 111 vectorizes the text (not shown in Fig. 1) using an existing distributed representation technique (such as the log-bilinear language model (LBL), the C&W model, or Word2vec) to obtain low-dimensional word vectors. The text feature extraction unit 112 extracts the text feature characterizing the associations between the words based on the word vectors, using well-known forward and backward recurrent neural networks. Alternatively, the text feature may be extracted using only a forward recurrent neural network or only a backward recurrent neural network.
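As an illustration of the bidirectional extraction described above, the following is a minimal pure-Python sketch. The tanh RNN cell, the shared forward/backward weights, and the toy two-dimensional word vectors are all simplifying assumptions made for clarity; a real implementation would use trained, separate parameter sets, typically in a deep-learning framework.

```python
import math

def rnn_step(h, x, W_h, W_x):
    """One tanh RNN step: h' = tanh(W_h h + W_x x)."""
    return [math.tanh(sum(w * v for w, v in zip(W_h[k], h)) +
                      sum(w * v for w, v in zip(W_x[k], x)))
            for k in range(len(W_h))]

def bidirectional_features(word_vecs, W_h, W_x):
    """Concatenate forward and backward RNN states: S_i = [h_i^f ; h_i^b]."""
    dim = len(W_h)
    fwd, h = [], [0.0] * dim
    for x in word_vecs:                 # forward pass over the word vectors
        h = rnn_step(h, x, W_h, W_x)
        fwd.append(h)
    bwd, h = [], [0.0] * dim
    for x in reversed(word_vecs):       # backward pass over the reversed sequence
        h = rnn_step(h, x, W_h, W_x)
        bwd.append(h)
    bwd.reverse()                       # align backward state i with word i
    return [f + b for f, b in zip(fwd, bwd)]
```

For a sentence of L words this yields L concatenated states, matching the L bidirectional states S used as the text feature.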
The local text feature interception section 120 selectively intercepts portions of the text feature with a window of variable size to obtain local text features. Each local text feature is intercepted in a respective iteration of the image generation model 130; in the current iteration, the local text feature interception section 120 intercepts the local text feature based on the output of the decoder of the decoder module in the previous iteration.
The image generation model 130 is trained based on the local text features of a sample text and a sample image corresponding to the sample text. After training, the decoder module of the image generation model iteratively generates an image corresponding to an input text according to the local text features of the input text. The image generation model 130 may be the well-known DRAW (Deep Recurrent Attentive Writer) model.
Fig. 3 shows the structure of the image generation model 130 according to an embodiment of the present invention. As shown in Fig. 3, the image generation model 130 includes a decoder module 131, an encoder module 132, and a computing module 133.
In the training state, the encoder module 132 iteratively compresses the sample image and, in each iteration, outputs a first distribution of a feature quantity, the feature quantity characterizing the key information of the sample image and the sample text. In the training state, the decoder module 131 iteratively generates an output image based on the local text features of the sample text and the first distributions, and computes a second distribution of the feature quantity in each iteration. Here, the encoder module 132 and the decoder module 131 are implemented by recurrent neural networks (RNNs). The computing module 133 computes the loss function of the image generation model based on the sample image, the output image, the first distributions, and the second distributions, in order to optimize the image generation model.
In the use state, in which the trained device 100 generates an image from text, the decoder module 131 iteratively generates an image corresponding to the input text according to the local text features of the input text and the second distributions, each local text feature being intercepted in a respective iteration. In the use state, the encoder module 132 does not take part in the operation.
As shown in Fig. 3, the encoder module 132 includes a reading section 1321, an encoder 1322, and a construction section 1323. The reading section 1321 reads a portion of the sample image based on the output of the decoder module and the output of the decoder in the previous iteration, to obtain a partial sample image. The encoder 1322 compresses the partial sample image based on the outputs of the encoder and the decoder in the previous iteration. The construction section 1323 constructs the first distribution based on the output of the encoder.
The decoder module 131 includes a sampling section 1311, a decoder 1312, a construction section 1313, and a write-out section 1314. In the training state, the sampling section 1311 samples the feature quantity from the first distribution. The decoder 1312 decodes the sampled feature quantity based on the local text feature and the output of the decoder in the previous iteration. The construction section 1313 constructs the second distribution based on the output of the decoder in the previous iteration. The write-out section 1314 writes the decoder output of the current iteration into the corresponding region of a canvas matrix. The decoder module generates the output image based on the final canvas matrix.
In the use state, the construction section 1313 constructs the second distribution based on the output of the decoder in the previous iteration. The sampling section 1311 samples the feature quantity from the second distribution. The decoder 1312 decodes the sampled feature quantity based on the local text feature and the decoder output of the previous iteration. The write-out section 1314 writes the decoder output of the current iteration into the corresponding region of the canvas matrix.
The computing module 133 includes a first computing section 1331, a second computing section 1332, and a determining section 1333. The first computing section 1331 computes a first loss function concerning the sample image and the output image. The second computing section 1332 computes a second loss function concerning the first distribution and the second distribution. The determining section 1333 determines a total loss function based on the first loss function and the second loss function.
In the following, the training process of the device 100 is described with reference to Figs. 4 to 7. Fig. 4 is a flowchart showing the training process of the device for generating an image from text according to an embodiment of the present invention. As shown in Fig. 4, the training process 200 includes steps S210 to S230.
In step S210, a text feature characterizing the associations between the words in the sample text is extracted from the sample text. Specifically, the sample text is first vectorized using a well-known distributed representation technique to obtain a plurality of low-dimensional word vectors. Then the text feature characterizing the associations between the words in the sample text is extracted based on the word vectors; here, a forward recurrent neural network and/or a backward recurrent neural network may be used to extract the text feature.
In step S220, portions of the text feature are selectively intercepted with a window of variable size to obtain local text features.
In step S230, the image generation model is trained based on the local text features of the sample text and the sample image corresponding to the sample text, wherein the image generation model includes an encoder module and a decoder module, the decoder module of the trained image generation model iteratively generates an image corresponding to the input text according to the local text features of the input text, and each local text feature is intercepted in a respective iteration.
Fig. 5 shows the detailed flow of the training process of the image generation model 130. As shown in Fig. 5, the processing of step S230 includes steps S231 to S233.
Referring to Fig. 5, in step S231 the encoder module is used to iteratively compress the sample image, and in each iteration the encoder module outputs a first distribution of the feature quantity, the feature quantity characterizing the key information of the sample image and the sample text. Specifically, a portion of the sample image is read based on the output of the decoder module and the output of the decoder in the previous iteration, to obtain a partial sample image; the partial sample image is then compressed based on the outputs of the encoder and the decoder in the previous iteration. In addition, the first distribution in each iteration is constructed based on the output of the encoder.
In step S232, based on the local text features of the sample text and the first distributions, the decoder module is used to compute the second distribution of the feature quantity in each iteration and to iteratively generate the output image. Specifically, in each iteration the feature quantity is sampled from the first distribution; the sampled feature quantity is decoded by the decoder based on the local text feature and the decoder output of the previous iteration; the second distribution is constructed based on the decoder output of the previous iteration; in each iteration the decoder output is written out to the same matrix, which serves as the output of the decoder module; and the output image is generated based on the finally obtained matrix.
In step S233, the loss function of the image generation model is computed based on the sample image, the output image, the first distributions, and the second distributions, in order to optimize the image generation model. Specifically, a first loss function between the sample image and the output image is computed, then a second loss function between the first distribution and the second distribution is computed, and finally a total loss function is determined based on the first loss function and the second loss function; the parameters of the model are updated, for example by back-propagation, so as to minimize the loss function.
The training process of the device for generating an image from text according to an embodiment of the present invention is explained concretely below with reference to the configuration examples in Figs. 6 and 7. Fig. 6 is a schematic diagram showing a configuration example of the device 100 in the training state. Fig. 7 is a schematic diagram showing a configuration example of the image generation model. In Figs. 6 and 7, the image generation model is shown as the DRAW model; however, the image generation model of the present invention is not limited to DRAW, and those skilled in the art may use any other model capable of implementing the invention as needed.
In the following description, RNN_enc denotes the function realized by the encoder 1322 in a single iteration step; the output of RNN_enc in the t-th iteration step is the encoder-side hidden vector h_t^enc. Similarly, RNN_dec denotes the function realized by the decoder 1312 in a single iteration step; the output of RNN_dec in the t-th iteration step is the decoder-side hidden vector h_t^dec. RNN_f denotes the function realized in a single iteration step by the forward recurrent neural network of the text feature extraction section 110; the output of RNN_f in the t-th iteration step is the vector h_t^f. Similarly, RNN_b denotes the function realized in a single iteration step by the backward recurrent neural network; the output of RNN_b in the t-th iteration step is the vector h_t^b. Furthermore, in the following description, unless otherwise stated, b = W(a) denotes applying a linear weighting and a bias to a vector a to obtain a vector b. The specific training process is as follows:
Process 1. Initialization: initialize the initial states of the encoder-side and decoder-side recurrent neural networks, and the initial states of the bidirectional recurrent neural network. The encoder state h_0^enc, the decoder state h_0^dec, the forward RNN state h_0^f, and the backward RNN state h_{L-1}^b are set to zero vectors of the corresponding dimensions. The canvas matrix C_0 is initialized to a zero matrix. The initial states of the write-out section 1314, the reading section 1321, and the local text feature interception section 120 are initialized. The value of the total number of iteration steps T is set.
Process 2. Extract the text feature from the sample text: input a sentence y of natural-language description, and vectorize it into low-dimensional word vectors ey = (ey_0, ey_1, ..., ey_{L-1}) using a well-known distributed representation technique (the log-bilinear language model (LBL), the C&W model, Word2vec, etc.), where L is the number of words contained in the sentence y. The L word vectors ey_i are input into the bidirectional recurrent neural network to obtain the L bidirectional states serving as the text feature S = (h_0^s, h_1^s, ..., h_{L-1}^s) = ([h_0^f, h_0^b], [h_1^f, h_1^b], ..., [h_{L-1}^f, h_{L-1}^b]), where h_i^f = RNN_f(h_{i-1}^f, ey_i) and h_i^b = RNN_b(h_{i-1}^b, ey_{i_r}), with the reversed word vectors (ey_{0_r}, ey_{1_r}, ..., ey_{L-1_r}) = (ey_{L-1}, ..., ey_1, ey_0).
Process 3. Intercept the local text feature: the local text feature interception section 120 selectively intercepts portions of the text feature S using an attention model Text_att with an attention window of variable size. Specifically, the attention model computes the center and size of the attention window on S based on the output h_{t-1}^dec of the decoder in the (t-1)-th iteration step:
center of the attention window: P_center = L × sigmoid(h_{t-1}^dec × W_att + b_att),
size of the attention window: K_width = 0.5 × L × sigmoid(h_{t-1}^dec × W_wid + b_wid),
where W_att, b_att, W_wid, and b_wid are the parameters of the attention model Text_att.
The attention model Text_att is then applied to S to obtain s_t, the local text feature on S centered at P_center and of width K_width.
Process 4. Read the partial sample image: the reading section 1321 reads a portion of the image x using an existing attention model Read_att. Specifically, each partial image is obtained by applying an array of two-dimensional Gaussian filters to the image x while varying the position and zoom of the attention window.
The position of the N × N Gaussian filter array in the image is determined by the grid center coordinates (g_X, g_Y) of the filter array and the stride δ between adjacent filters. The stride δ controls the "zoom" of the attention window; in other words, the larger the stride δ, the larger the region of the original image covered by the intercepted partial image, but the lower the resolution of the image. Within the filter array, the position (μ_i^X, μ_j^Y) of the filter in the i-th row and j-th column can be expressed as:
μ_i^X = g_X + (i − N/2 − 0.5) × δ,
μ_j^Y = g_Y + (j − N/2 − 0.5) × δ.
In addition to the parameters g_X, g_Y, and δ noted above, two further attention parameters determine the operation of the Gaussian filters: the precision σ² of the Gaussian filters, and a scalar intensity γ that scales the filter response. Given an A × B input image x, the five parameters are dynamically determined in each iteration step by a linear transformation of the decoder output h^dec:
(g̃_X, g̃_Y, log σ², log δ̃, log γ) = W(h^dec),
g_X = ((A + 1)/2) × (g̃_X + 1), g_Y = ((B + 1)/2) × (g̃_Y + 1), δ = ((max(A, B) − 1)/(N − 1)) × δ̃.
Given the parameters above, the horizontal filter matrix F_X and the vertical filter matrix F_Y of the filter array (of dimensions N × A and N × B, respectively) are defined as follows:
F_X[i, a] = (1/Z_X) × exp(−(a − μ_i^X)² / (2σ²)),
F_Y[j, b] = (1/Z_Y) × exp(−(b − μ_j^Y)² / (2σ²)),
where (i, j) is a point in the attention window, with i and j ranging from 0 to N−1; (a, b) is a point in the input image, with a ranging over [0, A−1] and b over [0, B−1]; and Z_X, Z_Y are normalization constants satisfying ∑_a F_X[i, a] = 1 and ∑_b F_Y[j, b] = 1.
Given F_X, F_Y, and the intensity γ determined by h_{t-1}^dec, together with the input image x and the error image x̂_t = x − σ(C_{t-1}), where σ denotes the logistic sigmoid function σ(z) = 1/(1 + exp(−z)), the reading section returns the concatenation of two N × N matrices obtained from the input image and the error image:
r_t = read(x, x̂_t, h_{t-1}^dec) = γ [F_Y x F_X^T, F_Y x̂_t F_X^T].
Here, the same filter matrices are applied to both the input image and the error image.
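A sketch of the Gaussian-filterbank read operation follows, in pure Python for self-containment. It implements the grid positions μ_i = g + (i − N/2 − 0.5)δ and the normalized filter rows, then forms γ · F_Y x F_X^T for a single image; the error-image channel is omitted, and the parameter values in the test below are illustrative assumptions.

```python
import math

def filterbank(g, sigma2, delta, N, size):
    """N x size matrix whose row i is a normalized Gaussian centered at
    mu_i = g + (i - N/2 - 0.5) * delta."""
    F = []
    for i in range(N):
        mu = g + (i - N / 2 - 0.5) * delta
        row = [math.exp(-(a - mu) ** 2 / (2 * sigma2)) for a in range(size)]
        z = sum(row) or 1.0          # normalize so the row sums to 1
        F.append([v / z for v in row])
    return F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def read_attention(x, gx, gy, sigma2, delta, gamma, N):
    """Extract an N x N glimpse from image x: gamma * F_Y @ x @ F_X^T."""
    A, B = len(x[0]), len(x)         # image width A, height B
    FX = filterbank(gx, sigma2, delta, N, A)
    FY = filterbank(gy, sigma2, delta, N, B)
    FXt = [list(col) for col in zip(*FX)]   # transpose of F_X
    glimpse = matmul(matmul(FY, x), FXt)
    return [[gamma * v for v in row] for row in glimpse]
```

Because each filter row is normalized, a uniform image yields a uniform glimpse scaled only by γ, which makes the normalization easy to verify.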
Process 5. Compress the sample image: in the t-th iteration step, h_{t-1}^dec, x_t, and x̂_t are input to the encoder 1322 to obtain the state h_t^enc = RNN_enc(h_{t-1}^enc, W_enc1 × r_t + W_enc2 × h_{t-1}^dec + b_enc), where W_enc1, W_enc2, and b_enc are the parameters of the encoder.
Process 6. Construct the first distribution: based on the encoder output h_t^enc, the first distribution Q(Z_t | z_1, ..., z_{t-1}, x, y) of the feature quantity z_t is constructed. Here, the first distribution Q obeys a Gaussian distribution N(Z_t | μ_t, σ_t) with mean μ_t and standard deviation σ_t given by:
μ_t = W(h_t^enc),
σ_t = exp(W(h_t^enc)).
The first distribution Q is not limited to the above Gaussian distribution; those skilled in the art may select other suitable distributions according to actual needs.
Process 7. Sample the feature quantity from the first distribution: the sampling section 1311 samples from the first distribution Q(Z_t | z_1, ..., z_{t-1}, x, y) to obtain the feature quantity z_t.
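Sampling z_t from a diagonal Gaussian built from the encoder state is commonly done with the reparameterization trick, sketched below; the weight matrices `w_mu`/`w_sigma` stand in for the linear maps W(·) of Process 6 and are assumptions of this sketch, not the patent's actual parameters.

```python
import math
import random

def sample_feature(h_enc, w_mu, b_mu, w_sigma, b_sigma, rng=random):
    """Sample z_t ~ N(mu_t, sigma_t) via the reparameterization trick:
    z = mu + sigma * eps with eps ~ N(0, 1), so gradients can flow to mu
    and sigma during training."""
    mu = [sum(w * h for w, h in zip(row, h_enc)) + b
          for row, b in zip(w_mu, b_mu)]
    log_sigma = [sum(w * h for w, h in zip(row, h_enc)) + b
                 for row, b in zip(w_sigma, b_sigma)]
    sigma = [math.exp(v) for v in log_sigma]   # sigma = exp(W(h_enc))
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
```

Driving the log-variance strongly negative collapses the noise, so the sample approaches the mean, which gives a simple deterministic check.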
Process 8. Decode the feature quantity: z_t and s_t are input to the decoder 1312 to obtain the state h_t^dec of the decoder 1312 in the t-th iteration step.
Process 9. Write the decoder output to the canvas matrix: the write-out section 1314 writes the output h_t^dec of the decoder in the t-th iteration step to the canvas matrix C using an attention model Write_att. Specifically, the five parameters (g_X, g_Y, σ, δ, γ) of the attention model Write_att are computed as in Process 4, where W(h_t^dec) = sigmoid(h_t^dec × W_write + b_write), and the corresponding filter matrices F_x and F_y of the Gaussian filters are obtained accordingly.
The attention model Write_att is then applied to h_t^dec to obtain the write patch Write_t:
w_t = W(h_t^dec),
Write_t = (1/γ) × F_y^T w_t F_x.
Process 10. Construct the second distribution: based on h_t^dec, the second distribution P(Z_t | z_1, ..., z_{t-1}) is constructed. The second distribution P obeys a Gaussian distribution N(Z_t | μ'_t, σ'_t) with mean μ'_t and standard deviation σ'_t, where:
μ'_t = W(h_t^dec),
σ'_t = exp(W(h_t^dec)).
Process 11. Update the canvas matrix: the canvas matrix is updated as C_t = C_{t-1} + Write_t, where C is a matrix of the same size as the input image.
Process 12. Iterate: Processes 3 to 11 are repeated until the maximum number of iterations T is reached.
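The canvas accumulation of Processes 11 and 12 and the final image readout can be sketched as follows. Here `write_step` is a hypothetical stand-in for the full Text_att/decoder/Write_att pipeline of Processes 3 to 9, and mapping the final canvas through the logistic sigmoid mirrors the error-image definition x̂ = x − σ(C) used in Process 4.

```python
import math

def sigmoid_matrix(C):
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in C]

def generate(T, shape, write_step):
    """Skeleton of the iterative canvas update: C_t = C_{t-1} + Write_t for
    T steps, then output image = sigmoid(C_T). write_step(t) must return a
    matrix of the given shape (a stand-in for the Write_att output)."""
    rows, cols = shape
    C = [[0.0] * cols for _ in range(rows)]
    for t in range(T):
        W = write_step(t)
        C = [[c + w for c, w in zip(cr, wr)] for cr, wr in zip(C, W)]
    return sigmoid_matrix(C)
```

With a constant unit write patch and T = 3, every canvas entry accumulates to 3 and the output pixel is σ(3).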
Process 13. Compute the loss function, and update the parameters of the device 100 by back-propagation so as to minimize the loss function. The loss function used here is:
L = −log P(x | y, z_1, ..., z_T) + ∑_{t=1}^{T} KL(Q(Z_t | z_1, ..., z_{t-1}, x, y) ‖ P(Z_t | z_1, ..., z_{t-1})),
where −log P(x | y, z_1, ..., z_T) denotes the image reconstruction loss, which can be understood as the similarity between the generated image and the input image, and the KL term denotes the loss between the constructed first distribution Q and the constructed second distribution P.
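When both Q and P are diagonal Gaussians, as in Processes 6 and 10, the per-step KL term of the loss has the closed form sketched below (summed over latent dimensions). This is a standard identity for Gaussian KL divergence, not wording taken from the patent itself.

```python
import math

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(Q || P) for diagonal Gaussians, summed over dimensions:
    KL = sum_k( log(sp/sq) + (sq^2 + (mq - mp)^2) / (2 sp^2) - 1/2 )."""
    return sum(math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
               for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p))
```

Identical distributions give zero divergence, and shifting one unit mean with unit variances gives exactly 1/2, which are convenient sanity checks.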
The training process of the device 100 for generating an image from text has been described above. In the following, with reference to Figs. 8 to 10, a method of generating an image using the trained device 100 and the configuration of the device 100 in the use state are described.
Fig. 8 shows a flowchart of the method of generating an image from text using the trained device 100 according to an embodiment of the present invention. As shown in Fig. 8, the method 300 includes steps S310 to S330. In step S310, the text feature extraction section extracts from the input text a text feature characterizing the associations between the words in the text. In step S320, the local text feature interception section selectively intercepts portions of the text feature with a window of variable size to obtain local text features. In step S330, the decoder module iteratively generates an image corresponding to the input text according to the local text features of the input text, each local text feature being intercepted in a respective iteration.
The operations of steps S310 and S320 are the same as those of steps S210 and S220 in Fig. 4 and, for brevity, are not repeated here. The processing of step S330 is described in detail below with reference to Fig. 9.
As shown in Fig. 9, the process S330 by which the decoder module generates the image includes steps S331 to S335. In step S331, in each iteration, the construction section 1313 constructs the second distribution P based on the decoder output of the previous iteration. In step S332, the sampling section 1311 samples the feature quantity from the second distribution P. In step S333, the decoder 1312 decodes the sampled feature quantity based on the local text feature and the decoder output of the previous iteration. In step S334, in each iteration, the write-out section 1314 writes the decoder output to the same matrix. In step S335, the decoder module 131 generates the output image based on the finally obtained matrix.
Fig. 10 is a schematic diagram showing a configuration example of the device 100 in the use state according to an embodiment of the present invention. Since the encoder module is not needed in the use state to construct the first distribution Q, the encoder module 132, which does not take part in the operation, is omitted from Fig. 10.
The detailed process of generating an image using the trained device 100 is explained below with reference to Fig. 10.
Process 1. Initialization: initialize the initial state of the decoder-side recurrent neural network and the initial state of the bidirectional recurrent neural network. Set the decoder state h_0^dec, the forward recurrent neural network state h_0^f, the backward recurrent neural network state h_{L-1}^b and the canvas matrix C_0 to zero vectors or matrices of the corresponding dimensions. Initialize the initial states of the writing portion 1314 and the local text feature interception portion 120. Set the value of the total number of iteration steps T; the value of T in the use state is preferably the same as the value of T in the training state.
Process 2. Extract the text feature: extract the text feature S of the input text y using a well-known distributed representation technique.
Process 3. Extract the local text feature: calculate, based on h_{t-1}^dec, the center and size of the attention window of the attention model Text_att of the local text feature interception portion 120, and apply the attention model Text_att to S to obtain the local text feature s_t.
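As a concrete illustration of process 3, the sketch below implements a soft 1-D attention window over the text feature matrix S, in the spirit of DRAW-style attention. The patent only states that the window's center and size are computed from h_{t-1}^dec; the linear parameterization, the Gaussian filterbank form, and the function name `text_attention` are assumptions made here for illustration.

```python
import numpy as np

def text_attention(S, h_dec_prev, W, n_glimpse=3):
    """Soft 1-D attention window over text features S (L words x d dims).

    The window's centre and stride are predicted from the previous decoder
    state via an (assumed) linear map W, then a bank of Gaussian filters
    over word positions selects a local slice s_t of n_glimpse rows.
    """
    L, d = S.shape
    g_tilde, log_delta = (W @ h_dec_prev)[:2]      # raw centre / stride
    g = (L - 1) * (np.tanh(g_tilde) + 1) / 2       # window centre in [0, L-1]
    delta = np.exp(log_delta)                      # stride between filters
    sigma = delta / 2                              # filter width (assumed)
    mu = g + (np.arange(n_glimpse) - (n_glimpse - 1) / 2) * delta
    F = np.exp(-(np.arange(L)[None, :] - mu[:, None]) ** 2 / (2 * sigma ** 2))
    F /= F.sum(axis=1, keepdims=True) + 1e-8       # normalise each filter
    return F @ S                                   # local text feature s_t
```

Each row of the result is a weighted mixture of word features near the predicted window center, so a small window attends to a few adjacent words while a large one blends a wider span.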
Process 4. Construct the second distribution: construct, based on h_{t-1}^dec, the second distribution P(Z_t | z_1, …, z_{t-1}) of the feature quantity z_t.
Process 5. Sample a feature quantity from the second distribution: the sampling portion 1311 samples a feature quantity z_t from the second distribution P(Z_t | z_1, …, z_{t-1}).
Process 6. Decode the feature quantity: input z_t and s_t into the decoder 1312 to obtain the state h_t^dec of step t.
Process 7. Write the output of the decoder out to the canvas matrix: calculate, based on h_t^dec, the parameters (gX, gY, σ, δ, γ) of the attention model Write_att of the writing portion 1314, and apply the attention model Write_att to h_t^dec to obtain the matrix Write_t.
Process 8. Update the canvas matrix: update the canvas matrix C_t = C_{t-1} + Write_t.
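Processes 7 and 8 can be sketched with the standard DRAW write operation. The patent names the parameters (gX, gY, σ, δ, γ) but not the exact formula, so the form Write_t = (1/γ) F_Y^T w F_X and the helper names below are assumptions.

```python
import numpy as np

def filterbank(g, delta, sigma, N, size):
    """N Gaussian filters of width sigma, spaced delta apart around centre g."""
    mu = g + (np.arange(N) - (N - 1) / 2) * delta
    F = np.exp(-(np.arange(size)[None, :] - mu[:, None]) ** 2 / (2 * sigma ** 2))
    return F / (F.sum(axis=1, keepdims=True) + 1e-8)

def write_step(C_prev, w, gX, gY, sigma, delta, gamma):
    """One canvas update C_t = C_{t-1} + Write_t.

    w is the N x N patch emitted by the decoder; (gX, gY, sigma, delta,
    gamma) are the write-attention parameters computed from h_t^dec.
    """
    H, W_ = C_prev.shape
    N = w.shape[0]
    FX = filterbank(gX, delta, sigma, N, W_)   # filters over columns
    FY = filterbank(gY, delta, sigma, N, H)    # filters over rows
    return C_prev + (FY.T @ w @ FX) / gamma    # smear the patch onto the canvas
```

The γ parameter scales the written intensity, while (gX, gY) place the patch and (σ, δ) control its blur and extent, so successive iterations refine different regions of the same canvas.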
Process 9. Iterate: repeat processes 3 to 8 until the maximum number of iterations T is reached.
Process 10. Generate the image: generate the output image x′ based on the matrix C_T, x′ = sigmoid(C_T).
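Putting processes 1 to 10 together, a toy end-to-end sketch of the generation loop might look as follows. All weights are random placeholders, the text attention is reduced to a simple row lookup, and the linear prior/decoder nets are assumptions; only the control flow follows the processes above.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(S, T=8, z_dim=4, h_dim=16, img_shape=(8, 8)):
    """Sketch of processes 1-10: per step, take a local text feature,
    sample z_t from the prior built on h_{t-1}^dec, run one decoder step,
    and accumulate its output on the canvas; finish with a sigmoid."""
    L, d = S.shape
    h = np.zeros(h_dim)                          # h_0^dec   (process 1)
    C = np.zeros(img_shape)                      # canvas C_0
    Wp = rng.normal(size=(2 * z_dim, h_dim))     # prior net (assumed linear)
    Wh = rng.normal(size=(h_dim, h_dim + z_dim + d))
    Ww = rng.normal(size=(img_shape[0] * img_shape[1], h_dim))
    for t in range(T):                           # processes 3-9
        s_t = S[min(t, L - 1)]                   # stand-in for Text_att
        mu, logvar = np.split(Wp @ h, 2)         # second distribution P
        z_t = mu + np.exp(0.5 * logvar) * rng.normal(size=z_dim)   # sample z_t
        h = np.tanh(Wh @ np.concatenate([h, z_t, s_t]))            # decoder step
        C = C + (Ww @ h).reshape(img_shape)      # write to canvas
    return 1 / (1 + np.exp(-np.clip(C, -30, 30)))  # x' = sigmoid(C_T)
```

Because the image is accumulated over T steps rather than emitted at once, each iteration can attend to a different part of the text and refine a different part of the canvas.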
In addition, it should be noted that each of the constituent components in the above system may be configured by software, firmware, hardware, or a combination thereof. The specific means or manner of such configuration is well known to those skilled in the art and is not repeated here. Where the system is implemented by software or firmware, a program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure (for example, the general-purpose computer 1100 shown in Fig. 11), and the computer, when various programs are installed therein, is capable of executing various functions and the like.
Fig. 11 shows a schematic block diagram of a computer that can be used to implement the method and system according to the embodiments of the present invention.
In Fig. 11, a central processing unit (CPU) 1101 executes various processing according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage section 1108 into a random access memory (RAM) 1103. The RAM 1103 also stores, as needed, the data required when the CPU 1101 executes various processing. The CPU 1101, the ROM 1102 and the RAM 1103 are connected to one another via a bus 1104. An input/output interface 1105 is also connected to the bus 1104.
The following components are connected to the input/output interface 1105: an input section 1106 (including a keyboard, a mouse and the like), an output section 1107 (including a display such as a cathode-ray tube (CRT) or a liquid crystal display (LCD), a speaker and the like), the storage section 1108 (including a hard disk and the like), and a communication section 1109 (including a network interface card such as a LAN card, a modem and the like). The communication section 1109 performs communication processing via a network such as the Internet. A driver 1110 may also be connected to the input/output interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, may be mounted on the driver 1110 as needed, so that a computer program read therefrom is installed into the storage section 1108 as needed.
Where the above series of processing is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1111.
Those skilled in the art will understand that such a storage medium is not limited to the removable medium 1111 shown in Fig. 11, in which the program is stored and which is distributed separately from the device to provide the program to the user. Examples of the removable medium 1111 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a mini-disc (MD) (registered trademark)) and a semiconductor memory. Alternatively, the storage medium may be the ROM 1102, a hard disk included in the storage section 1108 or the like, in which the program is stored and which is distributed to the user together with the device containing it.
The present invention further proposes a program product storing machine-readable instruction codes. When the instruction codes are read and executed by a machine, the method according to the above embodiments of the present invention can be performed.
Correspondingly, a storage medium for carrying the above program product storing the machine-readable instruction codes is also included within the scope of the present invention. The storage medium includes, but is not limited to, a floppy disk, an optical disk, a magneto-optical disk, a memory card, a memory stick and the like.
It should be noted that the method of the present invention is not limited to being executed in the chronological order described in the specification, and may also be executed in another order, in parallel or independently. Therefore, the execution order of the method described in this specification does not limit the technical scope of the present invention.
The above description of the embodiments of the present invention is only exemplary, for a better understanding of the present invention, and is not intended to limit the present invention. It should be noted that, in the above description, features described and/or illustrated for one embodiment may be used in one or more other embodiments in the same or a similar manner, combined with features in other embodiments, or substituted for features in other embodiments. Those skilled in the art will understand that variations and modifications made to the embodiments described above without departing from the inventive concept of the present invention fall within the scope of the present invention.
In summary, according to the embodiments of the present invention, the present invention provides the following technical solutions.
Scheme 1. An information processing method, comprising:
extracting, from a sample text, a text feature characterizing the relevance between the words in the sample text;
selectively intercepting parts of the text feature with a window of variable size, to obtain respective local text features; and
training an image generation model based on the local text features of the sample text and a sample image corresponding to the sample text,
wherein the image generation model includes an encoder module and a decoder module, the decoder module in the trained image generation model iteratively generates an image corresponding to an input text according to the local text features of the input text, and each local text feature is intercepted in the corresponding iteration.
Scheme 2. The information processing method according to scheme 1, wherein extracting the text feature from the sample text comprises:
vectorizing the sample text to obtain a plurality of low-dimensional word vectors; and
extracting, based on the word vectors, the text feature characterizing the relevance between the words in the sample text.
Scheme 3. The information processing method according to scheme 2, wherein the text feature is extracted using a forward recurrent neural network and/or a backward recurrent neural network.
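As an illustrative sketch of scheme 3 (not part of the claimed solutions), a bidirectional recurrent pass over the word vectors can produce per-word features that reflect both left and right context. The plain tanh RNN cells, the weight shapes, and the function name `bi_rnn_features` below are assumptions.

```python
import numpy as np

def bi_rnn_features(E, Wf, Uf, Wb, Ub):
    """Concatenate forward and backward RNN states per word.

    E is the word-embedding matrix (L words x e dims); (Wf, Uf) and
    (Wb, Ub) are the input/recurrent weights of the forward and backward
    RNNs. Each row of the result depends on the whole sentence, which is
    one simple way to capture inter-word relevance.
    """
    L, _ = E.shape
    h_dim = Uf.shape[0]
    fwd, bwd = np.zeros((L, h_dim)), np.zeros((L, h_dim))
    h = np.zeros(h_dim)                  # h_0^f
    for i in range(L):                   # forward pass, left to right
        h = np.tanh(Wf @ E[i] + Uf @ h)
        fwd[i] = h
    h = np.zeros(h_dim)                  # h_{L-1}^b
    for i in reversed(range(L)):         # backward pass, right to left
        h = np.tanh(Wb @ E[i] + Ub @ h)
        bwd[i] = h
    return np.concatenate([fwd, bwd], axis=1)   # text feature S (L x 2h)
```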
Scheme 4. The information processing method according to scheme 1, wherein training the image generation model comprises:
iteratively compressing the sample image using the encoder module, and outputting from the encoder module, in each iteration, a first distribution of a feature quantity, the feature quantity characterizing the key information of the sample image and the sample text;
iteratively generating an output image using the decoder module based on the local text features of the sample text and the respective first distributions, and constructing, using the decoder module, a second distribution of the feature quantity in each iteration; and
calculating, based on the sample image, the output image, the first distributions and the second distributions, a loss function of the image generation model to optimize the image generation model.
Scheme 5. The information processing method according to scheme 4, wherein, in each iteration, the local text feature is intercepted based on the output of the decoder in the decoder module in the previous iteration.
Scheme 6. The information processing method according to scheme 5, wherein, in each iteration, iteratively compressing the sample image using the encoder module comprises:
reading a part of the sample image based on the output of the decoder module and the output of the decoder in the previous iteration, to obtain a partial sample image; and
compressing the partial sample image by the encoder in the encoder module, based on the outputs of the encoder and the decoder in the previous iteration.
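The reading of a partial sample image in scheme 6 can be illustrated with a DRAW-style Gaussian "read" glimpse. In the patent the window parameters would be derived from the previous decoder output; here they are passed in directly for clarity, and the filterbank form and function name are assumptions.

```python
import numpy as np

def read_patch(x, gX, gY, sigma, delta, gamma, N=4):
    """Extract an N x N glimpse of the sample image x.

    Applies banks of N Gaussian filters over rows and columns, centred at
    (gX, gY) with spacing delta and width sigma, scaled by gamma, to
    obtain the partial sample image that the encoder then compresses.
    """
    H, W = x.shape
    def bank(g, size):
        mu = g + (np.arange(N) - (N - 1) / 2) * delta
        F = np.exp(-(np.arange(size)[None, :] - mu[:, None]) ** 2
                   / (2 * sigma ** 2))
        return F / (F.sum(axis=1, keepdims=True) + 1e-8)
    FX, FY = bank(gX, W), bank(gY, H)
    return gamma * (FY @ x @ FX.T)       # N x N partial sample image
```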
Scheme 7. The information processing method according to scheme 6, wherein the first distribution in each iteration is constructed based on the output of the encoder.
Scheme 8. The information processing method according to scheme 7, wherein iteratively generating the image using the decoder module comprises:
acquiring the feature quantity from the first distribution in each iteration;
decoding the acquired feature quantity using the decoder, based on the local text feature and the output of the decoder in the previous iteration;
constructing the second distribution based on the output of the decoder in the previous iteration;
writing the output of the decoder out to the same matrix in each iteration, as the output of the decoder module; and
generating the output image based on the finally obtained matrix.
Scheme 9. The information processing method according to scheme 8, wherein calculating the loss function of the image generation model comprises:
calculating a first loss function regarding the sample image and the output image;
calculating a second loss function regarding the first distribution and the second distribution; and
determining the loss function based on the first loss function and the second loss function.
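Scheme 9's two loss terms are commonly realized in DRAW-like models as a reconstruction loss between the sample image and the output image plus a KL divergence between the first distribution Q and the second distribution P. The concrete binary cross-entropy and diagonal-Gaussian KL forms below are assumptions, not taken from the patent.

```python
import numpy as np

def loss(x, x_out, mu_q, logvar_q, mu_p, logvar_p):
    """Total loss = reconstruction (first loss) + KL(Q || P) (second loss).

    x and x_out are the sample and output images with values in [0, 1];
    (mu_q, logvar_q) and (mu_p, logvar_p) parameterise Q and P as
    diagonal Gaussians for one iteration (summed over iterations in a
    full model).
    """
    eps = 1e-8
    bce = -np.sum(x * np.log(x_out + eps)
                  + (1 - x) * np.log(1 - x_out + eps))    # first loss
    kl = 0.5 * np.sum(logvar_p - logvar_q
                      + (np.exp(logvar_q) + (mu_q - mu_p) ** 2)
                      / np.exp(logvar_p) - 1)             # second loss
    return bce + kl
```

When Q and P coincide the KL term vanishes, so minimizing the total pushes the output image toward the sample image while pulling the prior P toward the posterior Q.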
Scheme 10. The information processing method according to any one of schemes 6-9, wherein the encoder and the decoder are implemented using recurrent neural networks.
Scheme 11. The information processing method according to scheme 10, wherein the image generation model is a DRAW neural network.
Scheme 12. A device for generating an image from text, comprising:
a text feature extraction portion, which extracts a text feature characterizing the relevance between the words in a text;
a local text feature interception portion, which selectively intercepts parts of the text feature with a window of variable size to obtain local text features; and
an image generation model, wherein the decoder module in the image generation model iteratively generates an image corresponding to an input text according to the local text features of the input text, and each local text feature is intercepted in the corresponding iteration.
Scheme 13. The device according to scheme 12, wherein the text feature extraction portion comprises:
a vectorization unit, which vectorizes the sample text to obtain a plurality of low-dimensional word vectors; and
a text feature extraction unit, which extracts, based on the word vectors, the text feature characterizing the relevance between the words in the sample text.
Scheme 14. The device according to scheme 13, wherein the text feature extraction portion extracts the text feature using a forward recurrent neural network and/or a backward recurrent neural network.
Scheme 15. The device according to scheme 12, wherein the image generation model comprises:
an encoder module, which iteratively compresses the sample image and outputs, in each iteration, a first distribution of a feature quantity, the feature quantity characterizing the key information of the sample image and the sample text;
a decoder module, which, based on the local text features of the sample text and the respective first distributions, calculates a second distribution of the feature quantity in each iteration and iteratively generates an output image; and
a calculation module, which calculates, from the sample image, the output image, the first distributions and the second distributions, a loss function of the image generation model to optimize the image generation model.
Scheme 16. The device according to scheme 15, wherein the encoder module comprises:
a reading portion, which reads a part of the sample image based on the output of the decoder module and the output of the decoder in the decoder module in the previous iteration, to obtain a partial sample image;
an encoder, which compresses the partial sample image based on the outputs of the encoder and the decoder in the previous iteration; and
a construction portion, which constructs the first distribution based on the output of the encoder in each iteration.
Scheme 17. The device according to scheme 16, wherein the decoder module comprises:
a sampling portion, which acquires the feature quantity from the first distribution in each iteration;
a decoding portion, which decodes the acquired feature quantity based on the local text feature and the output of the decoder in the previous iteration;
a construction portion, which constructs the second distribution based on the output of the decoder in the previous iteration; and
a writing portion, which writes the output of the decoder out to the same matrix in each iteration, as the output of the decoder module,
wherein the decoder module generates the output image based on the finally obtained matrix.
Scheme 18. The device according to scheme 17, wherein the calculation module comprises:
a first calculation portion, which calculates a first loss function regarding the sample image and the output image;
a second calculation portion, which calculates a second loss function regarding the first distribution and the second distribution; and
a determination portion, which determines the loss function based on the first loss function and the second loss function.
Scheme 19. The device according to any one of schemes 12-18, wherein the image generation model is a DRAW neural network.
Scheme 20. A method of generating an image from text using the trained device according to any one of schemes 12-19, comprising:
extracting, by the text feature extraction portion, a text feature characterizing the relevance between the words in a text;
selectively intercepting, by the local text feature interception portion, parts of the text feature with a window of variable size, to obtain local text features; and
iteratively generating, by the decoder module in the image generation model, an image corresponding to the input text according to the local text features of the input text, each local text feature being intercepted in the corresponding iteration.

Claims (10)

1. An information processing method, comprising:
extracting, from a sample text, a text feature characterizing the relevance between the words in the sample text;
selectively intercepting parts of the text feature with a window of variable size, to obtain respective local text features; and
training an image generation model based on the local text features of the sample text and a sample image corresponding to the sample text,
wherein the image generation model includes an encoder module and a decoder module, the decoder module in the trained image generation model iteratively generates an image corresponding to an input text according to the local text features of the input text, and each local text feature is intercepted in the corresponding iteration.
2. The information processing method according to claim 1, wherein extracting the text feature from the sample text comprises:
vectorizing the sample text to obtain a plurality of low-dimensional word vectors; and
extracting, based on the word vectors, the text feature characterizing the relevance between the words in the sample text.
3. The information processing method according to claim 1, wherein training the image generation model comprises:
iteratively compressing the sample image using the encoder module, and outputting from the encoder module, in each iteration, a first distribution of a feature quantity, the feature quantity characterizing the key information of the sample image and the sample text;
iteratively generating an output image using the decoder module based on the local text features of the sample text and the respective first distributions, and constructing, using the decoder module, a second distribution of the feature quantity in each iteration; and
calculating, based on the sample image, the output image, the first distributions and the second distributions, a loss function of the image generation model to optimize the image generation model.
4. The information processing method according to claim 3, wherein, in each iteration, the local text feature is intercepted based on the output of the decoder in the decoder module in the previous iteration.
5. The information processing method according to claim 4, wherein, in each iteration, iteratively compressing the sample image using the encoder module comprises:
reading a part of the sample image based on the output of the decoder module and the output of the decoder in the previous iteration, to obtain a partial sample image; and
compressing the partial sample image by the encoder in the encoder module, based on the outputs of the encoder and the decoder in the previous iteration.
6. The information processing method according to claim 5, wherein the first distribution in each iteration is constructed based on the output of the encoder.
7. The information processing method according to claim 6, wherein iteratively generating the image using the decoder module comprises:
acquiring the feature quantity from the first distribution in each iteration;
decoding the acquired feature quantity using the decoder, based on the local text feature and the output of the decoder in the previous iteration;
constructing the second distribution based on the output of the decoder in the previous iteration;
writing the output of the decoder out to the same matrix in each iteration, as the output of the decoder module; and
generating the output image based on the finally obtained matrix.
8. The information processing method according to claim 7, wherein calculating the loss function of the image generation model comprises:
calculating a first loss function regarding the sample image and the output image;
calculating a second loss function regarding the first distribution and the second distribution; and
determining the loss function based on the first loss function and the second loss function.
9. The information processing method according to any one of claims 5-8, wherein the encoder and the decoder are implemented using recurrent neural networks.
10. A device for generating an image from text, comprising:
a text feature extraction portion, which extracts a text feature characterizing the relevance between the words in a text;
a local text feature interception portion, which selectively intercepts parts of the text feature with a window of variable size to obtain local text features; and
an image generation model, wherein the decoder module in the image generation model iteratively generates an image corresponding to an input text according to the local text features of the input text, and each local text feature is intercepted in the corresponding iteration.
CN201710379515.0A 2017-05-25 2017-05-25 Information processing method and device for generating image based on text Active CN108959322B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710379515.0A CN108959322B (en) 2017-05-25 2017-05-25 Information processing method and device for generating image based on text


Publications (2)

Publication Number Publication Date
CN108959322A true CN108959322A (en) 2018-12-07
CN108959322B CN108959322B (en) 2021-09-10

Family

ID=64494571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710379515.0A Active CN108959322B (en) 2017-05-25 2017-05-25 Information processing method and device for generating image based on text

Country Status (1)

Country Link
CN (1) CN108959322B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109920016A (en) * 2019-03-18 2019-06-21 Beijing SenseTime Technology Development Co., Ltd. Image generation method and device, electronic equipment and storage medium
CN109933320A (en) * 2018-12-28 2019-06-25 Lenovo (Beijing) Co., Ltd. Image generation method and server
CN110163267A (en) * 2019-05-09 2019-08-23 Xiamen Meitu Technology Co., Ltd. Method of training an image generation model and method of generating an image
CN111340056A (en) * 2018-12-18 2020-06-26 Fujitsu Ltd. Information processing method and information processing apparatus
CN111985243A (en) * 2019-05-23 2020-11-24 China Mobile (Suzhou) Software Technology Co., Ltd. Emotion model training method, emotion analysis method, device and storage medium
CN117611705A (en) * 2023-12-20 2024-02-27 Qianjin Network Information Technology (Shanghai) Co., Ltd. Method for generating visual content based on text content, searching method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1402853A (en) * 1999-12-02 2003-03-12 Mitsubishi Electric Corp. Image retrieval system and image retrieval method
CN101655912A (en) * 2009-09-17 2010-02-24 Shanghai Jiao Tong University Method for detecting computer generated image and natural image based on wavelet transformation
CN101924851A (en) * 2009-06-16 2010-12-22 Canon Inc. Image processing apparatus and image processing method
US20170061250A1 (en) * 2015-08-28 2017-03-02 Microsoft Technology Licensing, Llc Discovery of semantic similarities between images and text
CN106529586A (en) * 2016-10-25 2017-03-22 Tianjin University Image classification method based on supplemented text characteristic

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ATZMON, Y. et al.: "Learning to generalize to new compositions in image understanding", arXiv *
LIANG, Huan: "Research on Image Semantic Understanding Based on Deep Learning", China Master's Theses Full-text Database (electronic journal) *


Also Published As

Publication number Publication date
CN108959322B (en) 2021-09-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant