CN110021051A - Text-controlled person image generation method based on generative adversarial networks - Google Patents

Text-controlled person image generation method based on generative adversarial networks

Info

Publication number
CN110021051A
CN110021051A (Application CN201910257463.9A)
Authority
CN
China
Prior art keywords
text
person
pose
picture
adversarial network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910257463.9A
Other languages
Chinese (zh)
Other versions
CN110021051B (en)
Inventor
周星然
黄思羽
李斌
李英明
张仲非
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201910257463.9A priority Critical patent/CN110021051B/en
Publication of CN110021051A publication Critical patent/CN110021051A/en
Application granted granted Critical
Publication of CN110021051B publication Critical patent/CN110021051B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a text-controlled person image generation method based on generative adversarial networks, belonging to the field of computer vision. The method specifically comprises the following steps: obtain a person image dataset for training, and define the algorithm target; obtain the pose information of all images in the person image dataset, and obtain basic poses from all pose information by a clustering algorithm; learn, with a pose predictor based on a generative adversarial network, to predict a pose from text; use the pose predictor learned in S2~S3 to predict the corresponding person pose from text; learn, with a person picture generator based on a generative adversarial network, to generate person pictures conforming to the text description, while establishing the mapping between picture sub-regions and text using a multi-modal error. The text-controlled person image generation method based on generative adversarial networks of the invention has good application value in scenarios such as picture generation, picture editing and person re-identification.

Description

Text-controlled person image generation method based on generative adversarial networks
Technical field
The invention belongs to the field of computer vision, and in particular relates to a text-controlled person image generation method based on generative adversarial networks.
Background technique
Text-controlled person image generation is defined as the following problem: according to a target text description, simultaneously change the pose and the attributes (such as clothing color) of the person in a reference picture so that they are consistent with the text description. In recent years, in computer vision tasks such as specific picture generation, image retrieval and person re-identification, generation methods that can produce pictures with specified content have played an important role in augmenting datasets and increasing algorithm robustness. The task has two main difficulties. The first is how to predict the target pose of the person from the text: the target pose should be consistent with the text description and serve as the guidance for the person's pose transformation. The second is how to change the pose and the attributes of the person in the reference picture at the same time: in the generated picture, the pose of the person changes while the attributes conform to the verbal description. For the first point, the present invention considers that a person pose contains two factors, pose direction and pose action; the pose direction determines the facing angle of the action, and the pose action is the variation of the human limbs. For the second point, the present invention embeds an attention up-sampling module in the network, which effectively integrates data of multiple modalities (text, pose, picture) when generating person pictures, ensuring that the pose transformation and the attribute modification are completed at the same time. Some previous methods consider the problem of changing a person's pose, and some methods address text-to-image generation, but few methods consider changing both the pose and the attributes of a person according to a text description.
Due to the effectiveness of statistical modeling, learning-based methods have gradually been applied to picture generation tasks. Existing learning-based methods mainly adopt the generative adversarial network framework: a person image and a target text are input, and a person image conforming to the text description is output.
Summary of the invention
To solve the above problems, the purpose of the present invention is to provide a text-controlled person image generation method based on generative adversarial networks. When predicting a person pose from text, since the text itself does not contain specific spatial correspondence information, we first obtain basic poses with different directions by a clustering method, and then adjust the local parts and details of a specific basic pose according to the text, obtaining a person pose conforming to the text description. At the same time, key information must be effectively extracted from the text: the information about direction and action in the text is related to the person pose, while the attribute descriptions are related to the appearance attributes of the person in the generated picture. In addition, during person picture generation the network must consider data from multiple modalities (text, pose, image); for the fusion and expression of multiple modal features, we introduce the attention up-sampling module, which uses an attention mechanism to focus on the relevant information in the text while also completing the change of the person's pose. Combining the above three aspects, we design a learning framework based on generative adversarial networks that makes the model establish the connection between picture sub-regions and text, so as to represent person pictures with different poses and attributes. Controlling picture generation through text provides convenience and friendliness to users.
To achieve the above object, the technical solution of the present invention is as follows:
A text-controlled person image generation method based on generative adversarial networks comprises the following steps:
S1: obtain a person image dataset for training, and define the algorithm target;
S2: obtain the pose information of all images in the person image dataset, and obtain basic poses from all pose information by a clustering algorithm;
S3: learn, with a pose generator based on a generative adversarial network, to predict a pose from the target text;
S4: use the pose generator learned in S2~S3 to predict the corresponding person pose from text;
S5: learn, with a person picture generator based on a generative adversarial network, to generate person pictures conforming to the text description, while establishing the mapping between picture sub-regions and text using a multi-modal error;
S6: with the person picture generator learned in S5, input a reference picture and the description text of the target picture, and generate a person picture conforming to the text description.
Based on the above scheme, each step can be implemented as follows:
In step S1, the person image dataset contains a number of person pictures, and each person picture is annotated with a text description of the person in the picture. The algorithm target is defined as follows: for each person in the training set, there are a reference picture x, a target picture x′, the pose p of the person in the target picture, and the description text t of the target picture; given the reference picture x and the description text t of the target picture, predict the pose and action of the target from the description text t, and generate a picture $\hat{x}$ similar to the target picture x′.
Further, in step S2, the pose information of all images in the person image dataset is obtained, and basic poses are obtained from all pose information by a clustering algorithm, specifically comprising the following sub-steps:
S21: obtain the person poses of all pictures in the dataset by a pose detection algorithm;
S22: cluster the person poses by the K-means clustering algorithm, and compute the average pose $\bar{p}_i$ of the i-th cluster as a basic pose; K basic poses $(\bar{p}_1, \ldots, \bar{p}_K)$ are obtained in total.
Further, in step S3, the pose generator based on a generative adversarial network is trained to predict a pose from the target text, specifically comprising the following sub-steps:
S31: using an LSTM network, extract the feature representation vector $f_t$ of the target description text t; predict the direction o of the pose described by the text with a fully connected neural network $F_{ori}$, i.e. $o = \arg\max \mathrm{Softmax}(F_{ori}(f_t))$, where o ∈ {1, ..., K}; select from the K basic poses the basic pose $\bar{p}_o$ whose direction is consistent with the predicted direction o;
S32: use a generator $G_1$ to learn to adjust the basic pose $\bar{p}_o$ based on the text information $f_t$, generating a predicted pose $\hat{p}$, i.e. $\hat{p} = G_1(f_t, \bar{p}_o)$. During learning, the error between the direction o computed with the softmax function and the true direction, the mean squared error between $\hat{p}$ and the ground-truth pose p, and the adversarial error of $\hat{p}$ are calculated, and the three errors together serve as the supervision signal.
Further, in step S4, the pose generator learned in S2~S3 is used to predict the corresponding person pose from text, specifically as follows:
based on the person pose generator established in S2~S3, input the description text t of the target picture, predict the person pose direction from the text, adjust the basic pose according to the text, and generate a predicted person pose $\hat{p}$ conforming to the text description.
Further, in step S5, the person picture generator based on a generative adversarial network learns to generate person pictures conforming to the text description, while the mapping between picture sub-regions and text is established using a multi-modal error, specifically comprising the following sub-steps:
S51: extract features from the person reference picture x with a convolutional neural network, obtaining depth features $(v_1, v_2, \ldots, v_m)$ at different scales, where $v_i$ is the picture depth feature at the i-th scale, i = 1, 2, ..., m, and m is the total number of down-sampling operations;
S52: extract features from the predicted person pose $\hat{p}$ obtained in step S4 with a convolutional neural network, obtaining depth features $(s_1, s_2, \ldots, s_m)$ at different scales, where $s_i$ is the pose depth feature at the i-th scale, i = 1, 2, ..., m, and m is the total number of down-sampling operations;
S53: extract the text feature matrix e with a bidirectional LSTM; e is the concatenation of all hidden state vectors h, i.e. $e = (h_1, h_2, \ldots, h_N)$, where the j-th column of e is $h_j$, j = 1, 2, ..., N, and N is the number of words in the text;
S54: compute the vision-text attention at the i-th scale, $c_i = v_i\,\mathrm{Softmax}(v_i^T e)$, and measure the distance between the sub-regions of picture x and the text t with the multi-scale vision-text distance R(x, t), establishing the relation between picture sub-regions and text; R(x, t) aggregates the cosine similarities $r(c_{ij}, e_j)$ over the m scales and the N words, where $c_{ij}$ is the j-th column of the vision-text attention $c_i$, $e_j$ is the j-th column of the text feature matrix e, i.e. $h_j$, and $r(\cdot,\cdot)$ is the cosine similarity between two vectors;
S55, each training pair is calculatedMultiple scale vision text distance matrix Λ, I be each training batch The sum of secondary middle training pair, xiAnd tiThe reference picture of respectively i-th trained centering and the description text of Target Photo;Λ's I-th row jth column element beThe matched posterior probability of picture and text is P (ti|xi)= Softmax(Λ)(i, i), the posterior probability of text and picture match is P (xi|ti)=Softmax (ΛT)(i, i);It is multi-modal similar Property errorIt calculates are as follows:
S56, attention up-sampling operation is carried out when generating personage's picture: first calculating the word vision in i-th of size Attention zi=eSoftmax (eTvi), the up-sampling in i-th of size isWhereinFor Closest up-sampling operation in i-th of size, ui-1It is the up-sampling in previous size as a result, as i=1
The up-sampling operation of multiple attention is cascaded, personage's picture is generatedLearnt by fighting error;It learns During habit, multi-modal similitude error is calculatedGenerate personage's pictureConfrontation error and Target Photo x ' with L1 error, regard three kinds of errors as supervision message together.
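The multi-modal matching of S54~S55 can be sketched at a single scale as follows. This is a simplified NumPy sketch: the softmax normalization axis and the aggregation over the m scales are assumptions, since the patent gives the equations only in outline.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def vt_distance(v, e):
    """Vision-text relevance at one scale (S54): v is a (d, L) matrix of
    region features, e is a (d, N) matrix of word features. The attention
    c = v softmax(v^T e) attends each word to the image regions, and the
    relevance averages cosine similarities between c_j and word e_j."""
    c = v @ softmax(v.T @ e, axis=0)           # (d, N) attended features
    return np.mean([cos(c[:, j], e[:, j]) for j in range(e.shape[1])])

rng = np.random.default_rng(0)
d, L, N, I = 16, 10, 5, 3                      # feat dim, regions, words, batch
imgs = [rng.normal(size=(d, L)) for _ in range(I)]
txts = [rng.normal(size=(d, N)) for _ in range(I)]

# Lam[i][j]: relevance of picture x_i to text t_j (one scale shown;
# the method sums this over m scales).
Lam = np.array([[vt_distance(v, e) for e in txts] for v in imgs])
P_t_given_x = np.diag(softmax(Lam, axis=1))    # P(t_i | x_i)
P_x_given_t = np.diag(softmax(Lam.T, axis=1))  # P(x_i | t_i)
loss = -np.log(P_t_given_x).sum() - np.log(P_x_given_t).sum()
```

In training, `loss` would be the multi-modal similarity error of S55, pulling matched picture-text pairs onto the diagonal of Λ.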
Compared with existing person image generation methods, the text-controlled person image generation method based on generative adversarial networks of the present invention has the following beneficial effects:
First, the present invention controls the generation of person pictures through a text description: the text description both controls the pose change of the person and modifies the clothing color attribute of the person. Seeking control through text descriptions is friendlier and more convenient for users.
Second, the present invention proposes a method for predicting a person pose from text, which can predict from the text description a reasonable person pose conforming to the direction and action described in the text.
Finally, the present invention proposes the attention up-sampling module, which effectively integrates data from different modalities, including text, pose and image. At the same time, the module can retain the identity information of the person in the reference picture, making the generated person picture more natural and realistic.
The text-controlled person image generation method based on generative adversarial networks of the present invention has good application value in scenarios such as picture generation, picture editing and person re-identification. For example, in a picture editing scenario, according to a text description and a reference picture, a picture can be generated in which the person keeps the same identity as in the reference picture while the pose and the clothing color attribute change; pictures with different poses and attributes can be obtained by modifying keywords in the text description, which is friendlier and more convenient for users. Generating such pictures is also fundamental for other work, because obtaining a dataset is itself expensive and in some cases even difficult; such person pictures can be generated by this application, which benefits the development of other related work.
Detailed description of the invention
Fig. 1 is a schematic flow diagram of the present invention;
Fig. 2 is a schematic flow diagram of the embodiment.
Specific embodiment
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.
On the contrary, the present invention covers any substitution, modification, equivalent method and scheme made within the essence and scope of the present invention as defined by the claims. Further, in order to give the public a better understanding of the present invention, some specific details are described below in detail; a person skilled in the art can also fully understand the present invention without these details.
With reference to Fig. 1, in a preferred embodiment of the present invention, a text-controlled person image generation method based on generative adversarial networks comprises the following steps:
S1: obtain a person image dataset for training, and define the algorithm target. Specifically: the person image dataset contains a number of person pictures, and each person picture is annotated with a text description of the person in the picture. The algorithm target is defined as follows: for each person in the training set, there are a reference picture x, a target picture x′, the pose p of the person in the target picture, and the description text t of the target picture; given the reference picture x and the description text t of the target picture, predict the pose and action of the target from the description text t, and generate a picture $\hat{x}$ similar to the target picture x′.
S2, the posture information for obtaining all images in character image data set, by clustering algorithm from all posture informations Middle acquisition basic poses.Its specific sub-step is as follows: the posture information of all images in character image data set is obtained, by poly- Class algorithm obtains basic poses from all posture informations, specifically includes following sub-step:
S21, personage's posture that all pictures in data set are obtained by attitude detection algorithm;
S22, personage's posture is clustered by K-means clustering algorithm, and calculates the average posture of ith clusterAnd as basic poses, K basic poses are acquired altogether
S3, the study carried out based on the posture generator for generating confrontation network from target text to prediction posture is utilized.Its Specific sub-step is as follows:
S31, using a LSTM network, extract the feature representation vector of goal description text tBy connecting mind entirely Through network ForiPredict the direction o of posture described by text, i.e.,Wherein o ∈ { 1 ..., K }, The consistent basic poses of direction o obtained with prediction are selected from K basic poses
S32, a generator G is used1Study is based on text informationTo adjust basic posesGenerate one in advance Survey postureI.e.In learning process, the calculating of softmax function and true directions are utilized to direction o Between error, calculateMean square error between posture true value p, calculates simultaneouslyConfrontation error, by three kinds of errors It is used as supervision message together.
S4, using the acquistion of the middle school S2~S3 to posture generator predict to obtain corresponding personage's posture from text.It has Body sub-step is as follows:
Based on the personage's posture generator established by S2~S3, the description text t of Target Photo is inputted, is predicted from text Personage's posture direction, and basic poses are adjusted according to text, it generates the personage that one meets text description and predicts posture
S5, it is generated using the personage's picture for meet text description based on the personage's picture generator for generating confrontation network Study, while establishing the mapping relations between picture subregion and text using multi-modal error.Its specific sub-step is as follows:
S51, feature extraction, the depth being chosen in different sizes are carried out to personage's reference picture x using convolutional neural networks Spend feature (v1, v2..., vm), viFor the picture depth feature in i-th of size, wherein i=1,2 ..., m, m are down-sampling Sum;
S52, posture is predicted to personage obtained in step S4 using convolutional neural networksFeature extraction is carried out, is chosen at Depth characteristic (s in different sizes1, s2..., sm), siFor the posture depth characteristic in i-th of size, wherein i=1, 2 ..., m, m are the sum of down-sampling;
S53, text feature matrix e, e are extracted by all hidden state vector h using a two-way LSTMjSplicing group At i.e. e=(h1, h2..., hN), wherein j=1,2 ..., N, N are word quantity in text;
S54: compute the vision-text attention at the i-th scale, $c_i = v_i\,\mathrm{Softmax}(v_i^T e)$, and measure the distance between the sub-regions of picture x and the text t with the multi-scale vision-text distance R(x, t), establishing the relation between picture sub-regions and text; R(x, t) aggregates the cosine similarities $r(c_{ij}, e_j)$ over the m scales and the N words, where $c_{ij}$ is the j-th column of the vision-text attention $c_i$, $e_j$ is the j-th column of the text feature matrix e, i.e. $h_j$, and $r(\cdot,\cdot)$ is the cosine similarity between two vectors;
S55, each training pair is calculatedMultiple scale vision text distance matrix Λ, I be each training batch The sum of secondary middle training pair, xiAnd tiThe reference picture of respectively i-th trained centering and the description text of Target Photo;Λ's I-th row jth column element beThe matched posterior probability of picture and text is P (ti|xi)= Softmax(Λ)(i, i), the posterior probability of text and picture match is P (xi|ti)=Softmax (ΛT)(i, i);It is multi-modal similar Property errorIt calculates are as follows:
S56, attention up-sampling operation is carried out when generating personage's picture: first calculating the word vision in i-th of size Attention zi=eSoftmax (eTvi), the up-sampling in i-th of size isWhereinFor Closest up-sampling operation in i-th of size, ui-1It is the up-sampling in previous size as a result, as i=1
The up-sampling operation of multiple attention is cascaded, personage's picture is generatedLearnt by fighting error;It learns During habit, multi-modal similitude error is calculatedGenerate personage's pictureConfrontation error and Target Photo x ' with L1 error, regard three kinds of errors as supervision message together.
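One stage of the attention up-sampling cascade of S56 can be sketched as follows. The fusion by channel-wise concatenation and the fixed 2x nearest-neighbour factor are assumptions made for illustration, since the patent leaves the exact fusion operator unspecified.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nearest_up(x):
    """Nearest-neighbour 2x up-sampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def attention_upsample(u_prev, v, s, e):
    """One attention up-sampling stage (a sketch of S56): compute the
    word-vision attention z = e softmax(e^T v), fuse it with the image
    features v, the pose features s and the previous result u_prev,
    then up-sample. Concatenation as the fusion is an assumption."""
    C, H, W = v.shape
    vf = v.reshape(C, H * W)                       # flatten regions
    z = (e @ softmax(e.T @ vf, axis=0)).reshape(C, H, W)
    fused = np.concatenate([u_prev, v, s, z], axis=0)
    return nearest_up(fused)

rng = np.random.default_rng(0)
C, H, W, N = 4, 2, 2, 6
v = rng.normal(size=(C, H, W))   # image depth feature at this scale
s = rng.normal(size=(C, H, W))   # pose depth feature at this scale
e = rng.normal(size=(C, N))      # word feature matrix (dim C assumed)
u0 = rng.normal(size=(C, H, W))  # previous-stage result (initial input)
u1 = attention_upsample(u0, v, s, e)
```

Cascading several such stages, each doubling the spatial resolution, corresponds to the cascade that finally produces the generated person picture.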
S6, the personage's picture generator learnt using S5 input the description text of reference picture and Target Photo, i.e., Produce the personage's picture for meeting text description.
The above method is applied to a specific embodiment below, so that those skilled in the art can better understand the effects of the present invention.
Embodiment
In this embodiment, the person pose generator and the person picture generator are obtained by learning according to the steps S1~S5 above; the implementation of each step is as described previously and is not elaborated again; only the results on the case data are shown below. This embodiment is implemented on the CUHK-PEDES dataset with text annotations; the images come from five person re-identification datasets, namely CUHK03, Market-1501, SSM, VIPER and CUHK01, containing 40,206 pictures of 13,003 persons in total.
The experiments of this embodiment are carried out on the CUHK-PEDES dataset.
The main flow of person picture generation is as follows:
1) predict a matching person pose from the description text with the person pose generator;
2) change the keywords describing the color attribute in the description text, as shown in Fig. 2;
3) input the predicted person pose, the modified description text and the reference picture into the person picture generator, and obtain a person picture in which the person's pose and attributes have changed;
4) to comprehensively compare the validity of this method, we compare against other state-of-the-art methods and suitably modify similar person image generation frameworks to adapt them to the task targeted by this method;
5) the structural similarity (SSIM) and Inception score (IS) of this embodiment are shown in Table 1, where PT means that only the pose is changed and P&AT means that both the pose and the color attribute are changed. Furthermore, for this task this embodiment proposes the VQA perceptual score to measure the correctness of the color attribute change. The data show that the present invention, compared with the other methods and with similar text-controlled person image generation frameworks after modification, achieves a further overall improvement in all three indicators: structural similarity, Inception score and VQA perceptual score. The VQA perceptual score is computed as T/N: a program first randomly changes the color attribute in the description text (10 colors are considered in total) and generates the corresponding picture, recording the changed color attribute as the correct answer; then the program asks a VQA model a question related to a body part of the person (the color of the clothes or trousers); finally the answers returned by the VQA model are collected and the accuracy is computed, where T is the number of pictures for which the returned answer is correct and N is the total number of pictures.
Table 1. SSIM and IS indicators of this embodiment on the CUHK-PEDES dataset
Method SSIM(PT) IS(PT) IS(P&AT)
SIS[1] 0.239±0.106 3.707±0.185 3.790±0.182
AttnGAN[2] 0.298±0.126 3.695±0.110 3.726±0.123
PG2[3] 0.237±0.120 3.473±0.009 3.486±0.125
Single AU 0.305±0.121 4.015±0.009 4.071±0.149
ours 0.364±0.123 4.209±0.165 4.218±0.195
Table 2. VQA perceptual score of this embodiment on the CUHK-PEDES dataset
Method VQA perceptual score
Real image 0.698
SIS[1] 0.275
AttnGAN[2] 0.139
PG2[3] 0.110
Single AU 0.205
ours 0.334
Here "ours" is the method of this embodiment, cascading 3 up-sampling operations in S56; "Single AU" means that the cascade of 3 up-sampling operations in S56 is not used and only a single attention up-sampling operation is applied, with the rest identical to ours; "Real image" in Table 2 refers to the result of questioning and answering the original images in the dataset through the VQA model. The references corresponding to the remaining methods are as follows:
[1] H. Dong, S. Yu, C. Wu, and Y. Guo. Semantic image synthesis via adversarial learning. In ICCV, 2017.
[2] T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In CVPR, 2018.
[3] L. Ma, J. Xu, Q. Sun, B. Schiele, T. Tuytelaars, and L. Van Gool. Pose guided person image generation. In NIPS, 2017.
Through the above technical solution, the present invention provides, based on deep learning technology, a text-controlled person image generation method based on generative adversarial networks. The present invention can generate realistic and vivid person images, and control the pose and the attributes of the person in the generated image through the description text.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement and improvement made within the spirit and principle of the present invention shall be included within the protection scope of the present invention.

Claims (6)

1. A text-controlled person image generation method based on generative adversarial networks, characterized by comprising the following steps:
S1: obtain a person image dataset for training, and define the algorithm target;
S2: obtain the pose information of all images in the person image dataset, and obtain basic poses from all pose information by a clustering algorithm;
S3: learn, with a pose generator based on a generative adversarial network, to predict a pose from the target text;
S4: use the pose generator learned in S2~S3 to predict the corresponding person pose from text;
S5: learn, with a person picture generator based on a generative adversarial network, to generate person pictures conforming to the text description, while establishing the mapping between picture sub-regions and text using a multi-modal error;
S6: with the person picture generator learned in S5, input a reference picture and the description text of the target picture, and generate a person picture conforming to the text description.
2. The text-controlled person image generation method based on generative adversarial networks according to claim 1, characterized in that in step S1, the person image dataset contains a number of person pictures, and each person picture is annotated with a text description of the person in the picture; the algorithm target is defined as follows: for each person in the training set, there are a reference picture x, a target picture x′, the pose p of the person in the target picture, and the description text t of the target picture; given the reference picture x and the description text t of the target picture, predict the pose and action of the target from the description text t, and generate a picture $\hat{x}$ similar to the target picture x′.
3. as claimed in claim 2 pass through text Conrad object image generation method, feature based on generation confrontation network It is, in step S2, obtains the posture information of all images in character image data set, believed by clustering algorithm from all postures Basic poses are obtained in breath, specifically include following sub-step:
S21, obtain the person poses of all images in the dataset with a pose-detection algorithm;
S22, cluster the person poses with the K-means clustering algorithm, and compute the average pose of the i-th cluster as a basic pose, obtaining K basic poses in total.
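Sub-steps S21~S22 amount to running K-means over flattened pose keypoint vectors and keeping the cluster means as the basic poses. Below is a minimal numpy sketch (Lloyd's algorithm with an illustrative deterministic initialisation); the function name, the toy data, and the default K are not from the patent:

```python
import numpy as np

def basic_poses(poses, K=8, iters=20):
    """Cluster (n, d) flattened pose vectors with K-means and return
    the K cluster means as 'basic poses'."""
    # illustrative deterministic init: K poses evenly spaced in the dataset
    centers = poses[np.linspace(0, len(poses) - 1, K).astype(int)].copy()
    for _ in range(iters):
        # assign every pose to its nearest center
        dist = np.linalg.norm(poses[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        # recompute each center as the average pose of its cluster
        for k in range(K):
            if np.any(labels == k):
                centers[k] = poses[labels == k].mean(axis=0)
    return centers

# toy example: 2-D "poses" in two clumps near (0,0) and (1,1)
pts = np.vstack([np.zeros((10, 2)), np.ones((10, 2))])
centers = basic_poses(pts, K=2)
```

In practice the pose vectors would come from a pose-detection algorithm (step S21), and K would be chosen to cover the dominant body orientations.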
4. The text-guided person image generation method based on a generative adversarial network according to claim 3, wherein step S3, learning the mapping from the target text to the predicted pose with a pose generator based on a generative adversarial network, comprises the following sub-steps:
S31, use an LSTM network to extract the feature vector t̂ of the goal description text t, and predict the direction o of the pose described by the text with a fully connected network F, i.e. o = F(t̂), where o ∈ {1, ..., K}; from the K basic poses, select the basic pose p_o whose direction matches the predicted o;
S32, use a generator G1 to learn to adjust the basic pose p_o according to the text information t̂, generating a predicted pose p̂, i.e. p̂ = G1(t̂, p_o). During learning, the error between the predicted direction o and the true direction is computed with the softmax function, the mean squared error between p̂ and the ground-truth pose p is computed, and the adversarial error of p̂ is computed; the three errors together serve as the supervision signal.
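The three supervision terms of S32 can be combined as below. This is a hedged numpy sketch, not the patent's training code: softmax cross-entropy on the predicted direction stands in for the direction error, and a non-saturating -log D(p̂) term stands in for the adversarial error; the equal weighting and all names are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def pose_supervision(dir_logits, true_dir, pred_pose, true_pose, d_score):
    """Combine the three S32 supervision terms (illustrative, equal weights):
    - softmax cross-entropy between predicted and true pose direction,
    - MSE between predicted pose p̂ and ground-truth pose p,
    - adversarial term -log D(p̂), with d_score = discriminator output in (0,1).
    """
    ce = -np.log(softmax(dir_logits)[true_dir] + 1e-12)
    mse = np.mean((pred_pose - true_pose) ** 2)
    adv = -np.log(d_score + 1e-12)
    return float(ce + mse + adv)
```

A correct direction, an accurate pose, and a discriminator score near 1 should all drive this combined loss down.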
5. The text-guided person image generation method based on a generative adversarial network according to claim 4, wherein step S4, predicting the corresponding person pose from the text with the pose generator learned in S2~S3, comprises the following sub-step:
based on the person pose generator established in S2~S3, input the description text t of the target image, predict the person's pose direction from the text, adjust the basic pose according to the text, and generate a predicted person pose p̂ that matches the text description.
6. The text-guided person image generation method based on a generative adversarial network according to claim 5, wherein step S5, learning to generate person images matching the text description with a person-image generator based on a generative adversarial network while establishing the mapping between image sub-regions and the text through a multi-modal error, comprises the following sub-steps:
S51, extract features from the person reference image x with a convolutional neural network, taking the deep features (v1, v2, ..., vm) at different scales, where vi is the image deep feature at the i-th scale, i = 1, 2, ..., m, and m is the total number of down-sampling steps;
S52, extract features from the person's predicted pose p̂ obtained in step S4 with a convolutional neural network, taking the deep features (s1, s2, ..., sm) at different scales, where si is the pose deep feature at the i-th scale, i = 1, 2, ..., m, and m is the total number of down-sampling steps;
S53, extract the text feature matrix e with a bidirectional LSTM; e is the concatenation of all hidden-state vectors hj, i.e. e = (h1, h2, ..., hN), where j = 1, 2, ..., N and N is the number of words in the text;
S54, compute the vision-text attention at the i-th scale, ci = vi·Softmax(vi^T e), and measure the distance between the sub-regions of image x and the text t with the multi-scale vision-text distance, thereby establishing the relationship between image sub-regions and the text,
where cij is the j-th column of the vision-text attention ci, ej is the j-th column of the text feature matrix e, i.e. hj, and r(·,·) is the cosine similarity between two vectors;
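The attention of S54 can be sketched in numpy as follows, assuming column-wise features: v holds one d-dimensional feature per image sub-region and e one per word, so ci = vi·Softmax(vi^T e) yields one attended visual vector per word, which is then compared to that word by the cosine similarity r(·,·). Shapes, the toy data, and function names are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=0):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def vision_text_attention(v, e):
    """c = v · Softmax(vᵀe): attend over image sub-regions (columns of v,
    shape (d, R)) for each word (columns of e, shape (d, N)); returns one
    attended visual vector per word, shape (d, N)."""
    return v @ softmax(v.T @ e, axis=0)

def cosine(a, b):
    """Cosine similarity r(a, b) between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# per-word region-to-text similarities r(c_j, e_j) at one scale
rng = np.random.default_rng(1)
v = rng.normal(size=(16, 49))    # e.g. a 7x7 grid = 49 sub-regions
e = rng.normal(size=(16, 5))     # 5 words
c = vision_text_attention(v, e)
sims = [cosine(c[:, j], e[:, j]) for j in range(e.shape[1])]
```

Aggregating these per-word similarities over all words and all m scales gives the multi-scale vision-text distance referred to in the claim.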
S55, compute the multi-scale vision-text distance matrix Λ for each training pair (xi, ti), where I is the number of training pairs in each training batch and xi and ti are the reference image and the target-image description text of the i-th pair; the element in row i, column j of Λ is the multi-scale vision-text distance between xi and tj; the posterior probability that image matches text is P(ti|xi) = Softmax(Λ)(i,i), and the posterior probability that text matches image is P(xi|ti) = Softmax(Λ^T)(i,i); the multi-modal similarity error is then computed from these two posterior probabilities;
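A sketch of the S55 posteriors: given the I×I batch matrix Λ of matching scores, the diagonal entries of a row-wise softmax give P(ti|xi), and those of Softmax(Λᵀ) give P(xi|ti). The summed negative log-likelihood below is one plausible form of the multi-modal similarity error; the patent's exact formula is not reproduced on this page, so treat this as an assumption:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multimodal_error(Lam):
    """Given the I x I batch score matrix Λ (higher = better match on the
    diagonal), form the two matching posteriors from its row-wise softmax
    and return their summed negative log-likelihood."""
    p_t_given_x = np.diag(softmax(Lam, axis=1))      # P(t_i | x_i)
    p_x_given_t = np.diag(softmax(Lam.T, axis=1))    # P(x_i | t_i)
    return float(-(np.log(p_t_given_x) + np.log(p_x_given_t)).sum())
```

A matrix whose diagonal dominates (each image best matches its own text) should yield a much smaller error than a uniform one.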
S56, perform an attention up-sampling operation when generating the person image: first compute the word-vision attention at the i-th scale, zi = e·Softmax(e^T vi); the up-sampling result at the i-th scale combines zi with the nearest-neighbour up-sampling, at the i-th scale, of the previous-scale result u(i-1), the initial input being used when i = 1;
cascade the multiple attention up-sampling operations to generate the person image x̂, learned through the adversarial error; during learning, compute the multi-modal similarity error, the adversarial error of the generated person image x̂, and the L1 error between the target image x′ and x̂, and use the three errors together as the supervision signal.
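The word-vision attention and nearest-neighbour up-sampling of S56 can be sketched as below. The channel-wise concatenation used to fuse the up-sampled previous result with zi is an assumption (the patent page does not reproduce the exact fusion formula), and all shapes and names are illustrative:

```python
import numpy as np

def softmax(x, axis=0):
    x = x - x.max(axis=axis, keepdims=True)
    ex = np.exp(x)
    return ex / ex.sum(axis=axis, keepdims=True)

def word_vision_attention(e, v):
    """z = e · Softmax(eᵀv): one attended text vector per image sub-region.
    e is (d, N) word features, v is (d, H*W) region features at this scale."""
    return e @ softmax(e.T @ v, axis=0)

def nearest_upsample(u):
    """Nearest-neighbour 2x up-sampling of a (d, H, W) feature map."""
    return u.repeat(2, axis=1).repeat(2, axis=2)

# one cascade step (i > 1): up-sample the previous result, then fuse it
# with z_i; concatenation along channels stands in for the fusion operator
d, H = 8, 4
u_prev = np.zeros((d, H, H))                 # previous-scale result u_(i-1)
e = np.ones((d, 3))                          # 3 words (toy text features)
up = nearest_upsample(u_prev)                # (d, 2H, 2H)
v_i = np.ones((d, (2 * H) ** 2))             # region features at the new scale
z_i = word_vision_attention(e, v_i).reshape(d, 2 * H, 2 * H)
u_i = np.concatenate([up, z_i], axis=0)      # fused result, (2d, 2H, 2H)
```

Repeating this step for i = 1, ..., m and mapping the final feature map to RGB yields the generated person image x̂ described in the claim.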
CN201910257463.9A 2019-04-01 2019-04-01 Human image generation method based on generation of confrontation network through text guidance Active CN110021051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910257463.9A CN110021051B (en) 2019-04-01 2019-04-01 Human image generation method based on generation of confrontation network through text guidance

Publications (2)

Publication Number Publication Date
CN110021051A true CN110021051A (en) 2019-07-16
CN110021051B CN110021051B (en) 2020-12-15

Family

ID=67190349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910257463.9A Active CN110021051B (en) 2019-04-01 2019-04-01 Human image generation method based on generation of confrontation network through text guidance

Country Status (1)

Country Link
CN (1) CN110021051B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180374249A1 (en) * 2017-06-27 2018-12-27 Mad Street Den, Inc. Synthesizing Images of Clothing on Models
CN109215007A (en) * 2018-09-21 2019-01-15 维沃移动通信有限公司 A kind of image generating method and terminal device
CN109523616A (en) * 2018-12-04 2019-03-26 科大讯飞股份有限公司 A kind of FA Facial Animation generation method, device, equipment and readable storage medium storing program for executing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIQIAN MA et al.: "Disentangled Person Image Generation", ResearchGate *
HE, Peilin et al.: "Face image translation based on generative adversarial text", Computer Technology and Automation *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555458A (en) * 2019-07-24 2019-12-10 中北大学 Multi-band image feature level fusion method for generating countermeasure network based on attention mechanism
CN110555458B (en) * 2019-07-24 2022-04-19 中北大学 Multi-band image feature level fusion method for generating countermeasure network based on attention mechanism
CN110427864A (en) * 2019-07-29 2019-11-08 腾讯科技(深圳)有限公司 A kind of image processing method, device and electronic equipment
CN110705306B (en) * 2019-08-29 2020-08-18 首都师范大学 Evaluation method for consistency of written and written texts
CN110705306A (en) * 2019-08-29 2020-01-17 首都师范大学 Evaluation method for consistency of written and written texts
CN111091059A (en) * 2019-11-19 2020-05-01 佛山市南海区广工大数控装备协同创新研究院 Data equalization method in household garbage plastic bottle classification
CN111046166A (en) * 2019-12-10 2020-04-21 中山大学 Semi-implicit multi-modal recommendation method based on similarity correction
CN111476241B (en) * 2020-03-04 2023-04-21 上海交通大学 Character clothing conversion method and system
CN111476241A (en) * 2020-03-04 2020-07-31 上海交通大学 Character clothing conversion method and system
CN111369468A (en) * 2020-03-09 2020-07-03 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN111369468B (en) * 2020-03-09 2022-02-01 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN111402365B (en) * 2020-03-17 2023-02-10 湖南大学 Method for generating picture from characters based on bidirectional architecture confrontation generation network
CN111402365A (en) * 2020-03-17 2020-07-10 湖南大学 Method for generating picture from characters based on bidirectional architecture confrontation generation network
CN111062865A (en) * 2020-03-18 2020-04-24 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN111583213A (en) * 2020-04-29 2020-08-25 西安交通大学 Image generation method based on deep learning and no-reference quality evaluation
CN111583213B (en) * 2020-04-29 2022-06-07 西安交通大学 Image generation method based on deep learning and no-reference quality evaluation
CN111667547B (en) * 2020-06-09 2023-08-11 创新奇智(北京)科技有限公司 GAN network training method, garment picture generation method and device and electronic equipment
CN111667547A (en) * 2020-06-09 2020-09-15 创新奇智(北京)科技有限公司 GAN network training method, clothing picture generation method, device and electronic equipment
CN111950346A (en) * 2020-06-28 2020-11-17 中国电子科技网络信息安全有限公司 Pedestrian detection data expansion method based on generation type countermeasure network
CN111898456A (en) * 2020-07-06 2020-11-06 贵州大学 Text modification picture network model training method based on multi-level attention mechanism
CN111898456B (en) * 2020-07-06 2022-08-09 贵州大学 Text modification picture network model training method based on multi-level attention mechanism
CN112001279A (en) * 2020-08-12 2020-11-27 山东省人工智能研究院 Cross-modal pedestrian re-identification method based on dual attribute information
CN112001279B (en) * 2020-08-12 2022-02-01 山东省人工智能研究院 Cross-modal pedestrian re-identification method based on dual attribute information
CN112784677A (en) * 2020-12-04 2021-05-11 上海芯翌智能科技有限公司 Model training method and device, storage medium and computing equipment
CN112966760A (en) * 2021-03-15 2021-06-15 清华大学 Neural network fusing text and image data and design method of building structure thereof
CN112966760B (en) * 2021-03-15 2021-11-09 清华大学 Neural network fusing text and image data and design method of building structure thereof
CN113205574A (en) * 2021-04-30 2021-08-03 武汉大学 Art character style migration system based on attention system
CN113222875A (en) * 2021-06-01 2021-08-06 浙江大学 Image harmonious synthesis method based on color constancy
CN113919998A (en) * 2021-10-14 2022-01-11 天翼数字生活科技有限公司 Image anonymization method based on semantic and attitude map guidance
CN113919998B (en) * 2021-10-14 2024-05-14 天翼数字生活科技有限公司 Picture anonymizing method based on semantic and gesture graph guidance
CN114119811B (en) * 2022-01-28 2022-04-01 北京智谱华章科技有限公司 Image generation method and device and electronic equipment
CN114119811A (en) * 2022-01-28 2022-03-01 北京智谱华章科技有限公司 Image generation method and device and electronic equipment

Also Published As

Publication number Publication date
CN110021051B (en) 2020-12-15

Similar Documents

Publication Publication Date Title
CN110021051A (en) One kind passing through text Conrad object image generation method based on confrontation network is generated
CN108875807B (en) Image description method based on multiple attention and multiple scales
US20200250226A1 (en) Similar face retrieval method, device and storage medium
CN109359559B (en) Pedestrian re-identification method based on dynamic shielding sample
CN109447115A Fine-grained zero-shot classification method based on a multi-layer semantic supervised attention model
CN108416065A Image-to-sentence description generation system and method based on hierarchical neural networks
CN111709409A (en) Face living body detection method, device, equipment and medium
JP2017091525A (en) System and method for attention-based configurable convolutional neural network (abc-cnn) for visual question answering
CN104142995B Social event recognition method based on visual attributes
CN110472688A Image description method and device, and image description model training method and device
CN106326857A (en) Gender identification method and gender identification device based on face image
US11966829B2 (en) Convolutional artificial neural network based recognition system in which registration, search, and reproduction of image and video are divided between and performed by mobile device and server
CN107992890B Multi-view classifier and design method based on local features
CN107480688A Fine-grained image recognition method based on zero-shot learning
CN112949622A (en) Bimodal character classification method and device fusing text and image
CN113761153A (en) Question and answer processing method and device based on picture, readable medium and electronic equipment
CN106897671A Micro-expression recognition method based on optical flow and FisherVector encoding
CN111881716A (en) Pedestrian re-identification method based on multi-view-angle generation countermeasure network
CN111582342A (en) Image identification method, device, equipment and readable storage medium
CN108154156A Image ensemble classification method and device based on a neural topic model
CN111507184B (en) Human body posture detection method based on parallel cavity convolution and body structure constraint
CN105718898A (en) Face age estimation method and system based on sparse undirected probabilistic graphical model
CN117033609B (en) Text visual question-answering method, device, computer equipment and storage medium
Zhang Innovation of English teaching model based on machine learning neural network and image super resolution
Feng Mask RCNN-based single shot multibox detector for gesture recognition in physical education

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant