CN110021051A - Text-guided person image generation method based on generative adversarial networks - Google Patents
Text-guided person image generation method based on generative adversarial networks
- Publication number
- CN110021051A (application CN201910257463.9A)
- Authority
- CN
- China
- Prior art keywords
- text
- person
- pose
- image
- adversarial network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications (G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—Computing arrangements based on specific computational models; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks)
- G06N3/044—Architecture: recurrent networks, e.g. Hopfield networks
- G06N3/045—Architecture: combinations of networks
- G06N3/08—Learning methods
- G06T11/00—2D [Two Dimensional] image generation (G06T—Image data processing or generation, in general)
Abstract
The invention discloses a text-guided person image generation method based on generative adversarial networks, belonging to the field of computer vision. The method comprises the following steps: obtain a person image dataset for training and define the algorithm objective; obtain the pose information of all images in the dataset and derive basic poses from the pose information through a clustering algorithm; train a pose predictor based on generative adversarial networks to map text to a predicted pose; use the learned pose predictor of S2~S3 to obtain the corresponding person pose from text; train a person image generator based on generative adversarial networks to generate person images matching the text description, while establishing mapping relations between image sub-regions and the text through a multi-modal error. The text-guided person image generation method of the invention has good application value in scenarios such as image generation, image editing, and person re-identification.
Description
Technical field
The invention belongs to the field of computer vision, and in particular relates to a text-guided person image generation method based on generative adversarial networks.
Background
Text-guided person image generation is defined as the following problem: according to a target text description, simultaneously change the pose and the attributes (e.g. clothing color) of the person in a reference image so that the result is consistent with the text. In recent years, in computer vision tasks such as specific image generation, image retrieval, and person re-identification, generation methods that can produce images with specified content have played an important role in expanding datasets and increasing algorithm robustness. The task has two key points. The first is how to predict the target pose of the person from the text; the target pose should be consistent with the text description and serves as guidance for the pose transformation. The second is how to change the pose and the attributes of the person in the reference image at the same time, so that in the generated image the pose is changed and the attributes match the textual description. For the first point, we consider that a person pose contains two factors, pose orientation and pose action: the orientation determines the facing angle of the action, and the action is the configuration of the human limbs. For the second point, the invention embeds an attentional up-sampling module in the network, which effectively integrates data from multiple modalities (text, pose, image) when generating the person image, ensuring that the pose transformation and the attribute modification are completed simultaneously. Some previous methods consider person pose transformation, and others address text-to-image generation, but few methods consider changing both the pose and the attributes of a person according to a text description.
Owing to the effectiveness of statistical modeling, learning-based methods are increasingly applied to image generation tasks. Existing learning-based methods mainly use the generative adversarial network framework: given a person image and a target text as input, they output a person image that matches the text description.
Summary of the invention
To solve the above problems, the purpose of the present invention is to provide a text-guided person image generation method based on generative adversarial networks. When predicting the person pose from text, since the text itself does not contain specific spatial correspondence information, we first obtain basic poses with different orientations through clustering, and then adjust the local details of a specific basic pose according to the text, obtaining a person pose that matches the text description. At the same time, key information must be effectively extracted from the text: the orientation and action information in the text relates to the person pose, while the attribute descriptions relate to the appearance of the person in the generated image. In addition, during person image generation the network considers data from multiple modalities (text, pose, image); for the fusion and expression of multi-modal features we introduce an attentional up-sampling module, which uses an attention mechanism to focus on the relevant information in the text while also completing the pose transformation. Combining these three aspects, we design a learning framework based on generative adversarial networks that makes the model establish connections between image sub-regions and the text, so as to perform feature representation of person images with different poses and attributes. Controlling image generation through text provides convenience and friendliness to the user.
To achieve the above object, the technical solution of the present invention is as follows:
A text-guided person image generation method based on generative adversarial networks comprises the following steps:
S1. Obtain a person image dataset for training, and define the algorithm objective;
S2. Obtain the pose information of all images in the person image dataset, and obtain basic poses from all pose information through a clustering algorithm;
S3. Train a pose generator based on generative adversarial networks to predict a pose from the target text;
S4. Use the pose generator learned in S2~S3 to predict the corresponding person pose from text;
S5. Train a person image generator based on generative adversarial networks to generate person images that match the text description, while establishing mapping relations between image sub-regions and the text through a multi-modal error;
S6. Using the person image generator learned in S5, input a reference image and the description text of the target image, and generate a person image that matches the text description.
Based on the above scheme, each step can be implemented as follows:
In step S1, the person image dataset contains several person images, each annotated with a text description of the person in the image. The algorithm objective is defined as follows: for each person in the training set there exist a reference image x, a target image x′, the pose p of the person in the target image, and the description text t of the target image; given the reference image x and the description text t of the target image, predict the pose and action of the target from the description text t, and generate an image similar to the target image x′.
Further, in step S2, obtaining the pose information of all images in the person image dataset and obtaining basic poses from all pose information through a clustering algorithm specifically comprises the following sub-steps:
S21. Obtain the person poses of all images in the dataset through a pose detection algorithm;
S22. Cluster the person poses through the K-means algorithm, and compute the average pose of the i-th cluster as a basic pose, obtaining K basic poses in total.
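The basic-pose extraction in S21~S22 amounts to a plain K-means over detected keypoint poses, with each cluster mean taken as a basic pose. The following is a minimal sketch, not the patent's implementation: the pose detector, the flattened-coordinate layout of each pose, and the farthest-point initialisation used here for robustness are all assumptions.

```python
import numpy as np

def basic_poses(poses, k, iters=50):
    """Cluster flattened keypoint poses with K-means and return the K
    cluster means ("basic poses") plus the cluster label of each pose.
    `poses` is an (n, d) array, one flattened pose per row."""
    poses = np.asarray(poses, dtype=float)
    # farthest-point initialisation: pick the first pose, then repeatedly
    # the pose farthest from all chosen centers (robust for a small sketch)
    centers = [poses[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(poses - c, axis=1) for c in centers], axis=0)
        centers.append(poses[d.argmax()])
    centers = np.stack(centers)
    for _ in range(iters):
        # assign every pose to its nearest center
        d = np.linalg.norm(poses[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the average pose of its cluster (S22)
        centers = np.stack([poses[labels == j].mean(axis=0)
                            if (labels == j).any() else centers[j]
                            for j in range(k)])
    return centers, labels
```

The returned cluster means play the role of the K basic poses from which step S3 later selects one by predicted orientation.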
Further, in step S3, training the pose generator based on generative adversarial networks to predict a pose from the target text specifically comprises the following sub-steps:
S31. Use an LSTM network to extract the feature representation vector of the target description text t, and predict the orientation o of the pose described by the text through a fully connected neural network F_ori, where o ∈ {1, …, K}; select from the K basic poses the basic pose whose orientation is consistent with the predicted orientation o;
S32. Use a generator G_1 to learn to adjust the selected basic pose based on the text information, generating a predicted pose. During learning, compute the error between the orientation o (obtained with a softmax function) and the true orientation, the mean-square error between the predicted pose and the ground-truth pose p, and the adversarial error of the predicted pose; the three errors together serve as supervision.
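The orientation-prediction half of S31 is a K-way classification: the text feature vector goes through a fully connected layer and a softmax over the K basic-pose orientations, and the matching basic pose is selected. A minimal numpy sketch, where `W` and `b` are hypothetical stand-ins for the trained layer F_ori and the text vector is assumed to have been extracted by the LSTM already:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_orientation(t_vec, W, b, basic_poses):
    """Map a text feature vector t_vec to logits over the K orientations
    via a fully connected layer (F_ori), then return the predicted
    orientation index o and the basic pose selected for it."""
    logits = W @ t_vec + b          # fully connected layer
    probs = softmax(logits)         # softmax over the K orientations
    o = int(probs.argmax())         # predicted orientation o
    return o, basic_poses[o]
```

The generator G_1 that refines the selected basic pose is a trained network and is not reproduced here.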
Further, in step S4, using the pose generator learned in S2~S3 to predict the corresponding person pose from text specifically comprises the following sub-step:
Based on the person pose generator established in S2~S3, input the description text t of the target image, predict the person pose orientation from the text, adjust the basic pose according to the text, and generate a predicted person pose that matches the text description.
Further, in step S5, training the person image generator based on generative adversarial networks to generate person images matching the text description, while establishing the mapping relations between image sub-regions and the text through a multi-modal error, specifically comprises the following sub-steps:
S51. Extract features from the person reference image x using a convolutional neural network, obtaining depth features (v_1, v_2, …, v_m) at different scales, where v_i is the image depth feature at the i-th scale, i = 1, 2, …, m, and m is the total number of down-sampling steps;
S52. Extract features from the person pose predicted in step S4 using a convolutional neural network, obtaining depth features (s_1, s_2, …, s_m) at different scales, where s_i is the pose depth feature at the i-th scale, i = 1, 2, …, m, and m is the total number of down-sampling steps;
S53. Extract the text feature matrix e using a bidirectional LSTM; e is composed of all hidden state vectors h_j concatenated, i.e. e = (h_1, h_2, …, h_N), where j = 1, 2, …, N and N is the number of words in the text;
S54. Compute the visual-text attention at the i-th scale, c_i = v_i Softmax(v_i^T e), and measure the distance between the sub-regions of image x and the text t through the multi-scale visual-text distance, establishing the relation between image sub-regions and text:
where c_ij is the j-th column of the visual-text attention c_i, e_j is the j-th column of the text feature matrix e, i.e. h_j, and r(·) is the cosine similarity between two vectors;
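The attention of S54 follows directly from the stated formula c_i = v_i Softmax(v_i^T e), with r(·) as cosine similarity. The shapes below are assumptions (d-dimensional features over r sub-regions and N words), and the per-scale distance shown simply sums the column-wise similarities, since the exact aggregation formula is not reproduced in the text:

```python
import numpy as np

def softmax(x, axis=-1):
    ex = np.exp(x - x.max(axis=axis, keepdims=True))
    return ex / ex.sum(axis=axis, keepdims=True)

def visual_text_attention(v, e):
    """v: (d, r) image features over r sub-regions at one scale;
    e: (d, N) text feature matrix, one column per word.
    Returns c = v @ softmax(v^T e) with one visual context per word."""
    return v @ softmax(v.T @ e, axis=0)

def cosine(a, b):
    """r(.): cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def scale_distance(v, e):
    """Sum of cosine similarities between each word's visual context
    c_j and its word feature e_j (assumed aggregation at one scale)."""
    c = visual_text_attention(v, e)
    return sum(cosine(c[:, j], e[:, j]) for j in range(e.shape[1]))
```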
S55. For each training pair (x_i, t_i), compute the multi-scale visual-text distance matrix Λ, where I is the total number of training pairs in each batch, and x_i and t_i are the reference image and the target-image description text of the i-th pair; the element in row i, column j of Λ is the distance between x_i and t_j. The posterior probability that image matches text is P(t_i | x_i) = Softmax(Λ)_(i,i), and the posterior probability that text matches image is P(x_i | t_i) = Softmax(Λ^T)_(i,i). The multi-modal similarity error is computed as:
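The matching posteriors of S55 are the diagonals of a row-wise softmax over Λ and over Λ^T. The error formula itself is an image not reproduced in the text; the sketch below assumes the common choice of a negative log-likelihood over the two diagonals:

```python
import numpy as np

def softmax(x, axis=-1):
    ex = np.exp(x - x.max(axis=axis, keepdims=True))
    return ex / ex.sum(axis=axis, keepdims=True)

def multimodal_similarity_error(Lmat):
    """Lmat[i, j]: multi-scale visual-text distance between image x_i and
    text t_j in a batch of I pairs. P(t_i|x_i) and P(x_i|t_i) are the
    diagonals of row-wise softmaxes; the error is their summed -log
    (assumed form, the patent's formula image is not reproduced)."""
    p_t_given_x = np.diag(softmax(Lmat, axis=1))    # P(t_i | x_i)
    p_x_given_t = np.diag(softmax(Lmat.T, axis=1))  # P(x_i | t_i)
    return float(-(np.log(p_t_given_x) + np.log(p_x_given_t)).sum())
```

A batch whose matching pairs score highest on the diagonal yields a lower error than one where they do not, which is exactly the pressure that establishes the sub-region/text mapping.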
S56. Perform the attentional up-sampling operation when generating the person image: first compute the word-visual attention at the i-th scale, z_i = e Softmax(e^T v_i); the up-sampling at the i-th scale combines z_i with a nearest-neighbour up-sampling at that scale, where u_{i-1} is the up-sampling result at the previous scale (for i = 1 the input features are used). Cascading multiple attentional up-sampling operations generates the person image, which is learned through the adversarial error. During learning, compute the multi-modal similarity error, the adversarial error of the generated person image, and the L1 error with respect to the target image x′; the three errors together serve as supervision.
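The two ingredients of the attentional up-sampling in S56, the word-visual attention z_i = e Softmax(e^T v_i) and a nearest-neighbour up-sampling, can be sketched as follows. How the module combines them with u_{i-1} is not fully specified in the text, so only the pieces are shown, with assumed shapes:

```python
import numpy as np

def softmax(x, axis=-1):
    ex = np.exp(x - x.max(axis=axis, keepdims=True))
    return ex / ex.sum(axis=axis, keepdims=True)

def word_visual_attention(e, v):
    """z = e @ softmax(e^T v): a text context vector aligned to each of
    the r visual sub-regions. e: (d, N) word features, v: (d, r)."""
    return e @ softmax(e.T @ v, axis=0)

def nearest_upsample(u, factor=2):
    """Nearest-neighbour up-sampling of a (c, h, w) feature map by
    repeating each spatial element `factor` times along h and w."""
    return u.repeat(factor, axis=1).repeat(factor, axis=2)
```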
Compared with existing person image generation methods, the text-guided person image generation method based on generative adversarial networks of the present invention has the following beneficial effects:
First, the present invention controls the generation of the person image through a text description, i.e. the text description both controls the pose change of the person and modifies the clothing-color attributes of the person. Seeking control through text description is friendlier and more convenient for users.
Second, the invention proposes a method for predicting the person pose from text, which can predict from the text description a reasonable person pose that matches the orientation and action described in the text.
Finally, the invention proposes the attentional up-sampling module, which effectively integrates data from different modalities, including text, pose and image. At the same time, the module can preserve the person identity information of the reference image, making the generated person image more natural and realistic.
The text-guided person image generation method based on generative adversarial networks of the present invention has good application value in scenarios such as image generation, image editing and person re-identification. For example, in an image-editing scenario, according to a text description and a reference image, an image can be generated in which the person is the same as in the reference image but the pose and clothing-color attributes are changed; images with different poses and attributes are obtained by modifying the keywords in the text description, an approach that is friendlier and more convenient for users. Generating such images also plays a fundamental role in other work, because obtaining datasets is itself expensive and in some cases even difficult; such person images can be generated by this application, which is conducive to the development of other related work.
Brief description of the drawings
Fig. 1 is a flow diagram of the invention;
Fig. 2 is the flow diagram of the embodiment.
Detailed description of the embodiments
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.
On the contrary, the present invention covers any substitution, modification, equivalent method and scheme made within the spirit and scope of the present invention as defined by the claims. Further, in order to give the public a better understanding of the present invention, some specific details are described below in detail; a person skilled in the art can also fully understand the present invention without these details.
With reference to Fig. 1, in a preferred embodiment, a text-guided person image generation method based on generative adversarial networks comprises the following steps:
S1. Obtain a person image dataset for training, and define the algorithm objective. Specifically: the person image dataset contains several person images, each annotated with a text description of the person in the image; the algorithm objective is defined as follows: for each person in the training set there exist a reference image x, a target image x′, the pose p of the person in the target image, and the description text t of the target image; given the reference image x and the description text t of the target image, predict the pose and action of the target from the description text t, and generate an image similar to the target image x′.
S2. Obtain the pose information of all images in the person image dataset, and obtain basic poses from all pose information through a clustering algorithm. The specific sub-steps are as follows:
S21. Obtain the person poses of all images in the dataset through a pose detection algorithm;
S22. Cluster the person poses through the K-means algorithm, and compute the average pose of the i-th cluster as a basic pose, obtaining K basic poses in total.
S3. Train a pose generator based on generative adversarial networks to predict a pose from the target text. The specific sub-steps are as follows:
S31. Use an LSTM network to extract the feature representation vector of the target description text t, and predict the orientation o of the pose described by the text through a fully connected neural network F_ori, where o ∈ {1, …, K}; select from the K basic poses the basic pose whose orientation is consistent with the predicted orientation o;
S32. Use a generator G_1 to learn to adjust the selected basic pose based on the text information, generating a predicted pose; during learning, compute the error between the orientation o (obtained with a softmax function) and the true orientation, the mean-square error between the predicted pose and the ground-truth pose p, and the adversarial error of the predicted pose, and use the three errors together as supervision.
S4. Use the pose generator learned in S2~S3 to predict the corresponding person pose from text. Specifically: based on the person pose generator established in S2~S3, input the description text t of the target image, predict the person pose orientation from the text, adjust the basic pose according to the text, and generate a predicted person pose that matches the text description.
S5. Train a person image generator based on generative adversarial networks to generate person images that match the text description, while establishing mapping relations between image sub-regions and the text through a multi-modal error. The specific sub-steps are as follows:
S51. Extract features from the person reference image x using a convolutional neural network, obtaining depth features (v_1, v_2, …, v_m) at different scales, where v_i is the image depth feature at the i-th scale, i = 1, 2, …, m, and m is the total number of down-sampling steps;
S52. Extract features from the person pose predicted in step S4 using a convolutional neural network, obtaining depth features (s_1, s_2, …, s_m) at different scales, where s_i is the pose depth feature at the i-th scale, i = 1, 2, …, m, and m is the total number of down-sampling steps;
S53. Extract the text feature matrix e using a bidirectional LSTM; e is composed of all hidden state vectors h_j concatenated, i.e. e = (h_1, h_2, …, h_N), where j = 1, 2, …, N and N is the number of words in the text;
S54. Compute the visual-text attention at the i-th scale, c_i = v_i Softmax(v_i^T e), and measure the distance between the sub-regions of image x and the text t through the multi-scale visual-text distance, establishing the relation between image sub-regions and text, where c_ij is the j-th column of the visual-text attention c_i, e_j is the j-th column of the text feature matrix e, i.e. h_j, and r(·) is the cosine similarity between two vectors;
S55. For each training pair (x_i, t_i), compute the multi-scale visual-text distance matrix Λ, where I is the total number of training pairs in each batch, and x_i and t_i are the reference image and the target-image description text of the i-th pair; the posterior probability that image matches text is P(t_i | x_i) = Softmax(Λ)_(i,i), and the posterior probability that text matches image is P(x_i | t_i) = Softmax(Λ^T)_(i,i); the multi-modal similarity error is computed from these posteriors;
S56. Perform the attentional up-sampling operation when generating the person image: first compute the word-visual attention at the i-th scale, z_i = e Softmax(e^T v_i); the up-sampling at the i-th scale combines z_i with a nearest-neighbour up-sampling at that scale, where u_{i-1} is the up-sampling result at the previous scale (for i = 1 the input features are used); cascading multiple attentional up-sampling operations generates the person image, which is learned through the adversarial error; during learning, compute the multi-modal similarity error, the adversarial error of the generated person image, and the L1 error with respect to the target image x′, and use the three errors together as supervision.
S6. Using the person image generator learned in S5, input the reference image and the description text of the target image to produce a person image that matches the text description.
The above method is applied in a specific embodiment below, so that those skilled in the art can better understand the effect of the present invention.
Embodiment
In this embodiment, the person pose generator and the person image generator are learned according to steps S1~S5 above; the implementation of each step is as described before and is not elaborated again; only the effect on the case data is shown below. This embodiment is carried out on the CUHK-PEDES dataset with text annotations, whose images come from five person re-identification datasets, namely CUHK03, Market-1501, SSM, VIPER and CUHK01, containing 40,206 images of 13,003 persons in total. This embodiment is tested on the CUHK-PEDES dataset.
The main flow of person image generation is as follows:
1) Predict a consistent person pose from the description text through the person pose generator;
2) Change the keywords describing the color attributes in the description text, as shown in Fig. 2;
3) Input the predicted person pose, the modified description text and the reference image into the person image generator, obtaining a person image in which the pose and attributes are changed;
4) To comprehensively compare the validity of this method, we compare against other state-of-the-art methods, with similar person image generation frameworks suitably modified to fit the task targeted by this method;
5) The structural similarity (SSIM) and Inception score (IS) of this embodiment are shown in Table 1, where PT refers to changing only the pose, and P&AT refers to changing both the pose and the color attributes. Furthermore, for this task this embodiment proposes a VQA perceptual score to measure the correctness of the color-attribute change. The data show that, in the three metrics of structural similarity, Inception score and VQA perceptual score, the present invention is further improved overall compared with other methods and with text-controlled person image generation methods under similar modified frameworks. The VQA perceptual score is computed as T / N: a program first randomly changes the color attributes in the description text (10 colors are considered in total) and generates the corresponding image, recording the changed color attribute as the correct answer; then the program asks a VQA model a question related to the person's body part (the color of the clothes or trousers); finally the answers returned by the VQA model are collected and the accuracy is computed, where T is the number of images for which the correct answer is returned and N is the total number of images.
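The VQA perceptual score reduces to an accuracy T / N over the colour-attribute answers; a direct sketch (the VQA model itself is external and only its returned answers are assumed here):

```python
def vqa_perceptual_score(answers, truths):
    """Score = T / N: the fraction of generated images for which the VQA
    model's answer matches the colour attribute written into the text.
    `answers`: colour names returned by the VQA model, one per image;
    `truths`: the recorded correct colours, in the same order."""
    assert len(answers) == len(truths)
    correct = sum(a == t for a, t in zip(answers, truths))  # T
    return correct / len(answers)                           # T / N
```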
Table 1. SSIM and IS metrics of this embodiment on the CUHK-PEDES dataset
Method | SSIM(PT) | IS(PT) | IS(P&AT) |
SIS[1] | 0.239±0.106 | 3.707±0.185 | 3.790±0.182 |
AttnGAN[2] | 0.298±0.126 | 3.695±0.110 | 3.726±0.123 |
PG2[3] | 0.237±0.120 | 3.473±0.009 | 3.486±0.125 |
Single AU | 0.305±0.121 | 4.015±0.009 | 4.071±0.149 |
ours | 0.364±0.123 | 4.209±0.165 | 4.218±0.195 |
Table 2. VQA perceptual score of this embodiment on the CUHK-PEDES dataset
Method | VQA perceptual score |
Real image | 0.698 |
SIS[1] | 0.275 |
AttnGAN[2] | 0.139 |
PG2[3] | 0.110 |
Single AU | 0.205 |
ours | 0.334 |
Here "ours" is the method of this embodiment, cascading 3 up-sampling operations in S56; "Single AU" uses a single attentional up-sampling operation in S56 instead of the cascade of 3, with everything else identical to ours; "Real image" in Table 2 refers to the result of questioning the VQA model on the original images in the dataset. The references corresponding to the other methods are as follows:
[1] H. Dong, S. Yu, C. Wu, and Y. Guo. Semantic image synthesis via adversarial learning. In ICCV, 2017.
[2] T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In CVPR, 2018.
[3] L. Ma, J. Xu, Q. Sun, B. Schiele, T. Tuytelaars, and L. Van Gool. Pose guided person image generation. In NIPS, 2017.
Through the above technical scheme, the present invention provides, based on deep learning technology, a text-guided person image generation method based on generative adversarial networks. The present invention can generate realistic and vivid person images, controlling the pose and the attributes of the person in the generated image through the description text.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention; any modification, equivalent replacement and improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.
Claims (6)
1. one kind passes through text Conrad object image generation method based on confrontation network is generated, which is characterized in that including following
Step:
S1, it obtains for trained character image data set, and defines algorithm target;
S2, the posture information for obtaining all images in character image data set, are obtained from all posture informations by clustering algorithm
Take basic poses;
S3, the study carried out based on the posture generator for generating confrontation network from target text to prediction posture is utilized;
S4, using the acquistion of the middle school S2~S3 to posture generator predict to obtain corresponding personage's posture from text;
S5, the personage's picture generation for carrying out meeting text description based on the personage's picture generator for generating confrontation network is utilized
It practises, while establishing the mapping relations between picture subregion and text using multi-modal error.
S6, the personage's picture generator learnt using S5 input the description text of reference picture and Target Photo, generate symbol
Close personage's picture of text description.
2. as described in claim 1 pass through text Conrad object image generation method, feature based on generation confrontation network
It is, in step S1, the character image data set includes several personage's pictures, and each personage's picture is labelled with to be directed to and be somebody's turn to do
The text description of personage, the algorithm target of definition in picture are as follows: for each of training set personage, there are reference picture x,
Target Photo x ', the posture p of the personage and description text t of Target Photo in Target Photo;Input reference picture x and target
The description text t of picture, it is desirable that from the posture and movement of description text t prediction target, generate and the similar figure of Target Photo x '
Piece
3. as claimed in claim 2 pass through text Conrad object image generation method, feature based on generation confrontation network
It is, in step S2, obtains the posture information of all images in character image data set, believed by clustering algorithm from all postures
Basic poses are obtained in breath, specifically include following sub-step:
S21, personage's posture that all pictures in data set are obtained by attitude detection algorithm;
S22, personage's posture is clustered by K-means clustering algorithm, and calculates the average posture of ith cluster
And as basic poses, K basic poses are acquired altogether
4. The text-guided person image generation method based on a generative adversarial network according to claim 3, characterized in that in step S3, a pose generator based on a generative adversarial network is used to learn the prediction of the pose from the target text, specifically comprising the following sub-steps:
S31. Use an LSTM network to extract the feature representation vector of the goal description text t, and predict the direction o of the pose described by the text through a fully connected network, where o ∈ {1, ..., K}; select from the K basic poses the basic pose consistent with the predicted direction o;
S32. Use a generator G1 to learn to adjust the selected basic pose based on the text feature, generating a predicted pose. In the learning process, the softmax error between the predicted direction o and the true direction, the mean squared error between the predicted pose and the ground-truth pose p, and the adversarial error of the predicted pose are computed, and the three errors together serve as the supervision signal.
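A minimal PyTorch sketch of the S31–S32 pipeline is given below. The layer sizes, the residual way G1 adjusts the basic pose, and all names are assumptions for illustration only; the claim specifies only the components (LSTM text encoder, fully connected direction head, generator G1), not this architecture.

```python
import torch
import torch.nn as nn

class PoseGenerator(nn.Module):
    """Sketch of S31-S32: an LSTM encodes the description text, a fully
    connected head predicts the pose direction o among K basic poses,
    and a small generator G1 adjusts the selected basic pose
    conditioned on the text feature."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, k, pose_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.direction_head = nn.Linear(hidden_dim, k)   # predicts direction o
        self.g1 = nn.Sequential(                         # adjusts the basic pose
            nn.Linear(hidden_dim + pose_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, pose_dim),
        )

    def forward(self, tokens, basic_poses):
        # text feature: final LSTM hidden state
        _, (h, _) = self.lstm(self.embed(tokens))
        t_feat = h[-1]                                   # (B, hidden_dim)
        logits = self.direction_head(t_feat)             # (B, K), softmax-supervised
        o = logits.argmax(dim=1)                         # predicted direction
        base = basic_poses[o]                            # (B, pose_dim)
        # predicted pose = selected basic pose + text-conditioned adjustment
        pose = base + self.g1(torch.cat([t_feat, base], dim=1))
        return logits, pose

model = PoseGenerator(vocab_size=1000, embed_dim=32, hidden_dim=64, k=4, pose_dim=36)
tokens = torch.randint(0, 1000, (2, 7))   # batch of 2 texts, 7 tokens each
basic = torch.randn(4, 36)                # K = 4 basic poses
logits, pose = model(tokens, basic)
print(logits.shape, pose.shape)
```

During training, `logits` would receive a cross-entropy (softmax) loss against the true direction, `pose` an MSE loss against the ground-truth pose p, plus an adversarial loss from a pose discriminator, matching the three supervision errors of S32.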
5. The text-guided person image generation method based on a generative adversarial network according to claim 4, characterized in that in step S4, the pose generator trained in steps S2~S3 is used to predict the corresponding person pose from the text, specifically comprising the following sub-step:
Based on the person pose generator established through S2~S3, input the description text t of the target picture, predict the direction of the person pose from the text, adjust the basic pose according to the text, and generate a predicted person pose that conforms to the text description.
6. The text-guided person image generation method based on a generative adversarial network according to claim 5, characterized in that in step S5, a person picture generator based on a generative adversarial network is used to learn the generation of person pictures conforming to the text description, while a multi-modal error is used to establish the mapping relationship between picture sub-regions and the text, specifically comprising the following sub-steps:
S51. Perform feature extraction on the person reference picture x using a convolutional neural network, obtaining depth features (v1, v2, ..., vm) at different scales, where vi is the picture depth feature at the i-th scale, i = 1, 2, ..., m, and m is the total number of down-sampling operations;
S52. Perform feature extraction on the predicted person pose obtained in step S4 using a convolutional neural network, obtaining depth features (s1, s2, ..., sm) at different scales, where si is the pose depth feature at the i-th scale, i = 1, 2, ..., m, and m is the total number of down-sampling operations;
S53. Extract the text feature matrix e using a bidirectional LSTM, where e is the concatenation of all the hidden state vectors h, i.e. e = (h1, h2, ..., hN), ej = hj, j = 1, 2, ..., N, and N is the number of words in the text;
S54. Compute the visual-text attention at the i-th scale, ci = vi·Softmax(vi^T e), and measure the distance between the sub-regions of picture x and the text t by the multi-scale visual-text distance, thereby establishing the relationship between picture sub-regions and the text, where cij is the j-th column of the visual-text attention ci, ej is the j-th column of the text feature matrix e (i.e. hj), and r(·, ·) is the cosine similarity between two vectors;
S55. Compute the multi-scale visual-text distance matrix Λ over every training pair, where I is the total number of training pairs in each training batch, and xi and ti are respectively the reference picture of the i-th training pair and the description text of its target picture; the element in the i-th row and j-th column of Λ is the multi-scale visual-text distance between the i-th picture and the j-th text; the posterior probability that a picture matches its text is P(ti|xi) = Softmax(Λ)(i, i), the posterior probability that a text matches its picture is P(xi|ti) = Softmax(Λ^T)(i, i), and the multi-modal similarity error is computed from these posterior probabilities;
S56. Perform the attention up-sampling operation when generating the person picture: first compute the word-visual attention at the i-th scale, zi = e·Softmax(e^T vi); the up-sampling at the i-th scale combines the nearest-neighbour up-sampling at that scale with the up-sampling result ui−1 of the previous scale (absent when i = 1); cascade the multiple attention up-sampling operations to generate the person picture, learned through the adversarial error. In the learning process, the multi-modal similarity error, the adversarial error of the generated person picture, and the L1 error with respect to the target picture x′ are computed, and the three errors together serve as the supervision signal.
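The visual-text attention and similarity matrix of S54–S55 can be sketched as follows. This is a simplified PyTorch illustration under assumed shapes (feature dimension d, R sub-regions per scale, N words): the multi-scale distance here is a plain mean of cosine similarities r(cij, ej), since the claim's exact aggregation formula is not reproduced in the published text.

```python
import torch
import torch.nn.functional as F

def visual_text_attention(v, e):
    """S54: v is one scale's picture feature map, shape (d, R sub-regions);
    e is the text feature matrix, shape (d, N words).  c = v Softmax(v^T e),
    so column c_j is the region-attended visual feature for word j."""
    return v @ F.softmax(v.T @ e, dim=0)           # (d, N)

def multiscale_distance(vs, e):
    """Aggregate, over scales and words, the cosine similarity
    r(c_ij, e_j) between attended visual features and word features
    (simple mean aggregation, an assumption of this sketch)."""
    sims = []
    for v in vs:                                   # one feature map per scale
        c = visual_text_attention(v, e)            # (d, N)
        sims.append(F.cosine_similarity(c, e, dim=0).mean())
    return torch.stack(sims).mean()

def similarity_matrix(batch_vs, batch_e):
    """S55: Lambda[i, j] = multi-scale distance between picture i and
    text j; the diagonal softmax entries give the matching posteriors
    P(t_i | x_i) and P(x_i | t_i)."""
    n = len(batch_vs)
    lam = torch.zeros(n, n)
    for i in range(n):
        for j in range(n):
            lam[i, j] = multiscale_distance(batch_vs[i], batch_e[j])
    p_t_given_x = F.softmax(lam, dim=1).diag()     # P(t_i | x_i)
    p_x_given_t = F.softmax(lam.T, dim=1).diag()   # P(x_i | t_i)
    return lam, p_t_given_x, p_x_given_t

# toy batch: 3 pairs, 2 scales with 16 and 4 sub-regions, 5 words, d = 8
torch.manual_seed(0)
vs = [[torch.randn(8, 16), torch.randn(8, 4)] for _ in range(3)]
es = [torch.randn(8, 5) for _ in range(3)]
lam, p_tx, p_xt = similarity_matrix(vs, es)
print(lam.shape, p_tx.shape)
```

A multi-modal similarity loss in the spirit of S55 would then penalize the negative log of the diagonal posteriors, pushing each picture toward its own description and away from the other texts in the batch.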
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910257463.9A CN110021051B (en) | 2019-04-01 | 2019-04-01 | Human image generation method based on generation of confrontation network through text guidance |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910257463.9A CN110021051B (en) | 2019-04-01 | 2019-04-01 | Human image generation method based on generation of confrontation network through text guidance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110021051A true CN110021051A (en) | 2019-07-16 |
CN110021051B CN110021051B (en) | 2020-12-15 |
Family
ID=67190349
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910257463.9A Active CN110021051B (en) | 2019-04-01 | 2019-04-01 | Human image generation method based on generation of confrontation network through text guidance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110021051B (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427864A (en) * | 2019-07-29 | 2019-11-08 | 腾讯科技(深圳)有限公司 | A kind of image processing method, device and electronic equipment |
CN110555458A (en) * | 2019-07-24 | 2019-12-10 | 中北大学 | Multi-band image feature level fusion method for generating countermeasure network based on attention mechanism |
CN110705306A (en) * | 2019-08-29 | 2020-01-17 | 首都师范大学 | Evaluation method for consistency of written and written texts |
CN111046166A (en) * | 2019-12-10 | 2020-04-21 | 中山大学 | Semi-implicit multi-modal recommendation method based on similarity correction |
CN111062865A (en) * | 2020-03-18 | 2020-04-24 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, computer equipment and storage medium |
CN111091059A (en) * | 2019-11-19 | 2020-05-01 | 佛山市南海区广工大数控装备协同创新研究院 | Data equalization method in household garbage plastic bottle classification |
CN111369468A (en) * | 2020-03-09 | 2020-07-03 | 北京字节跳动网络技术有限公司 | Image processing method, image processing device, electronic equipment and computer readable medium |
CN111402365A (en) * | 2020-03-17 | 2020-07-10 | 湖南大学 | Method for generating picture from characters based on bidirectional architecture confrontation generation network |
CN111476241A (en) * | 2020-03-04 | 2020-07-31 | 上海交通大学 | Character clothing conversion method and system |
CN111583213A (en) * | 2020-04-29 | 2020-08-25 | 西安交通大学 | Image generation method based on deep learning and no-reference quality evaluation |
CN111667547A (en) * | 2020-06-09 | 2020-09-15 | 创新奇智(北京)科技有限公司 | GAN network training method, clothing picture generation method, device and electronic equipment |
CN111898456A (en) * | 2020-07-06 | 2020-11-06 | 贵州大学 | Text modification picture network model training method based on multi-level attention mechanism |
CN111950346A (en) * | 2020-06-28 | 2020-11-17 | 中国电子科技网络信息安全有限公司 | Pedestrian detection data expansion method based on generation type countermeasure network |
CN112001279A (en) * | 2020-08-12 | 2020-11-27 | 山东省人工智能研究院 | Cross-modal pedestrian re-identification method based on dual attribute information |
CN112784677A (en) * | 2020-12-04 | 2021-05-11 | 上海芯翌智能科技有限公司 | Model training method and device, storage medium and computing equipment |
CN112966760A (en) * | 2021-03-15 | 2021-06-15 | 清华大学 | Neural network fusing text and image data and design method of building structure thereof |
CN113205574A (en) * | 2021-04-30 | 2021-08-03 | 武汉大学 | Art character style migration system based on attention system |
CN113222875A (en) * | 2021-06-01 | 2021-08-06 | 浙江大学 | Image harmonious synthesis method based on color constancy |
CN113919998A (en) * | 2021-10-14 | 2022-01-11 | 天翼数字生活科技有限公司 | Image anonymization method based on semantic and attitude map guidance |
CN114119811A (en) * | 2022-01-28 | 2022-03-01 | 北京智谱华章科技有限公司 | Image generation method and device and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180374249A1 (en) * | 2017-06-27 | 2018-12-27 | Mad Street Den, Inc. | Synthesizing Images of Clothing on Models |
CN109215007A (en) * | 2018-09-21 | 2019-01-15 | 维沃移动通信有限公司 | A kind of image generating method and terminal device |
CN109523616A (en) * | 2018-12-04 | 2019-03-26 | 科大讯飞股份有限公司 | A kind of FA Facial Animation generation method, device, equipment and readable storage medium storing program for executing |
Non-Patent Citations (2)
Title |
---|
LIQIAN MA et al.: "Disentangled Person Image Generation", ResearchGate *
HE Peilin et al.: "Face Image Translation Based on Generative Adversarial Text", Computer Technology and Automation *
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110555458A (en) * | 2019-07-24 | 2019-12-10 | 中北大学 | Multi-band image feature level fusion method for generating countermeasure network based on attention mechanism |
CN110555458B (en) * | 2019-07-24 | 2022-04-19 | 中北大学 | Multi-band image feature level fusion method for generating countermeasure network based on attention mechanism |
CN110427864A (en) * | 2019-07-29 | 2019-11-08 | 腾讯科技(深圳)有限公司 | A kind of image processing method, device and electronic equipment |
CN110705306B (en) * | 2019-08-29 | 2020-08-18 | 首都师范大学 | Evaluation method for consistency of written and written texts |
CN110705306A (en) * | 2019-08-29 | 2020-01-17 | 首都师范大学 | Evaluation method for consistency of written and written texts |
CN111091059A (en) * | 2019-11-19 | 2020-05-01 | 佛山市南海区广工大数控装备协同创新研究院 | Data equalization method in household garbage plastic bottle classification |
CN111046166A (en) * | 2019-12-10 | 2020-04-21 | 中山大学 | Semi-implicit multi-modal recommendation method based on similarity correction |
CN111476241B (en) * | 2020-03-04 | 2023-04-21 | 上海交通大学 | Character clothing conversion method and system |
CN111476241A (en) * | 2020-03-04 | 2020-07-31 | 上海交通大学 | Character clothing conversion method and system |
CN111369468A (en) * | 2020-03-09 | 2020-07-03 | 北京字节跳动网络技术有限公司 | Image processing method, image processing device, electronic equipment and computer readable medium |
CN111369468B (en) * | 2020-03-09 | 2022-02-01 | 北京字节跳动网络技术有限公司 | Image processing method, image processing device, electronic equipment and computer readable medium |
CN111402365B (en) * | 2020-03-17 | 2023-02-10 | 湖南大学 | Method for generating picture from characters based on bidirectional architecture confrontation generation network |
CN111402365A (en) * | 2020-03-17 | 2020-07-10 | 湖南大学 | Method for generating picture from characters based on bidirectional architecture confrontation generation network |
CN111062865A (en) * | 2020-03-18 | 2020-04-24 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, computer equipment and storage medium |
CN111583213A (en) * | 2020-04-29 | 2020-08-25 | 西安交通大学 | Image generation method based on deep learning and no-reference quality evaluation |
CN111583213B (en) * | 2020-04-29 | 2022-06-07 | 西安交通大学 | Image generation method based on deep learning and no-reference quality evaluation |
CN111667547B (en) * | 2020-06-09 | 2023-08-11 | 创新奇智(北京)科技有限公司 | GAN network training method, garment picture generation method and device and electronic equipment |
CN111667547A (en) * | 2020-06-09 | 2020-09-15 | 创新奇智(北京)科技有限公司 | GAN network training method, clothing picture generation method, device and electronic equipment |
CN111950346A (en) * | 2020-06-28 | 2020-11-17 | 中国电子科技网络信息安全有限公司 | Pedestrian detection data expansion method based on generation type countermeasure network |
CN111898456A (en) * | 2020-07-06 | 2020-11-06 | 贵州大学 | Text modification picture network model training method based on multi-level attention mechanism |
CN111898456B (en) * | 2020-07-06 | 2022-08-09 | 贵州大学 | Text modification picture network model training method based on multi-level attention mechanism |
CN112001279A (en) * | 2020-08-12 | 2020-11-27 | 山东省人工智能研究院 | Cross-modal pedestrian re-identification method based on dual attribute information |
CN112001279B (en) * | 2020-08-12 | 2022-02-01 | 山东省人工智能研究院 | Cross-modal pedestrian re-identification method based on dual attribute information |
CN112784677A (en) * | 2020-12-04 | 2021-05-11 | 上海芯翌智能科技有限公司 | Model training method and device, storage medium and computing equipment |
CN112966760A (en) * | 2021-03-15 | 2021-06-15 | 清华大学 | Neural network fusing text and image data and design method of building structure thereof |
CN112966760B (en) * | 2021-03-15 | 2021-11-09 | 清华大学 | Neural network fusing text and image data and design method of building structure thereof |
CN113205574A (en) * | 2021-04-30 | 2021-08-03 | 武汉大学 | Art character style migration system based on attention system |
CN113222875A (en) * | 2021-06-01 | 2021-08-06 | 浙江大学 | Image harmonious synthesis method based on color constancy |
CN113919998A (en) * | 2021-10-14 | 2022-01-11 | 天翼数字生活科技有限公司 | Image anonymization method based on semantic and attitude map guidance |
CN113919998B (en) * | 2021-10-14 | 2024-05-14 | 天翼数字生活科技有限公司 | Picture anonymizing method based on semantic and gesture graph guidance |
CN114119811B (en) * | 2022-01-28 | 2022-04-01 | 北京智谱华章科技有限公司 | Image generation method and device and electronic equipment |
CN114119811A (en) * | 2022-01-28 | 2022-03-01 | 北京智谱华章科技有限公司 | Image generation method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN110021051B (en) | 2020-12-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110021051A (en) | Text-guided person image generation method based on generative adversarial network | |
CN108875807B (en) | Image description method based on multiple attention and multiple scales | |
US20200250226A1 (en) | Similar face retrieval method, device and storage medium | |
CN109359559B (en) | Pedestrian re-identification method based on dynamic shielding sample | |
CN109447115A (en) | Zero sample classification method of fine granularity based on multilayer semanteme supervised attention model | |
CN108416065A (en) | Image based on level neural network-sentence description generates system and method | |
CN111709409A (en) | Face living body detection method, device, equipment and medium | |
JP2017091525A (en) | System and method for attention-based configurable convolutional neural network (abc-cnn) for visual question answering | |
CN104142995B (en) | The social event recognition methods of view-based access control model attribute | |
CN110472688A (en) | The method and device of iamge description, the training method of image description model and device | |
CN106326857A (en) | Gender identification method and gender identification device based on face image | |
US11966829B2 (en) | Convolutional artificial neural network based recognition system in which registration, search, and reproduction of image and video are divided between and performed by mobile device and server | |
CN107992890B (en) | A kind of multi-angle of view classifier and design method based on local feature | |
CN107480688A (en) | Fine granularity image-recognizing method based on zero sample learning | |
CN112949622A (en) | Bimodal character classification method and device fusing text and image | |
CN113761153A (en) | Question and answer processing method and device based on picture, readable medium and electronic equipment | |
CN106897671A (en) | A kind of micro- expression recognition method encoded based on light stream and FisherVector | |
CN111881716A (en) | Pedestrian re-identification method based on multi-view-angle generation countermeasure network | |
CN111582342A (en) | Image identification method, device, equipment and readable storage medium | |
CN108154156A (en) | Image Ensemble classifier method and device based on neural topic model | |
CN111507184B (en) | Human body posture detection method based on parallel cavity convolution and body structure constraint | |
CN105718898A (en) | Face age estimation method and system based on sparse undirected probabilistic graphical model | |
CN117033609B (en) | Text visual question-answering method, device, computer equipment and storage medium | |
Zhang | Innovation of English teaching model based on machine learning neural network and image super resolution | |
Feng | Mask RCNN-based single shot multibox detector for gesture recognition in physical education |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |