CN116385597B - Text mapping method and device - Google Patents

Text mapping method and device

Info

Publication number
CN116385597B
CN116385597B (application CN202310231486.9A)
Authority
CN
China
Prior art keywords
text
image
context
content
characteristic data
Prior art date
Legal status
Active
Application number
CN202310231486.9A
Other languages
Chinese (zh)
Other versions
CN116385597A (en)
Inventor
秦鹏达
潘禧辰
李裕宏
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202310231486.9A
Publication of CN116385597A
Application granted
Publication of CN116385597B
Legal status: Active


Classifications

    • G06T 11/60 Editing figures and text; Combining figures or text
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/256 Fusion techniques of classification results relating to different input data, e.g. multimodal recognition
    • G06F 40/30 Semantic analysis
    • G06Q 30/0643 Graphical representation of items or shoppers
    • H04N 21/2187 Live feed
    • H04N 21/234 Processing of video elementary streams
    • H04N 21/23418 Analysing video streams, e.g. detecting features or characteristics
    • H04N 21/47815 Electronic shopping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Accounting & Taxation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Finance (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Processing Or Creating Images (AREA)
  • Character Input (AREA)

Abstract

The application discloses a text mapping method and device, a text generation image model processing method and device, and an electronic device. For any content text in a content text sequence that constitutes a story, the text mapping method generates the corresponding target image based on the target content text, its context content text, and the context image generated from the context content text. Because the method accounts not only for the correlation between the target image and the target content text, but also for the correlation between the target image and the context content text, as well as the semantic consistency between the target image and the context image corresponding to the context content text, the content continuity and plot consistency of the story illustrations can be effectively improved.

Description

Text mapping method and device
Technical Field
The application relates to the technical field of image processing, and in particular to a text mapping method and device, a text generation image model processing method and device, and an electronic device.
Background
The speed of content creation is an important factor influencing traffic scale. Among the various multimedia formats, image information is more intuitive and has greater visual impact than text information, and therefore spreads more easily, so artificial intelligence techniques for generating images from text have become a research hotspot.
Currently, a typical machine learning model for generating images from textual descriptions is an image generation model based on a diffusion model. However, in the course of implementing the present invention, the inventors found that this technical solution has at least the following problem: a diffusion-model-based image generation model focuses only on generating a single picture and cannot generate a series of images with content continuity and plot consistency. For example, in scenarios such as producing comics, illustrating children's story books, illustrating novels, and illustrating self-media articles, the characters, scenes, and styles in the generated pictures cannot maintain the continuity and consistency of the content and plot.
Disclosure of Invention
The application provides a text mapping method to solve the problem of poor content consistency within illustration sequences in the prior art. The application additionally provides a text mapping device, a text generation image model processing method and device, and an electronic device.
The application provides a text mapping method, which comprises the following steps:
acquiring a content text sequence of a target object;
acquiring first characteristic data of a target content text in the text sequence;
acquiring at least one context text of the target content text and a context image corresponding to the context text;
acquiring second characteristic data corresponding to the context text according to the context text and the corresponding context image;
and generating a target image corresponding to the target content text according to the first characteristic data and at least one second characteristic data corresponding to the at least one context content text.
Optionally, the generating the target image corresponding to the target content text according to the first feature data and the at least one second feature data corresponding to the at least one context content text includes:
acquiring a noise image;
and removing noise from the noise image according to the first characteristic data and the at least one second characteristic data through a diffusion model, and taking the image after noise removal as the target image.
Optionally, the method further comprises:
extracting a first feature map of the noise image;
the removing noise from the noise image according to the first feature data and the at least one second feature data through a diffusion model, and taking the image after removing noise as the target image, includes:
removing noise from the first feature map according to the first feature data and the at least one second feature data through a diffusion model, and taking the image after noise removal as a second feature map;
and up-sampling the second feature map, and taking the up-sampled image as the target image.
Optionally, the acquiring the first feature data of the target content text in the text sequence includes:
performing word vector and word position embedding processing on the target content text to form third characteristic data of the target content text;
extracting fourth feature data of the target content text from the third feature data;
and acquiring the first characteristic data according to the fourth characteristic data and the target text type information.
Optionally, the obtaining, according to the context text and the corresponding context image, second feature data corresponding to the context text includes:
and carrying out multi-mode joint coding on the context text and the corresponding context image to form second characteristic data of image-text fusion.
Optionally, the performing multi-mode joint encoding on the context text and the corresponding context image to form second feature data of image-text fusion includes:
performing word vector and word position embedding processing on the context text to form fifth characteristic data of the context text;
dividing the context image into a plurality of sub-images;
acquiring sixth characteristic data of the context image according to the characteristic data of the plurality of subgraphs and subgraph position information;
acquiring seventh feature data of the context content text according to the fifth feature data and the sixth feature data;
and acquiring the second characteristic data according to the seventh characteristic data, the context text type information and the context text serial number information.
Optionally, the method further comprises constructing a text generation image model, the model comprising: a condition information encoding network and an image generating network;
the conditional information encoding network includes: a first feature data acquisition network and at least one second feature data acquisition network;
the first characteristic data acquisition network is used for acquiring the first characteristic data according to the target content text;
the second characteristic data acquisition network is used for carrying out multi-mode joint coding on the context content text and the context image to form second characteristic data;
the image generation network is used for generating the target image according to the first characteristic data and the at least one second characteristic data.
Optionally, the constructing of the text generation image model includes:
acquiring a text and a corresponding image which are irrelevant to the target object, and forming a first training sample set;
learning from the first training sample set to obtain a text generation image model;
acquiring a text and a corresponding image related to the target object to form a second training sample set;
and adjusting parameters of the text generated image model according to the second training sample set.
Optionally, the acquiring text and corresponding image related to the target object includes:
acquiring at least one character description information, at least one scene description information and/or at least one picture style description information;
generating, by the text generation image model learned from the first training sample set, at least one character image design picture according to the at least one character description information; generating at least one scene design picture according to the at least one scene description information; and/or generating at least one picture style design picture according to the at least one picture style description information;
taking the character description information and the character image design picture as a second training sample; taking the scene description information and the scene design picture as a second training sample; and/or taking the picture style description information and the picture style design picture as a second training sample.
Optionally, the method further comprises:
acquiring a new text and a corresponding image related to the target object to form a third training sample set;
and adjusting parameters of the text generation image model according to the third training sample set, wherein the adjusted model is used for generating matching images for the text to be processed of the target object.
Optionally, the acquiring of the newly added text and the corresponding image related to the target object includes:
acquiring at least one newly added character description information, at least one newly added scene description information and/or at least one newly added picture style description information;
generating, by the text generation image model learned from the second training sample set, at least one newly added character image design picture according to the at least one newly added character description information; generating at least one newly added scene design picture according to the at least one newly added scene description information; and/or generating at least one newly added picture style design picture according to the at least one newly added picture style description information;
taking the newly added character description information and the newly added character image design picture as a third training sample; taking the newly added scene description information and the newly added scene design picture as a third training sample; and/or taking the newly added picture style description information and the newly added picture style design picture as a third training sample.
The application also provides a text mapping device, comprising:
a text sequence obtaining unit for obtaining a content text sequence of the target object;
a first feature data acquisition unit, configured to acquire first feature data of a target content text in the text sequence;
a context data obtaining unit, configured to obtain at least one context text of the target content text and a context image corresponding to the context text;
a second feature data obtaining unit, configured to obtain second feature data corresponding to the context text according to the context text and the corresponding context image;
and the image generation unit is used for generating a target image corresponding to the target content text according to the first characteristic data and at least one second characteristic data corresponding to the at least one context content text.
The application also provides a text generation image model processing method, which comprises the following steps:
acquiring a text and a corresponding image irrelevant to a target object to form a first training sample set;
learning from the first training sample set to obtain a text generation image model, wherein the model comprises a condition information coding network and an image generation network;
acquiring a text and a corresponding image related to the target object to form a second training sample set;
and adjusting parameters of the model according to the second training sample set.
The application also provides a text generation image model processing device, which comprises:
the first training sample acquisition unit is used for acquiring texts and corresponding images irrelevant to the target object to form a first training sample set;
the first training unit is used for learning from the first training sample set to obtain a text generation image model, and the model comprises a condition information coding network and an image generation network;
the second training sample acquisition unit is used for acquiring texts and corresponding images related to the target object to form a second training sample set;
and the second training unit is used for adjusting parameters of the model according to the second training sample set.
The application also provides a story mapping method, which comprises the following steps:
receiving a content text sequence of a target story submitted by a client;
acquiring first characteristic data of the content text;
acquiring at least one context content text of the content text and a context image corresponding to the context content text;
acquiring second characteristic data corresponding to the context text according to the context text and the corresponding context image;
generating an image corresponding to the content text according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual content text;
and sending an image sequence corresponding to the content text sequence to the client.
Optionally, the method further comprises:
receiving text and corresponding images related to the target story submitted by the client;
and learning, according to the text and the corresponding image related to the target story, to obtain a text generation image model for generating the target image.
Optionally, the method further comprises:
receiving a new text and a corresponding image related to the target story submitted by the client;
and adjusting the text to generate an image model according to the newly added text and the corresponding image.
The application also provides a story mapping method, which comprises the following steps:
acquiring a content text sequence of a target story;
the content text sequence is sent to a server side, so that the server side obtains first characteristic data of the content text; acquiring at least one context content text of the content text and a context image corresponding to the context content text; acquiring second characteristic data corresponding to the context text according to the context text and the corresponding context image; generating an image corresponding to the content text according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual content text;
and displaying the image sequence which is returned by the server and corresponds to the content text sequence.
The application also provides a commodity live broadcast method, which comprises the following steps:
receiving a description content sequence of a target commodity submitted by a client;
acquiring first characteristic data of the descriptive content;
acquiring at least one context descriptive content of the descriptive content and a context image corresponding to the context descriptive content;
acquiring second characteristic data corresponding to the context description content according to the context description content and the corresponding context image;
generating an image corresponding to the description content according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual description content;
and publishing the image sequence corresponding to the description content sequence to a live broadcast platform.
The application also provides a commodity live broadcast method, which comprises the following steps:
acquiring a description content sequence of a target commodity;
the description content sequence is sent to a server side, so that the server side obtains first characteristic data of the description content; acquiring at least one context descriptive content of the descriptive content and a context image corresponding to the context descriptive content; acquiring second characteristic data corresponding to the context description content according to the context description content and the corresponding context image; generating an image corresponding to the description content according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual description content; and publishing the image sequence corresponding to the description content sequence to a live broadcast platform.
The application also provides a commodity release method, which comprises the following steps:
receiving a description content sequence of a target commodity submitted by a client;
acquiring first characteristic data of the descriptive content;
acquiring at least one context descriptive content of the descriptive content and a context image corresponding to the context descriptive content;
acquiring second characteristic data corresponding to the context description content according to the context description content and the corresponding context image;
generating an image corresponding to the description content according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual description content;
and publishing an image sequence corresponding to the description content sequence to a commodity detail page of the target commodity.
The application also provides a commodity release method, which comprises the following steps:
acquiring a description content sequence of a target commodity;
the description content sequence is sent to a server side, so that the server side obtains first characteristic data of the description content; acquiring at least one context descriptive content of the descriptive content and a context image corresponding to the context descriptive content; acquiring second characteristic data corresponding to the context description content according to the context description content and the corresponding context image; generating an image corresponding to the description content according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual description content; and publishing an image sequence corresponding to the description content sequence to a commodity detail page of the target commodity.
The present application also provides a computer-readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the various methods described above.
The present application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the various methods described above.
Compared with the prior art, the application has the following advantages:
according to the text mapping method provided by the embodiments of the application, for any content text in a content text sequence that constitutes a story, the target image corresponding to the target content text is generated based on the target content text, its context content text, and the context image generated from the context content text. Because the method accounts not only for the correlation between the target image and the target content text, but also for the correlation between the target image and the context content text, as well as the semantic consistency between the target image and the context image corresponding to the context content text, the content continuity and plot consistency of the story illustrations can be effectively improved.
Drawings
FIG. 1 is a flow diagram of an embodiment of the text mapping method provided herein;
FIG. 2 is a schematic diagram of an embodiment of the text mapping method provided herein;
FIG. 3 is a schematic diagram of an embodiment of the text mapping method provided herein;
FIG. 4 is a further schematic diagram of an embodiment of the text mapping method provided herein;
FIG. 5 is a flowchart of adding training data before starting the mapping in the embodiment of the text mapping method provided in the present application;
FIG. 6 is a schematic flow chart of adding training data during the mapping process in the embodiment of the text mapping method provided in the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, this application can be implemented in many other ways than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific embodiments disclosed below.
In the application, a text mapping method and device, a text generation image model processing method and device and an electronic device are provided. The various schemes are described in detail below in the examples.
First embodiment
Please refer to fig. 1, which is a flowchart of a text mapping method of the present application. In this embodiment, the method may include the steps of:
Step S101: and acquiring a content text sequence of the target object.
The target object includes a content text sequence consisting of a plurality of context-related content texts. For example, the target object is a children's story comprising a plurality of text segments, and each text segment is illustrated to obtain a comic-style version of the story.
As shown in fig. 2, the method provided in the embodiment of the present application may be applied to a scenario of illustrating a story. The scenario involves a first user client and a first service end. The first user client is used by a story viewer and may be a terminal device such as a personal computer, a smart phone, or a tablet computer. The first service end is deployed with a text mapping system, whose input data is the content text sequence of a story and whose output data is the illustration sequence of the story. In a specific implementation, the text mapping system may, through a story mapping module, use a text generation image model to obtain the illustration sequence of the story according to the content text sequence of the story.
Step S103: and acquiring first characteristic data of target content text in the text sequence.
The first feature data refers to text feature data of the target content text. The method provided by the embodiment of the application can execute the processing of step S103 on each text segment in the text sequence, and obtain the first feature data of each text segment.
Step S105: and acquiring at least one context content text of the target content text and a context image corresponding to the context content text.
For any target content text in the text sequence, one or more context content texts and the context images corresponding to those context content texts can be obtained. The context content text includes, but is not limited to, content text preceding the target content text, and may also include content text following it. The context content text may be not only content text adjacent to the target content text, but also content text separated from the target content text by other content texts.
The context image corresponding to the context text may be a corresponding image generated for the context text by using the method provided in the embodiment of the present application, or may be a corresponding image generated or specified by other means.
Step S107: and acquiring second characteristic data corresponding to the context text according to the context text and the corresponding context image.
The second feature data refers to image-text fusion feature data of a context content text and its corresponding image, for example, the content text at the moment immediately preceding the target content text together with its corresponding image. The image-text fusion feature data may be a simple superposition of image feature data and text feature data, or may be feature data obtained by jointly encoding the image and the text.
Step S109: and generating a target image corresponding to the target content text according to the first characteristic data and at least one second characteristic data corresponding to the at least one context content text.
In one example, the first feature data of the target content text in the text sequence is obtained through a condition information encoding network (also referred to as an autoregressive feature extraction network) of a text generation image model; the second feature data corresponding to each context content text is obtained according to the context content text and the corresponding context image; and the target image corresponding to the target content text is generated, through an image generation network of the text generation image model, according to the first feature data and the at least one second feature data corresponding to the at least one context content text.
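As an illustration of how these pieces could fit together, the following Python sketch (PyTorch-style) assembles the conditioning information and calls an image generation network for a single frame. All function names, signatures, tensor shapes, and hyperparameters are assumptions made for illustration only; tokenization and the internals of the encoders are omitted, and nothing here is prescribed by the patent.

```python
import torch

def generate_target_image(target_text, context_pairs, text_encoder,
                          context_encoder, image_generator):
    """Generate the image for one content text, conditioned on its context.

    target_text     : the n-th frame content text
    context_pairs   : list of (context_text, context_image) tuples, most recent first
    text_encoder    : network producing the first feature data from the target text
    context_encoder : network producing second feature data from a (text, image) pair
    image_generator : network (e.g. a diffusion model) that denoises an image
                      conditioned on the concatenated feature data
    """
    # First feature data: text-only features of the target content text.
    first_feat = text_encoder(target_text)                         # (1, L_t, d)

    # Second feature data: one image-text fusion feature per context frame.
    second_feats = [
        context_encoder(ctx_text, ctx_image, frame_offset=k + 1)   # (1, L_c, d)
        for k, (ctx_text, ctx_image) in enumerate(context_pairs)
    ]

    # Conditioning sequence = first feature data followed by all second feature data.
    cond = torch.cat([first_feat, *second_feats], dim=1)

    # The image generation network turns a noise image into the target image
    # under this conditioning (see the diffusion sketch further below).
    noise = torch.randn(1, 3, 256, 256)
    return image_generator(noise, cond)
```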
In a specific implementation, the condition information encoding network may include a first feature acquiring network and at least one second feature acquiring network, where the first feature acquiring network is configured to acquire first feature data of a target content text in the text sequence, and the second feature acquiring network is configured to acquire second feature data corresponding to a context content text according to the context content text and a corresponding context image at a certain moment.
The plurality of second feature data acquisition networks respectively correspond to context content texts at different moments. As shown in fig. 2, the image corresponding to the target content text may be generated according to the target content text, the m context content texts preceding it, and the images respectively corresponding to those context content texts. In this case, the autoregressive feature extraction network may include m second feature data acquisition networks. Regarding the target content text as the n-th frame text, the 1st second feature data acquisition network corresponds to the (n-1)-th frame text, the 2nd corresponds to the (n-2)-th frame text, and so on, with the m-th corresponding to the (n-m)-th frame text. In the process of generating the story illustrations, if the target content text is the 2nd frame text, the 1st second feature data acquisition network corresponds to the 1st frame text and the image generation network outputs the 2nd frame image; if the target content text is the 3rd frame text, the 1st second feature data acquisition network may correspond to the 2nd frame text, the 2nd may correspond to the 1st frame text, and the image generation network outputs the 3rd frame image.
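Building on the previous sketch, the autoregressive story-illustration loop with a sliding window of m preceding frames could look roughly as follows. The helper generate_target_image is the assumed function from the sketch above; m and all other names remain illustrative.

```python
def illustrate_story(content_texts, m, text_encoder, context_encoder, image_generator):
    """Generate one image per content text, frame by frame.

    For the n-th frame, up to m preceding (text, image) pairs are used as context,
    so earlier generated images feed back into later frames (autoregressive).
    """
    images = []
    for n, target_text in enumerate(content_texts):
        # Sliding window: frames n-1, n-2, ..., n-m (only those that exist).
        window = list(range(n - 1, max(n - m, 0) - 1, -1))
        context_pairs = [(content_texts[j], images[j]) for j in window]

        image = generate_target_image(
            target_text, context_pairs, text_encoder, context_encoder, image_generator
        )
        images.append(image)
    return images
```

Limiting the window to the m most recent frames keeps the conditioning sequence bounded, while earlier frames still influence later ones indirectly through the chain of already generated images.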
In one example, the first feature data acquisition network may acquire the first feature data of the target content text in the following manner: 1) Performing word vector and word position embedding processing on the target content text to form third characteristic data of the target content text; 2) Extracting fourth feature data of the target content text from the third feature data; 3) And acquiring the first characteristic data according to the fourth characteristic data and the target text type information.
A piece of text may include a plurality of words, and the word position refers to the sequential position of a word in the text, such as the 1st word, the 2nd word, and so on. By carrying out word position embedding processing on the target content text, input order information is injected into the model, which can effectively improve feature extraction accuracy.
The feature type corresponding to the first feature data acquisition network is the target text type, which indicates that the feature data output by this network is the first feature data of the target content text (the description of the picture to be generated). Correspondingly, the feature type corresponding to the second feature data acquisition network is the context text type, which indicates that the feature data output by that network is the fusion feature data of a context content text and its corresponding image, namely the second feature data. By adopting this processing mode, the image generation network can determine the type of each piece of feature data, thereby improving accuracy.
As shown in fig. 3, in a specific implementation, the first feature data acquisition network may perform word vector embedding processing on the target content text through a text embedding layer (Text Embedding), and extract the fourth feature data of the target content text from the third feature data through a text modeling layer (Text Model). Fig. 3 also shows the embedded data for the feature type and the word positions; feature type "0" represents the target text type, and feature type "1" represents the context text type.
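A minimal PyTorch sketch of such a first feature data acquisition network is given below, assuming a transformer-based text model; the vocabulary size, embedding dimension, layer count, and class name are illustrative assumptions rather than the patent's actual configuration.

```python
import torch
import torch.nn as nn

class FirstFeatureNetwork(nn.Module):
    """Encodes the target content text into the first feature data (sketch)."""

    def __init__(self, vocab_size=30000, d_model=512, max_len=128, n_layers=4):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)    # word vector embedding
        self.pos_emb = nn.Embedding(max_len, d_model)         # word position embedding
        self.type_emb = nn.Embedding(2, d_model)               # 0 = target text, 1 = context text
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.text_model = nn.TransformerEncoder(enc_layer, n_layers)  # "Text Model" layer

    def forward(self, token_ids):                               # token_ids: (B, L)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        third_feat = self.word_emb(token_ids) + self.pos_emb(positions)   # third feature data
        fourth_feat = self.text_model(third_feat)                          # fourth feature data
        type_ids = torch.zeros_like(token_ids)                             # target text type = 0
        first_feat = fourth_feat + self.type_emb(type_ids)                 # first feature data
        return first_feat
```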
In one example, the second feature data acquisition network may acquire the second feature data in the following manner: carrying out multi-mode joint coding on the context content text and the corresponding context image to form second feature data of image-text fusion. By jointly encoding the context text and the image, the accuracy of the second feature data can be effectively improved.
In specific implementation, the second feature data acquiring network may acquire the second feature data in the following manner: 1) Performing word vector and word position embedding processing on the context text to form fifth characteristic data of the context text; 2) Dividing the context image into a plurality of sub-images; 3) Acquiring sixth characteristic data of the context image according to the characteristic data of the plurality of subgraphs and subgraph position information; 4) Acquiring seventh feature data of the context content text according to the fifth feature data and the sixth feature data; 5) And acquiring the second characteristic data according to the seventh characteristic data, the context text type information and the context text serial number information. The feature type of the second feature data is a context text type.
The word vector and word position embedding process for the context text is similar to the word vector and word position embedding process for the target text, and will not be repeated here.
In a specific implementation, the corresponding context image can be divided into a number of sub-images matching the number of words in the context content text; the feature data of each sub-image is extracted and the sub-image position information is embedded, so that the context content text and the corresponding image can be better jointly encoded, which effectively improves the accuracy of the second feature data.
In particular, corresponding time information (context text sequence number information) may be embedded in each of the plurality of second feature data; for example, the sequence number "n-1" is embedded in the second feature data corresponding to the context content text of the (n-1)-th frame, and the sequence number "n-m" is embedded in the second feature data corresponding to the context content text of the (n-m)-th frame. By adopting this processing mode, the image generation network can determine the time information of the context content text corresponding to each piece of second feature data and can weight the plural pieces of second feature data differently according to their time information, thereby improving the content consistency between the target image and the context images.
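The following PyTorch sketch illustrates a second feature data acquisition network along the lines of steps 1) to 5) above: the context content text is embedded into the fifth feature data, the context image is split into sub-images whose features plus position embeddings form the sixth feature data, both are jointly encoded into the seventh feature data, and type and frame sequence-number embeddings are added to obtain the second feature data. All sizes, the patch-based image splitting, and the class name are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SecondFeatureNetwork(nn.Module):
    """Jointly encodes a context content text and its context image (sketch)."""

    def __init__(self, vocab_size=30000, d_model=512, max_len=128,
                 patch=32, img_size=256, max_frames=64, n_layers=4):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.word_emb = nn.Embedding(vocab_size, d_model)
        self.word_pos_emb = nn.Embedding(max_len, d_model)
        self.patch_proj = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)  # split into sub-images
        self.patch_pos_emb = nn.Embedding(n_patches, d_model)                     # sub-image position info
        self.type_emb = nn.Embedding(2, d_model)             # 1 = context text type
        self.frame_emb = nn.Embedding(max_frames, d_model)   # context frame sequence number
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.joint_encoder = nn.TransformerEncoder(enc_layer, n_layers)

    def forward(self, token_ids, image, frame_offset):
        # Fifth feature data: word vector + word position embedding of the context text.
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        fifth = self.word_emb(token_ids) + self.word_pos_emb(positions)           # (B, L, d)

        # Sixth feature data: sub-image features + sub-image position embeddings.
        patches = self.patch_proj(image).flatten(2).transpose(1, 2)               # (B, P, d)
        patch_pos = torch.arange(patches.size(1), device=image.device)
        sixth = patches + self.patch_pos_emb(patch_pos)

        # Seventh feature data: joint (multi-modal) encoding of text and image tokens.
        seventh = self.joint_encoder(torch.cat([fifth, sixth], dim=1))

        # Second feature data: add context-text type and frame sequence-number embeddings.
        type_vec = self.type_emb(torch.ones(1, dtype=torch.long, device=image.device))
        frame_vec = self.frame_emb(torch.tensor([frame_offset], device=image.device))
        return seventh + type_vec + frame_vec
```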
In one example, step S109 may be implemented as follows: acquiring a noise image; and removing noise from the noise image according to the first feature data and the at least one second feature data through a diffusion model, and taking the denoised image as the target image. In this implementation, the image generation network adopts a diffusion model; since diffusion models are characterized by high image generation quality, the image quality of the story illustrations can be effectively improved.
In a specific implementation, step S109 may employ a diffusion model or another relatively mature image generation network. In this embodiment, the method may further include the step of extracting a first feature map of the noise image; accordingly, step S109 may be implemented as follows: removing noise from the first feature map according to the first feature data and the at least one second feature data through a diffusion model, and taking the denoised result as a second feature map; and up-sampling the second feature map, and taking the up-sampled image as the target image. The first feature map is smaller in size than the noise image, and using the diffusion model to process this smaller feature map can effectively improve the efficiency of image generation.
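To make the denoising step concrete, the sketch below shows a generic DDPM-style sampling loop over a small feature map, conditioned on the concatenated first and second feature data, followed by a simple up-sampling step. The noise schedule, the denoiser interface, and the use of bilinear interpolation in place of a learned decoder are assumptions; the patent does not specify these details.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_target_image(denoiser, cond, latent_shape=(1, 4, 32, 32),
                        steps=50, out_size=256):
    """DDPM-style ancestral sampling on a small feature map, then up-sampling.

    denoiser(x_t, t, cond) is assumed to predict the noise added at step t,
    conditioned on the first + second feature data `cond`.
    """
    # Linear beta schedule (assumed); alpha_bar is the cumulative product.
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(latent_shape)                       # noisy first feature map
    for t in reversed(range(steps)):
        eps = denoiser(x, torch.tensor([t]), cond)      # predicted noise
        coef = (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
        else:
            x = mean                                    # denoised second feature map

    # Up-sample the denoised feature map to the target image resolution
    # (in practice this would be a learned decoder / super-resolution module).
    return F.interpolate(x, size=(out_size, out_size), mode="bilinear",
                         align_corners=False)
```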
The text mapping process has now been described; the processing of the text generation image model used in the text mapping process is described below.
In one example, the method may further comprise the steps of: building a text-generated image model, the model comprising: a condition information encoding network and an image generating network; the conditional information encoding network includes: a first feature data acquisition network and at least one second feature data acquisition network; the first characteristic data acquisition network is used for acquiring the first characteristic data according to the target content text; the second characteristic data acquisition network is used for carrying out multi-mode joint coding on the context content text and the context image to form second characteristic data; the image generation network is used for generating the target image according to the first characteristic data and the at least one second characteristic data.
In a specific implementation, constructing the text generation image model may include the following steps: 1) acquiring texts and corresponding images which are irrelevant to the target object, to form a first training sample set; 2) learning from the first training sample set to obtain a text generation image model; 3) acquiring texts and corresponding images related to the target object, to form a second training sample set; 4) adjusting parameters of the text generation image model according to the second training sample set.
The text generation image model learned from the first training sample set, which is irrelevant to the target object, is a general-purpose model applicable to various objects, whereas the model obtained by adjusting it according to the second training sample set related to the target object is a dedicated model adapted to the target object. By adopting this processing mode, custom characters, custom scenes, and custom styles for the target object can be supported, so the accuracy of text mapping can be effectively improved and the usage experience and flexibility can be improved. The following table shows the data of the first training sample set and the second training sample set.
In this table, the training samples related to story A are the second training samples, and the other training samples are the first training samples.
As shown in fig. 4 and fig. 5, in a specific implementation, acquiring the text and the corresponding image related to the target object may include the following sub-steps: 1) acquiring at least one character description information, at least one scene description information and/or at least one picture style description information; 2) generating, by the text generation image model learned from the first training sample set, at least one character image design picture according to the at least one character description information; generating at least one scene design picture according to the at least one scene description information; and/or generating at least one picture style design picture according to the at least one picture style description information; 3) taking the character description information and the character image design picture as a second training sample; taking the scene description information and the scene design picture as a second training sample; and/or taking the picture style description information and the picture style design picture as a second training sample. By adopting this processing mode, the manager of the story illustration task can input one or more of character description information, scene description information, and picture style description information in an orderly manner, and the system automatically generates images corresponding to the description information, so that texts and corresponding images related to the target object are acquired automatically. This can effectively improve the efficiency of generating story materials and thus the efficiency of story illustration.
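A rough sketch of this two-stage procedure is given below: the model is first pre-trained on the generic first training sample set, the pre-trained model is then used to bootstrap design pictures from character, scene, and style descriptions to form the second training sample set, and the model is fine-tuned on those samples. The training step uses a generic diffusion noise-prediction loss, and the optimizers, learning rates, and helper names (including sample_target_image from the earlier sketch) are assumptions, not details given by the patent.

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(model, images, cond, steps=1000):
    """One noise-prediction training step (generic diffusion objective, assumed)."""
    t = torch.randint(0, steps, (images.size(0),))
    betas = torch.linspace(1e-4, 0.02, steps)
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(images)
    noisy = torch.sqrt(alpha_bars) * images + torch.sqrt(1 - alpha_bars) * noise
    return F.mse_loss(model(noisy, t, cond), noise)

def train_two_stage(model, encode, generic_set, description_texts, epochs=(10, 3)):
    """Stage 1: generic pre-training; Stage 2: fine-tune on bootstrapped target samples."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(epochs[0]):
        for text, image in generic_set:                  # first training sample set
            loss = diffusion_training_step(model, image, encode(text))
            opt.zero_grad(); loss.backward(); opt.step()

    # Bootstrap the second training sample set: generate character / scene / style
    # design pictures from the description texts with the pre-trained model.
    target_set = [(desc, sample_target_image(model, encode(desc)))
                  for desc in description_texts]

    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)  # smaller LR for fine-tuning
    for _ in range(epochs[1]):
        for text, image in target_set:                   # second training sample set
            loss = diffusion_training_step(model, image, encode(text))
            opt.zero_grad(); loss.backward(); opt.step()
    return model
```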
As shown in fig. 6, in a specific implementation, the method may further include the following steps: acquiring newly added texts and corresponding images related to the target object, to form a third training sample set; and adjusting parameters of the text generation image model according to the third training sample set, the adjusted model being used for generating matching images for the text to be processed of the target object. By adopting this processing mode, story materials, such as character design pictures for newly added characters, can be added during the story mapping process, which further improves flexibility and convenience and better matches users' actual usage habits.
In a specific implementation, acquiring the newly added text and the corresponding image related to the target object may include the following sub-steps: 1) acquiring at least one newly added character description information, at least one newly added scene description information and/or at least one newly added picture style description information; 2) generating, by the text generation image model learned from the second training sample set, at least one newly added character image design picture according to the at least one newly added character description information; generating at least one newly added scene design picture according to the at least one newly added scene description information; and/or generating at least one newly added picture style design picture according to the at least one newly added picture style description information; 3) taking the newly added character description information and the newly added character image design picture as a third training sample; taking the newly added scene description information and the newly added scene design picture as a third training sample; and/or taking the newly added picture style description information and the newly added picture style design picture as a third training sample.
As can be seen from the foregoing embodiment, in the text mapping method provided by the embodiments of the present application, for any content text in the content text sequence that constitutes a story, the target image corresponding to the target content text is generated based on the target content text, its context content text, and the context image generated from the context content text. Because the method accounts not only for the correlation between the target image and the target content text, but also for the correlation between the target image and the context content text, as well as the semantic consistency between the target image and the context image corresponding to the context content text, the content continuity and plot consistency of the story illustrations can be effectively improved.
Second embodiment
In the above embodiment, a text mapping method is provided, and correspondingly, the application also provides a text mapping device. The device corresponds to the embodiment of the method described above. Since the apparatus embodiments are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
The application additionally provides a text mapping device. The device comprises: a text sequence acquisition unit, a first feature data acquisition unit, a context data acquisition unit, a second feature data acquisition unit, and an image generation unit.
A text sequence obtaining unit for obtaining a content text sequence of the target object; a first feature data acquisition unit, configured to acquire first feature data of a target content text in the text sequence; a context data obtaining unit, configured to obtain at least one context text of the target content text and a context image corresponding to the context text; a second feature data obtaining unit, configured to obtain second feature data corresponding to the context text according to the context text and the corresponding context image; and the image generation unit is used for generating a target image corresponding to the target content text according to the first characteristic data and at least one second characteristic data corresponding to the at least one context content text.
In one example, the image generation unit is specifically configured to acquire a noise image; and removing noise from the noise image according to the first characteristic data and the at least one second characteristic data through a diffusion model, and taking the image after noise removal as the target image.
In one example, the apparatus further comprises: a feature map extracting unit configured to extract a first feature map of the noise image; the image generating unit is specifically configured to remove noise from the first feature map according to the first feature data and the at least one second feature data through a diffusion model, and use the image after noise removal as a second feature map; and up-sampling the second characteristic diagram, and taking the up-sampled image as the target image.
In one example, the first feature data obtaining unit is specifically configured to perform word vector and word position embedding processing on the target content text to form third feature data of the target content text; extracting fourth feature data of the target content text from the third feature data; and acquiring the first characteristic data according to the fourth characteristic data and the target text type information.
In one example, the second feature data obtaining unit is specifically configured to perform multi-mode joint encoding on the context text and the corresponding context image to form second feature data of image-text fusion.
In one example, the second feature data obtaining unit is specifically configured to perform word vector and word position embedding processing on the context content text to form fifth feature data of the context content text; dividing the context image into a plurality of sub-images; acquiring sixth characteristic data of the context image according to the characteristic data of the plurality of subgraphs and subgraph position information; acquiring seventh feature data of the context content text according to the fifth feature data and the sixth feature data; and acquiring the second characteristic data according to the seventh characteristic data, the context text type information and the context text serial number information.
In one example, the apparatus further comprises: a model construction unit for constructing a text-generated image model, the model comprising: a condition information encoding network and an image generating network; the conditional information encoding network includes: a first feature data acquisition network and at least one second feature data acquisition network; the first characteristic data acquisition network is used for acquiring the first characteristic data according to the target content text; the second characteristic data acquisition network is used for carrying out multi-mode joint coding on the context content text and the context image to form second characteristic data; the image generation network is used for generating the target image according to the first characteristic data and the at least one second characteristic data.
In one example, the model building unit is specifically configured to obtain a text and a corresponding image that are unrelated to the target object, and form a first training sample set; learning from the first training sample set to obtain a text to generate an image model; acquiring a text and a corresponding image related to the target object to form a second training sample set; and adjusting parameters of the text generated image model according to the second training sample set.
In one example, the model building unit is specifically configured to obtain at least one character description information, at least one scene description information, and/or at least one picture style description information; generate, by the text generation image model learned from the first training sample set, at least one character image design picture according to the at least one character description information; generate at least one scene design picture according to the at least one scene description information; and/or generate at least one picture style design picture according to the at least one picture style description information; take the character description information and the character image design picture as a second training sample; take the scene description information and the scene design picture as a second training sample; and/or take the picture style description information and the picture style design picture as a second training sample.
In one example, the apparatus further comprises a model adjusting unit, configured to acquire newly added texts and corresponding images related to the target object to form a third training sample set, and to adjust parameters of the text generation image model according to the third training sample set, the adjusted model being used for generating matching images for the text to be processed of the target object.
In one example, the model adjusting unit is specifically configured to acquire at least one newly added character description information, at least one newly added scene description information, and/or at least one newly added picture style description information; generate, by the text generation image model learned from the second training sample set, at least one newly added character image design picture according to the at least one newly added character description information; generate at least one newly added scene design picture according to the at least one newly added scene description information; and/or generate at least one newly added picture style design picture according to the at least one newly added picture style description information; take the newly added character description information and the newly added character image design picture as a third training sample; take the newly added scene description information and the newly added scene design picture as a third training sample; and/or take the newly added picture style description information and the newly added picture style design picture as a third training sample.
Third embodiment
In the above embodiment, a text mapping method is provided, and correspondingly, the application also provides an electronic device. The device corresponds to the embodiment of the method described above. Since the device embodiments are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
The electronic device of the present embodiment includes a processor and a memory. The memory is used for storing a program for implementing any one of the text mapping methods described above; after the device is powered on, the program of the method is run by the processor.
The memory may be implemented by any type of volatile or nonvolatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
In particular implementations, the electronic device may further include one or more of the following: a power component, an input/output (I/O) interface, and a communication component. The power component provides power to the various components of the electronic device and may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device. The I/O interface provides an interface between the processor and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The communication component is configured to facilitate wired or wireless communication between the electronic device and a user device (e.g., a smart phone, a tablet, etc.).
Fourth embodiment
In the above embodiment, a text mapping method is provided, and correspondingly, the application also provides a text generated image model processing method. The method corresponds to the embodiment of the method described above. Since this method embodiment is substantially similar to method embodiment one, the description is relatively simple, and reference is made to the description of method embodiments in part. The method embodiments described below are merely illustrative.
The application additionally provides a text generation image model processing method, which comprises the following steps:
step 1: and acquiring a text and a corresponding image which are irrelevant to the target object, and forming a first training sample set.
Step 2: and learning from the first training sample set to obtain a text generated image model, wherein the model comprises a condition information coding network and an image generating network.
Step 3: and acquiring a text and a corresponding image related to the target object to form a second training sample set.
Step 4: and adjusting parameters of the model according to the second training sample set.
The condition information encoding network includes a first feature data acquisition network and at least one second feature data acquisition network. The model is used for: acquiring, through the first feature data acquisition network, first characteristic data of a target content text in a content text sequence of the target object; acquiring at least one context content text of the target content text and a context image corresponding to the context content text; carrying out, through the second feature data acquisition network, multi-modal joint coding on the context content text of the target content text and the context image corresponding to the context content text to form second characteristic data; and generating, through the image generation network, a target image corresponding to the target content text according to the first characteristic data and at least one second characteristic data corresponding to the at least one context content text of the target content text.
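A minimal sketch of the condition information coding network described above follows, assuming transformer-style encoders; the vocabulary size, feature dimension, patch size, and module names are illustrative assumptions. The image generation network, which would condition on these features (for example through cross-attention in a diffusion model), is only indicated in a comment.

```python
import torch
import torch.nn as nn

class FirstFeatureEncoder(nn.Module):
    """Encodes the target content text into first characteristic data."""

    def __init__(self, vocab_size: int = 30000, dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, text_tokens: torch.Tensor) -> torch.Tensor:  # (batch, seq_len)
        return self.encoder(self.embed(text_tokens))

class SecondFeatureEncoder(nn.Module):
    """Jointly encodes a context content text and its context image (multi-modal joint coding)."""

    def __init__(self, vocab_size: int = 30000, dim: int = 512, patch_dim: int = 3 * 16 * 16):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, dim)
        self.patch_proj = nn.Linear(patch_dim, dim)  # sub-images of the context image become tokens
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, text_tokens: torch.Tensor, image_patches: torch.Tensor) -> torch.Tensor:
        # text_tokens: (batch, seq_len); image_patches: (batch, n_patches, patch_dim)
        tokens = torch.cat([self.text_embed(text_tokens), self.patch_proj(image_patches)], dim=1)
        return self.encoder(tokens)

# The image generation network would then generate the target image conditioned on
# the first characteristic data and the second characteristic data of each context
# content text, e.g. via cross-attention inside a diffusion denoising network.
```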
The method provided in this embodiment corresponds to the text-generated image model part of the first embodiment and will not be described again here. The method provided by the embodiment of the application can support custom characters, custom scenes and custom styles for the target object, so that the accuracy of text mapping can be effectively improved, and the use experience and flexibility can be improved.
Fifth embodiment
In the above embodiment, a text-generated image model processing method is provided, and correspondingly, the application also provides a text-generated image model processing device. The device corresponds to the embodiment of the method described above. Since the apparatus embodiments are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
The present application additionally provides a text-generated image model processing apparatus including: the first training sample acquisition unit is used for acquiring texts and corresponding images irrelevant to the target object to form a first training sample set; the first training unit is used for learning from the first training sample set to obtain a text generation image model, and the model comprises a condition information coding network and an image generation network; the second training sample acquisition unit is used for acquiring texts and corresponding images related to the target object to form a second training sample set; and the second training unit is used for adjusting parameters of the model according to the second training sample set.
Sixth embodiment
In the above embodiment, a text-generated image model processing method is provided, and corresponding to the text-generated image model processing method, the application also provides an electronic device. The device corresponds to an embodiment of the method described above. Since the apparatus embodiments are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
The electronic device of the present embodiment includes: a processor and a memory. The memory is used for storing a program implementing any one of the text-generated image model processing methods described above; after the device is powered on, the program of the method is run by the processor.
The memory may be implemented by any type of volatile or nonvolatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
In particular implementations, the electronic device may further include one or more of the following: a power component, an input/output (I/O) interface, and a communication component. The power component provides power to the various components of the electronic device and may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device. The I/O interface provides an interface between the processor and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The communication component is configured to facilitate wired or wireless communication between the electronic device and a user device (e.g., a smart phone, a tablet, etc.).
Seventh embodiment
In the above embodiment, a text mapping method is provided; correspondingly, the present application further provides a story mapping method for a server. The method corresponds to the embodiment of the method described above. Since this method embodiment is substantially similar to the first method embodiment, the description is relatively simple; for relevant points, reference is made to the description of the first method embodiment. The method embodiment described below is merely illustrative.
The application additionally provides a story mapping method comprising:
step 1: a sequence of content text of a target story submitted by a client is received.
Clients include, but are not limited to, terminal devices such as personal computers, smart phones, and tablet computers. The client acquires a content text sequence of the target story and sends the content text sequence to the server. The server performs the following steps 2 to 5 for each content text in the content text sequence.
Step 2: and acquiring first characteristic data of the content text.
Step 3: and acquiring at least one context content text of the content text and a context image corresponding to the context content text.
Step 4: and acquiring second characteristic data corresponding to the context text according to the context text and the corresponding context image.
Step 5: and generating an image corresponding to the content text according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual content text.
Step 6: and sending an image sequence corresponding to the content text sequence to the client.
The server sends the image sequence corresponding to the content text sequence to the client, and the client displays the image sequence.
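The server-side flow of steps 1 to 6 can be summarised with the following minimal Python sketch. The helper functions are placeholders standing in for the text-generated image model, and using the two preceding content texts (together with the images already generated for them) as context is an illustrative assumption.

```python
from typing import Any, List

def encode_content_text(text: str) -> Any:
    """Step 2 placeholder: first characteristic data of the content text."""
    return {"text": text}

def encode_context(context_text: str, context_image: Any) -> Any:
    """Step 4 placeholder: multi-modal joint coding of a context content text and its image."""
    return {"text": context_text, "image": context_image}

def generate_image(first_features: Any, context_features: List[Any]) -> Any:
    """Step 5 placeholder: image generation conditioned on first and second characteristic data."""
    return {"conditioned_on": (first_features, context_features)}

def story_mapping(content_texts: List[str], window: int = 2) -> List[Any]:
    """Runs steps 2-5 for each content text; earlier generated images serve as context images."""
    images: List[Any] = []
    for i, text in enumerate(content_texts):
        first_features = encode_content_text(text)
        context_features = [
            encode_context(content_texts[j], images[j])
            for j in range(max(0, i - window), i)
        ]
        images.append(generate_image(first_features, context_features))
    return images  # step 6: the image sequence sent back to the client
```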
In one example, the method may further comprise the steps of: receiving a text and a corresponding image related to the target story submitted by the client; and learning, according to the text and the corresponding image related to the target story, a text-generated image model for generating the target image. With this processing mode, custom characters, custom scenes and custom styles can be supported for the target story, so that the accuracy of text mapping can be effectively improved, and the use experience and flexibility can be improved.
In one example, the method may further comprise the steps of: receiving a newly added text and a corresponding image related to the target story submitted by the client; and adjusting the text-generated image model according to the newly added text and the corresponding image. With this processing mode, story materials, such as character design pictures for newly added characters, can be added during the story mapping process, which further improves flexibility and convenience and better matches users' actual usage habits.
As can be seen from the above embodiments, in the story mapping method provided in the embodiments of the present application, for any content text in the sequence of content texts that constitutes a story, the corresponding image is generated based on the content text itself, its context content text, and the context images generated from the context content text. Because the degree of correlation between the generated image and the content text, the degree of correlation between the generated image and the context content text, and the semantic consistency between the generated image and the context images corresponding to the context content text are all taken into account, the content continuity and plot consistency of the story mapping can be effectively improved.
Eighth embodiment
In the above embodiment, a story mapping method is provided; correspondingly, the present application also provides a story mapping method for a client. The method corresponds to the embodiment of the method described above. Since this method embodiment is substantially similar to the first method embodiment, the description is relatively simple; for relevant points, reference is made to the description of the first method embodiment. The method embodiment described below is merely illustrative.
The application additionally provides a story mapping method comprising:
step 1: a sequence of content text for the target story is obtained.
Step 2: and sending the content text sequence to a server.
The server acquires first characteristic data of the content text; acquiring at least one context content text of the content text and a context image corresponding to the context content text; acquiring second characteristic data corresponding to the context text according to the context text and the corresponding context image; generating an image corresponding to the content text according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual content text;
step 3: and displaying the image sequence which is returned by the server and corresponds to the content text sequence.
Ninth embodiment
In the above embodiment, a text mapping method is provided; correspondingly, the present application also provides a commodity live broadcast method for a server. The method corresponds to the embodiment of the method described above. Since this method embodiment is substantially similar to the first method embodiment, the description is relatively simple; for relevant points, reference is made to the description of the first method embodiment. The method embodiment described below is merely illustrative.
The application additionally provides a commodity live broadcast method, which comprises the following steps:
step 1: and receiving the description content sequence of the target commodity submitted by the client.
Clients include, but are not limited to: and terminal equipment such as personal computers, smart phones, tablet computers and the like. The method comprises the steps that a client obtains a description content sequence of a target commodity; and sending the description content sequence to a server. The server performs the following steps 2 to 5 for each description in the description sequence.
Step 2: first characteristic data of the descriptive content is acquired.
Step 3: and acquiring at least one context descriptive content of the descriptive content and a context image corresponding to the context descriptive content.
Step 4: and acquiring second characteristic data corresponding to the context description content according to the context description content and the corresponding context image.
Step 5: and generating an image corresponding to the descriptive content according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual descriptive content.
Step 6: and publishing the image sequence corresponding to the description content sequence to a live broadcast platform.
The server publishes the image sequence corresponding to the description content sequence to the live broadcast platform, and the user side displays the image sequence through the live broadcast platform, so that users can learn about the commodity more conveniently and commodity transactions are promoted.
In one example, the method may further comprise the steps of: receiving a text and a corresponding image related to the target commodity submitted by the client; and learning, according to the text and the corresponding image related to the target commodity, a text-generated image model for generating the image. With this processing mode, custom characters, custom scenes and custom styles can be supported for the target commodity, so that the accuracy of the commodity mapping can be effectively improved, and the use experience and flexibility can be improved.
In one example, the method may further comprise the steps of: receiving a newly added text and a corresponding image related to the target commodity submitted by the client; and adjusting the text-generated image model according to the newly added text and the corresponding image. With this processing mode, commodity materials can be added during the commodity mapping process, which further improves flexibility and convenience and better matches users' actual usage habits.
As can be seen from the above embodiments, in the commodity live broadcast method provided in the embodiments of the present application, for any description content in the description content sequence that describes a commodity, the corresponding image is generated based on the description content itself, its context description content, and the context images generated from the context description content. Because the degree of correlation between the image and the corresponding description content, the degree of correlation between the image and the context description content, and the semantic consistency between the image and the context images corresponding to the context description content are all taken into account, the content continuity and plot consistency of the commodity mapping can be effectively improved, thereby improving the commodity live broadcast effect.
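The live-broadcast variant differs from the story mapping flow mainly in its final step, so the sketch below shows only that step: pushing the generated image sequence to a live broadcast platform. The publish helper and the live room identifier are hypothetical placeholders.

```python
from typing import Any, List

def publish_to_live_platform(images: List[Any], live_room_id: str) -> None:
    """Step 6 placeholder: publish the image sequence corresponding to the
    description content sequence to the live broadcast platform."""
    for index, image in enumerate(images):
        # A real integration would call the live platform's publishing API here.
        print(f"live room {live_room_id}: publishing image {index} -> {image!r}")
```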
Tenth embodiment
In the foregoing embodiment, a commodity live broadcast method is provided; correspondingly, the present application further provides a commodity live broadcast method for a client. The method corresponds to the embodiment of the method described above. Since this method embodiment is substantially similar to the first method embodiment, the description is relatively simple; for relevant points, reference is made to the description of the first method embodiment. The method embodiment described below is merely illustrative.
The application additionally provides a commodity live broadcast method, which comprises the following steps:
step 1: a descriptive content sequence of the target commodity is obtained.
Step 2: the description content sequence is sent to a server side, so that the server side obtains first characteristic data of the description content; acquiring at least one context descriptive content of the descriptive content and a context image corresponding to the context descriptive content; acquiring second characteristic data corresponding to the context description content according to the context description content and the corresponding context image; generating an image corresponding to the description content according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual description content; and publishing the image sequence corresponding to the description content sequence to a live broadcast platform.
Eleventh embodiment
In the above embodiment, a text mapping method is provided; correspondingly, the present application also provides a commodity publishing method for a server. The method corresponds to the embodiment of the method described above. Since this method embodiment is substantially similar to the first method embodiment, the description is relatively simple; for relevant points, reference is made to the description of the first method embodiment. The method embodiment described below is merely illustrative.
The application additionally provides a commodity release method, which comprises the following steps:
step 1: and receiving the description content sequence of the target commodity submitted by the client.
Clients include, but are not limited to: and terminal equipment such as personal computers, smart phones, tablet computers and the like. The method comprises the steps that a client obtains a description content sequence of a target commodity; and sending the description content sequence to a server. The server performs the following steps 2 to 5 for each description in the description sequence.
Step 2: first characteristic data of the descriptive content is acquired.
Step 3: and acquiring at least one context descriptive content of the descriptive content and a context image corresponding to the context descriptive content.
Step 4: and acquiring second characteristic data corresponding to the context description content according to the context description content and the corresponding context image.
Step 5: and generating an image corresponding to the descriptive content according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual descriptive content.
Step 6: and publishing an image sequence corresponding to the descriptive content sequence to an item detail page of the target item.
The server publishes the image sequence corresponding to the description content sequence to the commodity detail page of the target commodity, so that users can learn more about the commodity and commodity transactions are promoted.
In one example, the method may further comprise the steps of: receiving a text and a corresponding image related to the target commodity submitted by the client; and learning, according to the text and the corresponding image related to the target commodity, a text-generated image model for generating the image. With this processing mode, custom characters, custom scenes and custom styles can be supported for the target commodity, so that the accuracy of the commodity mapping can be effectively improved, and the use experience and flexibility can be improved.
In one example, the method may further comprise the steps of: receiving a newly added text and a corresponding image related to the target commodity submitted by the client; and adjusting the text-generated image model according to the newly added text and the corresponding image. With this processing mode, commodity materials can be added during the commodity mapping process, which further improves flexibility and convenience and better matches users' actual usage habits.
As can be seen from the above embodiments, in the commodity publishing method provided in the embodiments of the present application, for any description content in the description content sequence that describes a commodity, the corresponding image is generated based on the description content itself, its context description content, and the context images generated from the context description content. Because the degree of correlation between the image and the corresponding description content, the degree of correlation between the image and the context description content, and the semantic consistency between the image and the context images corresponding to the context description content are all taken into account, the content continuity and plot consistency of the commodity mapping can be effectively improved, and the richness of the published commodity content is improved.
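For the publishing variant, the distinguishing step is attaching the generated image sequence to the commodity detail page. The sketch below simply pairs each description content with its generated image for rendering on that page; the data structure is an assumption for illustration only.

```python
from typing import Any, Dict, List

def build_detail_page_section(descriptions: List[str], images: List[Any]) -> List[Dict[str, Any]]:
    """Pair each description content with its generated image so the detail page can
    render the text and its accompanying picture together."""
    return [{"text": text, "image": image} for text, image in zip(descriptions, images)]
```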
Twelfth embodiment
In the above embodiment, a commodity publishing method is provided; correspondingly, the present application also provides a commodity publishing method for a client. The method corresponds to the embodiment of the method described above. Since this method embodiment is substantially similar to the first method embodiment, the description is relatively simple; for relevant points, reference is made to the description of the first method embodiment. The method embodiment described below is merely illustrative.
The application additionally provides a commodity release method, which comprises the following steps:
step 1: a descriptive content sequence of the target commodity is obtained.
Step 2: the description content sequence is sent to a server side, so that the server side obtains first characteristic data of the description content; acquiring at least one context descriptive content of the descriptive content and a context image corresponding to the context descriptive content; acquiring second characteristic data corresponding to the context description content according to the context description content and the corresponding context image; generating an image corresponding to the description content according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual description content; and publishing an image sequence corresponding to the descriptive content sequence to an item detail page of the target item.
It should be noted that the embodiments of the present application may involve the use of user data. In practical applications, user-specific personal data may be used in the schemes described herein only within the scope permitted by the applicable laws and regulations of the relevant country and on the premise of compliance (for example, with the user's explicit consent and with the user actually notified).
While the preferred embodiment has been described, it is not intended to limit the invention thereto, and any person skilled in the art may make variations and modifications without departing from the spirit and scope of the present invention, so that the scope of the present invention shall be defined by the claims of the present application.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (22)

1. A text mapping method, comprising:
acquiring a content text sequence of a target object;
acquiring first characteristic data of a target content text in the text sequence;
acquiring at least one context text of the target content text and a context image corresponding to the context text;
acquiring second characteristic data corresponding to the context text according to the context text and the corresponding context image;
and generating a target image corresponding to the target content text according to the first characteristic data and at least one second characteristic data corresponding to the at least one context content text.
2. The method of claim 1, wherein generating the target image corresponding to the target content text from the first feature data and at least one second feature data corresponding to the at least one contextual content text comprises:
acquiring a noise image;
and removing noise from the noise image according to the first characteristic data and the at least one second characteristic data through a diffusion model, and taking the image after noise removal as the target image.
3. The method of claim 2, wherein:
the method further comprises the steps of:
extracting a first feature map of the noise image;
the removing noise from the noise image according to the first feature data and the at least one second feature data through a diffusion model, and taking the image after removing noise as the target image, includes:
removing noise from the first feature image according to the first feature data and the at least one second feature data through a diffusion model, and taking the image after noise removal as a second feature image;
and up-sampling the second characteristic diagram, and taking the up-sampled image as the target image.
4. The method of claim 1, wherein the obtaining the first feature data of the target content text in the text sequence comprises:
performing word vector and word position embedding processing on the target content text to form third characteristic data of the target content text;
extracting fourth feature data of the target content text from the third feature data;
and acquiring the first characteristic data according to the fourth characteristic data and the target text type information.
5. The method according to claim 1, wherein the obtaining second feature data corresponding to the context text according to the context text and the corresponding context image includes:
and carrying out multi-modal joint coding on the context text and the corresponding context image to form second characteristic data of image-text fusion.
6. The method of claim 5, wherein the multi-modal joint coding of the context text and the corresponding context image to form the second feature data of image-text fusion comprises:
performing word vector and word position embedding processing on the context text to form fifth characteristic data of the context text;
dividing the context image into a plurality of sub-images;
acquiring sixth characteristic data of the context image according to the characteristic data of the plurality of sub-images and sub-image position information;
acquiring seventh feature data of the context content text according to the fifth feature data and the sixth feature data;
and acquiring the second characteristic data according to the seventh characteristic data, the context text type information and the context text serial number information.
7. The method of claim 1, further comprising:
building a text-generated image model, the model comprising: a condition information encoding network and an image generating network;
the conditional information encoding network includes: a first feature data acquisition network and at least one second feature data acquisition network;
the first characteristic data acquisition network is used for acquiring the first characteristic data according to the target content text;
the second characteristic data acquisition network is used for carrying out multi-modal joint coding on the context content text and the context image to form second characteristic data;
the image generation network is used for generating the target image according to the first characteristic data and the at least one second characteristic data.
8. The method of claim 7, wherein said building a text-generated image model comprises:
acquiring a text and a corresponding image which are irrelevant to the target object, and forming a first training sample set;
learning from the first training sample set to obtain a text to generate an image model;
acquiring a text and a corresponding image related to the target object to form a second training sample set;
and adjusting parameters of the text generated image model according to the second training sample set.
9. The method of claim 8, wherein:
the acquiring the text and the corresponding image related to the target object comprises the following steps:
acquiring at least one character description information, at least one scene description information and/or at least one picture style description information;
generating, by the text-generated image model learned from the first training sample set, at least one character image design picture according to the at least one character description information; generating at least one scene design picture according to the at least one scene description information; and/or generating at least one picture style design picture according to the at least one picture style description information;
taking the character description information and the character image design picture as a second training sample; taking the scene description information and the scene design picture as a second training sample; and/or taking the picture style description information and the picture style design picture as a second training sample.
10. The method as recited in claim 8, further comprising:
acquiring a new text and a corresponding image related to the target object to form a third training sample set;
and adjusting parameters of the text-generated image model according to the third training sample set, wherein the model is used for generating an image for the to-be-processed text of the target object.
11. The method of claim 10, wherein:
the obtaining the new text and the corresponding image related to the target object comprises the following steps:
acquiring at least one newly added character description information, at least one newly added scene description information and/or at least one newly added picture style description information;
generating, by the text-generated image model learned from the second training sample set, at least one newly added character image design picture according to the at least one newly added character description information; generating at least one newly added scene design picture according to the at least one newly added scene description information; and/or generating at least one newly added picture style design picture according to the at least one newly added picture style description information;
taking the newly added character description information and the newly added character image design picture as a third training sample; taking the newly added scene description information and the newly added scene design picture as a third training sample; and/or taking the newly added picture style description information and the newly added picture style design picture as a third training sample.
12. A text-to-graphics apparatus, comprising:
a text sequence obtaining unit for obtaining a content text sequence of the target object;
a first feature data acquisition unit, configured to acquire first feature data of a target content text in the text sequence;
a context data obtaining unit, configured to obtain at least one context text of the target content text and a context image corresponding to the context text;
a second feature data obtaining unit, configured to obtain second feature data corresponding to the context text according to the context text and the corresponding context image;
and the image generation unit is used for generating a target image corresponding to the target content text according to the first characteristic data and at least one second characteristic data corresponding to the at least one context content text.
13. A text-generating image model processing method, characterized by comprising:
acquiring a text and a corresponding image irrelevant to a target object to form a first training sample set;
learning from the first training sample set to obtain a text generation image model, wherein the model comprises a condition information coding network and an image generation network;
acquiring a text and a corresponding image related to the target object to form a second training sample set;
according to the second training sample set, adjusting parameters of the model;
wherein the condition information encoding network comprises: a first feature data acquisition network and at least one second feature data acquisition network; the first characteristic data acquisition network is used for acquiring first characteristic data of a target text according to the target text; the second feature data acquisition network is used for acquiring second feature data corresponding to the context text according to the context text of the target text and the context image corresponding to the context text; the image generation network is used for generating a target image corresponding to the target text according to the first characteristic data and at least one second characteristic data corresponding to at least one context text.
14. A text-generating image model processing apparatus, comprising:
the first training sample acquisition unit is used for acquiring texts and corresponding images irrelevant to the target object to form a first training sample set;
the first training unit is used for learning from the first training sample set to obtain a text generation image model, and the model comprises a condition information coding network and an image generation network;
the second training sample acquisition unit is used for acquiring texts and corresponding images related to the target object to form a second training sample set;
the second training unit is used for adjusting parameters of the model according to the second training sample set;
wherein the condition information encoding network comprises: a first feature data acquisition network and at least one second feature data acquisition network; the first characteristic data acquisition network is used for acquiring first characteristic data of a target text according to the target text; the second feature data acquisition network is used for acquiring second feature data corresponding to the context text according to the context text of the target text and the context image corresponding to the context text; the image generation network is used for generating a target image corresponding to the target text according to the first characteristic data and at least one second characteristic data corresponding to at least one context text.
15. A story mapping method, comprising:
receiving a content text sequence of a target story submitted by a client;
acquiring first characteristic data of the content text;
acquiring at least one context content text of the content text and a context image corresponding to the context content text;
acquiring second characteristic data corresponding to the context text according to the context text and the corresponding context image;
generating an image corresponding to the content text according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual content text;
and sending an image sequence corresponding to the content text sequence to the client.
16. The method as recited in claim 15, further comprising:
receiving text and corresponding images related to the target story submitted by the client;
and according to the text and the corresponding image related to the target story, learning to obtain a text generation image model for generating an image corresponding to the content text.
17. The method as recited in claim 15, further comprising:
receiving a new text and a corresponding image related to the target story submitted by the client;
and adjusting the text-generated image model according to the newly added text and the corresponding image.
18. A story mapping method, comprising:
acquiring a content text sequence of a target story;
the content text sequence is sent to a server side, so that the server side obtains first characteristic data of the content text; acquiring at least one context content text of the content text and a context image corresponding to the context content text; acquiring second characteristic data corresponding to the context text according to the context text and the corresponding context image; generating an image corresponding to the content text according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual content text;
and displaying the image sequence which is returned by the server and corresponds to the content text sequence.
19. A commodity direct broadcast method, comprising:
receiving a description content sequence of a target commodity submitted by a client;
acquiring first characteristic data of the descriptive content;
acquiring at least one context descriptive content of the descriptive content and a context image corresponding to the context descriptive content;
Acquiring second characteristic data corresponding to the context description content according to the context description content and the corresponding context image;
generating an image corresponding to the description content according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual description content;
and publishing the image sequence corresponding to the description content sequence to a live broadcast platform.
20. A commodity direct broadcast method, comprising:
acquiring a description content sequence of a target commodity;
the description content sequence is sent to a server side, so that the server side obtains first characteristic data of the description content; acquiring at least one context descriptive content of the descriptive content and a context image corresponding to the context descriptive content; acquiring second characteristic data corresponding to the context description content according to the context description content and the corresponding context image; generating an image corresponding to the description content according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual description content; and publishing the image sequence corresponding to the description content sequence to a live broadcast platform.
21. A commodity distribution method, comprising:
receiving a description content sequence of a target commodity submitted by a client;
acquiring first characteristic data of the descriptive content;
acquiring at least one context descriptive content of the descriptive content and a context image corresponding to the context descriptive content;
acquiring second characteristic data corresponding to the context description content according to the context description content and the corresponding context image;
generating an image corresponding to the description content according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual description content;
and publishing an image sequence corresponding to the descriptive content sequence to an item detail page of the target item.
22. A commodity distribution method, comprising:
acquiring a description content sequence of a target commodity;
the description content sequence is sent to a server side, so that the server side obtains first characteristic data of the description content; acquiring at least one context descriptive content of the descriptive content and a context image corresponding to the context descriptive content; acquiring second characteristic data corresponding to the context description content according to the context description content and the corresponding context image; generating an image corresponding to the description content according to the first characteristic data and at least one second characteristic data corresponding to the at least one contextual description content; and publishing an image sequence corresponding to the descriptive content sequence to an item detail page of the target item.
CN202310231486.9A 2023-03-03 2023-03-03 Text mapping method and device Active CN116385597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310231486.9A CN116385597B (en) 2023-03-03 2023-03-03 Text mapping method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310231486.9A CN116385597B (en) 2023-03-03 2023-03-03 Text mapping method and device

Publications (2)

Publication Number Publication Date
CN116385597A CN116385597A (en) 2023-07-04
CN116385597B true CN116385597B (en) 2024-02-02

Family

ID=86966572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310231486.9A Active CN116385597B (en) 2023-03-03 2023-03-03 Text mapping method and device

Country Status (1)

Country Link
CN (1) CN116385597B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909654A (en) * 2019-11-18 2020-03-24 深圳市商汤科技有限公司 Training image generation method and device, electronic equipment and storage medium
CN112818159A (en) * 2021-02-24 2021-05-18 上海交通大学 Image description text generation method based on generation countermeasure network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140603B (en) * 2021-12-08 2022-11-11 北京百度网讯科技有限公司 Training method of virtual image generation model and virtual image generation method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909654A (en) * 2019-11-18 2020-03-24 深圳市商汤科技有限公司 Training image generation method and device, electronic equipment and storage medium
CN112818159A (en) * 2021-02-24 2021-05-18 上海交通大学 Image description text generation method based on generation countermeasure network

Also Published As

Publication number Publication date
CN116385597A (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN110458918B (en) Method and device for outputting information
CN107251006B (en) Gallery of messages with shared interests
WO2018212822A1 (en) Suggested actions for images
CN113377971B (en) Multimedia resource generation method and device, electronic equipment and storage medium
CN109117778B (en) Information processing method, information processing apparatus, server, and storage medium
US20160306505A1 (en) Computer-implemented methods and systems for automatically creating and displaying instant presentations from selected visual content items
CN113348486A (en) Image display with selective motion description
CN103988202A (en) Image attractiveness based indexing and searching
CN105763420B (en) A kind of method and device of automatic information reply
US20170109339A1 (en) Application program activation method, user terminal, and server
US20170116521A1 (en) Tag processing method and device
CN103200224A (en) Method and device and terminal of information sharing
CN107748780B (en) Recovery method and device for file of recycle bin
CN110678861A (en) Image selection suggestions
WO2019085625A1 (en) Emotion picture recommendation method and apparatus
CN110827058A (en) Multimedia promotion resource insertion method, equipment and computer readable medium
CN111259245B (en) Work pushing method, device and storage medium
CN114404960A (en) Cloud game resource data processing method and device, computer equipment and storage medium
EP4080507A1 (en) Method and apparatus for editing object, electronic device and storage medium
CN112235632A (en) Video processing method and device and server
CN112258214A (en) Video delivery method and device and server
CN109116718B (en) Method and device for setting alarm clock
CN116385597B (en) Text mapping method and device
CN111741365B (en) Video composition data processing method, system, device and storage medium
CN112287173A (en) Method and apparatus for generating information

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant