CN112995537B - Video construction method and system - Google Patents

Publication number
CN112995537B
Authority
CN
China
Prior art keywords
video
representation
input information
information
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110175132.8A
Other languages
Chinese (zh)
Other versions
CN112995537A (en)
Inventor
张旻晋
许达文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Shihaixintu Microelectronics Co ltd
Original Assignee
Chengdu Shihaixintu Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Shihaixintu Microelectronics Co ltd
Priority to CN202110175132.8A
Publication of CN112995537A
Application granted
Publication of CN112995537B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video construction method and system. The method first performs feature conversion on each of multiple kinds of input information describing the same video to obtain feature representation information for each input information; it then obtains, in turn, a representation abstract model view and a representation video for each input information, fuses all the representation videos into a fused image set, applies harmony processing to that set, and outputs the harmonized fused image set as the constructed video for all the input information, yielding a smooth video work. The method can generate video for different styles and scenes and fuses and harmonizes the generated videos. It also accelerates the parallel operations, which reduces computation and memory usage, lightens the workload of edge devices and lets them construct video quickly.

Description

Video construction method and system
Technical Field
The invention relates to the technical field of video animation, in particular to a video construction method and a video construction system.
Background
Deep-learning perception algorithms give electronic devices accurate semantic perception, such as semantic recognition from text, from voice and from images, and so provide a sound basis for a device to describe and characterize its environment and the user's intent. Methods that construct video from semantic information have also achieved good results in predicting and generating video of characters, and generating video from voice, text and images has improved the efficiency of design work in industries such as animation, media, education and construction.
Current intelligent algorithms can generate video of human poses, expressions, mouth shapes, gestures and scenes from a simple planar composition, and can also be trained to generate images in a particular painting style. However, each of these methods predicts only one type of video, whereas real-time applications need to predict several types of video and then fuse the videos of the different predicted types.
In addition, current video construction methods are computationally expensive, and their running time on terminal devices rarely meets users' requirements.
Disclosure of Invention
To overcome these shortcomings, the invention provides a video construction method and system. The method can generate videos in the required styles and scenes from different types of input, and fuses and harmonizes the generated videos to construct a smooth final video work.
The invention is realized by the following technical scheme:
The video construction method provided by the scheme comprises the following steps:
S1, performing feature conversion on each of multiple kinds of input information describing the same video to obtain feature representation information for each input information;
S2, matching the feature representation information of each input information against an abstract model library to generate a representation abstract model view for each input information;
S3, inputting the representation abstract model view of each input information into a video generation algorithm model to generate the corresponding representation video;
S4, performing imaging processing on each representation video to obtain a fused image set;
S5, performing harmony processing on the fused image set to generate a harmonized fused image set, and outputting the harmonized fused image set as the constructed video for all the input information.
The working principle of the scheme is as follows. The method takes different pieces of description information that describe the same video and uses them simultaneously as input information to construct one complete video. Current intelligent methods can generate video of human poses, expressions, mouth shapes, gestures and scenes from a simple planar composition, and can be trained on images of a particular painting style to generate video in that style. However, existing methods can only construct videos of the same format at one time, whereas real-time applications need to predict several types of video and fuse the videos of the different predicted types. The method provided by the scheme accepts video description information in several formats at once (for example, voice description information as the first input information, and image description information, or image description information together with voice description information, as the second input information). It processes the inputs synchronously to obtain a representation video for each input information, fuses all the representation videos into a fused image set, and finally obtains a complete video that can contain the features of every input. The method therefore generates video for different styles and scenes, processes multiple inputs synchronously, and effectively improves the efficiency of video construction.
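The flow of steps S1 to S5 can be sketched in Python as follows. This is only an illustrative skeleton under stated assumptions: the `Stages` container, the `construct_video` function and the toy stage functions are hypothetical names introduced here, and the source does not specify any concrete model or data layout.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Frame = List[List[float]]   # assumption: one grayscale frame as rows of pixels
Video = List[Frame]


@dataclass
class Stages:
    """Hypothetical callables standing in for the models named in S1 to S5."""
    convert: Callable[[object], object]     # S1: feature conversion
    match: Callable[[object], object]       # S2: abstract model library matching
    generate: Callable[[object], Video]     # S3: video generation model
    fuse: Callable[[List[Video]], Video]    # S4: imaging processing and fusion
    harmonize: Callable[[Video], Video]     # S5: harmony processing


def construct_video(inputs: Dict[str, object], stages: Stages) -> Video:
    """Run S1 to S3 once per input, then S4 and S5 once over all results."""
    videos = []
    for raw in inputs.values():
        features = stages.convert(raw)
        view = stages.match(features)
        videos.append(stages.generate(view))
    return stages.harmonize(stages.fuse(videos))


# Toy stages so the skeleton runs; none of them is a real model.
toy = Stages(
    convert=lambda raw: len(str(raw)),
    match=lambda features: features % 3,
    generate=lambda view: [[[float(view)]] for _ in range(2)],  # two 1x1 frames
    fuse=lambda vids: [[[max(v[t][0][0] for v in vids)]] for t in range(2)],
    harmonize=lambda video: video,
)
result = construct_video({"text": "walk", "image": "grass"}, toy)
```

The per-input loop corresponds to S1 to S3, which the source describes as being processed synchronously for all inputs; only the fusion and harmony stages see every representation video at once.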
In a further refinement, the input information in S1 is one or more of a hand-drawn abstract picture, voice, text and an image.
In a further refinement, the feature representation information in S1 includes text representation information, semantic representation information and feature map representation information.
In a further refinement, the abstract model library in S2 includes, but is not limited to, abstract model information for a pose, a gesture, a mouth shape, an expression or a scene, and the information is represented in forms including, but not limited to, vector data, coordinate sets and point cloud data.
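One possible way to hold such a library in memory is sketched below. The class names (`VectorData`, `CoordinateSet`, `PointCloud`, `AbstractModel`), the helper `by_category` and the sample entries are all hypothetical; the source names only the categories and the three representation forms.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class VectorData:
    values: List[float]


@dataclass
class CoordinateSet:
    points: List[Tuple[float, float]]          # e.g. 2D keypoints


@dataclass
class PointCloud:
    points: List[Tuple[float, float, float]]   # 3D samples


Representation = Union[VectorData, CoordinateSet, PointCloud]


@dataclass
class AbstractModel:
    category: str          # "pose", "gesture", "mouth shape", "expression" or "scene"
    name: str
    data: Representation


# Illustrative entries only; the values carry no meaning.
library = [
    AbstractModel("pose", "walking", CoordinateSet([(0.0, 0.0), (0.1, 0.5), (0.2, 1.0)])),
    AbstractModel("scene", "grassland", VectorData([0.2, 0.7, 0.1])),
]


def by_category(lib: List[AbstractModel], category: str) -> List[AbstractModel]:
    """Return every library entry of the given category."""
    return [model for model in lib if model.category == category]


poses = by_category(library, "pose")
```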
In a further refinement, the imaging processing of the representation videos in S4 comprises the following steps:
S41, performing image segmentation on each frame image of the representation video of each input information, so that each frame image yields a plurality of segmented image blocks;
S42, adding a foreground or background mark to each segmented image block, so that each input information yields one scene image set;
S43, fusing the foreground and background of the scene image sets frame by frame to generate a fused image set.
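Steps S41 to S43 can be illustrated with a toy example. The sketch assumes grayscale frames stored as nested lists and uses a brightness threshold as a stand-in for real image segmentation; the function names and the two one-frame videos are hypothetical, and pasting foreground pixels over the other video's frame is just one simple reading of "fusing by frame".

```python
from typing import List, Set, Tuple

Frame = List[List[int]]     # assumption: one grayscale frame as rows of pixels
Mask = List[List[bool]]
Block = Tuple[str, Mask]    # (block name or mark, pixels belonging to the block)


def segment(frame: Frame, threshold: int) -> List[Block]:
    """S41 stand-in: split a frame into two blocks by a brightness threshold."""
    bright = [[p >= threshold for p in row] for row in frame]
    dark = [[not b for b in row] for row in bright]
    return [("bright", bright), ("dark", dark)]


def mark(blocks: List[Block], foreground_names: Set[str]) -> List[Block]:
    """S42 stand-in: replace each block name with a foreground/background mark."""
    return [("foreground" if name in foreground_names else "background", mask)
            for name, mask in blocks]


def scene_image_set(video: List[Frame], threshold: int,
                    foreground_names: Set[str]) -> List[Tuple[Frame, List[Block]]]:
    """Build one scene image set: every frame paired with its marked blocks."""
    return [(f, mark(segment(f, threshold), foreground_names)) for f in video]


def fuse_by_frame(fg_set, bg_set) -> List[Frame]:
    """S43 stand-in: paste foreground-marked pixels of fg_set over bg_set frames."""
    fused = []
    for (fg_frame, marked), (bg_frame, _) in zip(fg_set, bg_set):
        out = [row[:] for row in bg_frame]
        for tag, mask in marked:
            if tag != "foreground":
                continue
            for y, mask_row in enumerate(mask):
                for x, inside in enumerate(mask_row):
                    if inside:
                        out[y][x] = fg_frame[y][x]
        fused.append(out)
    return fused


character_video = [[[0, 200], [0, 0]]]     # one 2x2 frame; the bright pixel is the character
grassland_video = [[[50, 50], [60, 60]]]   # one 2x2 frame of background
character_set = scene_image_set(character_video, 128, {"bright"})
grassland_set = scene_image_set(grassland_video, 128, set())
fused = fuse_by_frame(character_set, grassland_set)
```

Here the single bright pixel of the character frame replaces the corresponding grassland pixel, and every other pixel comes from the grassland frame.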
In a further refinement, the method for fusing the foreground and the background includes, but is not limited to, a spatial domain fusion method, a transform domain fusion method and a neural-network-based image fusion method.
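As an illustration of the simplest of these families, the sketch below performs a spatial domain fusion by per-pixel weighted averaging, fused = a * foreground + (1 - a) * background. The `blend` function, the per-pixel weight map `alpha` and the sample values are assumptions for illustration; the source does not prescribe a particular spatial domain method.

```python
from typing import List

Image = List[List[float]]


def blend(foreground: Image, background: Image, alpha: Image) -> Image:
    """Per-pixel weighted average; alpha is the foreground weight in [0, 1]."""
    return [
        [a * f + (1.0 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha)
    ]


foreground = [[100.0, 100.0]]
background = [[0.0, 200.0]]
alpha = [[1.0, 0.5]]          # first pixel fully foreground, second an even mix
fused = blend(foreground, background, alpha)
```

A weight of 1.0 keeps the foreground pixel unchanged, while intermediate weights along a block boundary soften the seam between foreground and background.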
In a further refinement, the method for adding the foreground or background mark includes, but is not limited to, an image segmentation marking method and an image semantic recognition marking method.
In a further refinement, the harmony processing method in S5 is an image harmonization method based on a neural network.
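To show what the harmony step is meant to achieve, the sketch below deliberately swaps in a much simpler classical technique than the neural network method the source specifies: it shifts and scales the foreground pixels of a fused frame so that their mean and standard deviation match those of the background. The `harmonize` function and the sample frame are hypothetical, and a real implementation would also clamp values to the valid pixel range.

```python
from statistics import mean, pstdev
from typing import List

Image = List[List[float]]
Mask = List[List[bool]]


def harmonize(frame: Image, foreground: Mask) -> Image:
    """Match foreground mean/std to the background (a non-neural stand-in)."""
    fg = [p for row, m_row in zip(frame, foreground) for p, m in zip(row, m_row) if m]
    bg = [p for row, m_row in zip(frame, foreground) for p, m in zip(row, m_row) if not m]
    if not fg or not bg:
        return [row[:] for row in frame]       # nothing to harmonize against
    fg_mean, bg_mean = mean(fg), mean(bg)
    fg_std, bg_std = pstdev(fg), pstdev(bg)
    scale = bg_std / fg_std if fg_std > 0 else 1.0
    return [
        [(p - fg_mean) * scale + bg_mean if m else p for p, m in zip(row, m_row)]
        for row, m_row in zip(frame, foreground)
    ]


frame = [[200.0, 220.0], [40.0, 60.0]]         # top row: pasted foreground
mask = [[True, True], [False, False]]
harmonized = harmonize(frame, mask)
```

In this toy frame the bright pasted row is pulled down to the brightness statistics of the darker background row, while the background pixels are left untouched.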
Based on the video construction method, the invention also provides a video construction system comprising: a first feature extraction device, an abstract model matching device, a representation video generation device, an imaging processing device, a fusion device and a computing device;
the first feature extraction device performs feature conversion on each of the plurality of input information to obtain feature representation information for each input information;
the abstract model matching device matches the feature representation information of each input information against an abstract model library to generate a representation abstract model view for each input information;
the representation video generation device inputs the representation abstract model view of each input information into a video generation algorithm model to generate the corresponding representation video;
the imaging processing device performs imaging processing on each representation video to obtain a scene image set for each representation video;
the fusion device fuses the scene image sets of the representation videos to generate a fused image set;
the computing device executes a harmony algorithm on the fused image set to generate a harmonized fused image set, which is output as the constructed video for all the input information.
In a further refinement, the input information is one or more of a hand-drawn abstract picture, voice, text and an image.
The video construction system further comprises a memory module, a table look-up matching module, a parallel operation acceleration module and a main control processor module;
the memory module comprises a storage medium and is used for transferring and temporarily storing external input information, and for temporarily storing the weights of the intelligent algorithms, the feature representation information, the abstract feature library, the abstract model views, the representation videos, the image segmentation results, the scene image sets, the fused image set and the constructed video;
the table look-up matching module comprises a matching operation unit and an address mapping unit, and is used for searching the abstract feature library in the memory for a match to the feature representation information and outputting the abstract model view for that feature representation information;
the parallel operation acceleration module comprises at least one parallel operation processing unit and is used for accelerating the feature extraction from the input information, the video generation operation, the image segmentation processing and the harmony processing of the image set;
the main control processor module comprises a control logic module and a processor module, and is used for controlling data transmission among the memory module, the table look-up matching module and the parallel operation acceleration module, executing the fusion computation for the foreground and the background, and performing the non-parallel operations of the video construction method.
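The division of labour inside the table look-up matching module can be illustrated in software. In the sketch below the matching operation is assumed to be a cosine-similarity nearest-neighbour search and the address mapping is a plain dictionary from the matched entry to a memory address; the source names the two units but does not specify either mechanism, and the `LookupMatcher` class, keys and addresses are hypothetical.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LookupMatcher:
    def __init__(self, keys: Dict[str, List[float]],
                 address_map: Dict[str, int], memory: Dict[int, str]):
        self.keys = keys                # library entry name -> feature key
        self.address_map = address_map  # address mapping unit: name -> address
        self.memory = memory            # memory module: address -> stored model view

    def match(self, feature: List[float]) -> str:
        """Matching operation unit: pick the closest key, then fetch its view."""
        best = max(self.keys, key=lambda name: cosine(feature, self.keys[name]))
        return self.memory[self.address_map[best]]


matcher = LookupMatcher(
    keys={"walking": [1.0, 0.0], "grassland": [0.0, 1.0]},
    address_map={"walking": 0x10, "grassland": 0x20},
    memory={0x10: "walking pose view", 0x20: "grassland scene view"},
)
view = matcher.match([0.9, 0.1])
```

Keeping the address map separate from the similarity search mirrors the split between the two units: the search decides which entry matches, and the map decides where its abstract model view is stored.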
Compared with the prior art, the invention has the following advantages and beneficial effects:
The invention provides a video construction method and system that can generate video for different styles and scenes, fuse and harmonize the generated videos, and finally construct a smooth video work; a video construction system formed by connecting the provided video construction device to interactive equipment can construct video works with the styles and descriptions the user requires from the various kinds of input information.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention.
FIG. 1 is a flow chart of a video construction method of the present invention;
FIG. 2 is a schematic diagram of the video construction system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example 1
As shown in FIG. 1, this embodiment provides a video construction method comprising the following steps:
Step 1, performing feature conversion on first input information to obtain first feature representation information for the first input information;
Step 2, matching the first feature representation information obtained in step 1 against an abstract model library to generate a first representation abstract model view based on the first input information;
Step 3, executing, by the video generation algorithm model, the video generation operation on the first representation abstract model view generated in step 2 to generate a first representation video for the first input information;
Step 4, performing feature conversion on second input information to obtain second feature representation information for the second input information; matching the second feature representation information against the abstract model library to generate a second representation abstract model view based on the second input information; inputting the second representation abstract model view into the video generation algorithm model and executing the video generation operation to generate a second representation video for the second input information;
Step 5, performing image segmentation on each frame image of the first representation video from step 3 and each frame image of the second representation video from step 4 to obtain an image segmentation result for each frame of each video;
Step 6, adding foreground or background marks to the segmented image blocks in each frame image from step 5 to obtain a first scene image set and a second scene image set;
Step 7, fusing the foreground and background of the marked first scene image set and second scene image set frame by frame to generate a fused image set;
Step 8, performing harmony processing on each frame image in the fused image set generated in step 7 to generate a harmonized fused image set and obtain the constructed video.
It should be noted that the input information includes, but is not limited to, hand-drawn abstract pictures, voice, text and images;
the feature representation information includes, but is not limited to, text, semantic and feature map information;
the abstract feature library includes, but is not limited to, abstract information for gestures, mouth shapes, expressions and scenes, and its information is represented in forms including, but not limited to, vector data, coordinate sets and point cloud data;
the input information includes, but is not limited to, first input information, second input information and third input information, and at least one representation video is generated;
the foreground and background marking method in step 6 includes, but is not limited to, an image segmentation marking method and an image semantic recognition marking method;
the foreground and background image fusion method in step 7 includes, but is not limited to, a spatial domain fusion method, a transform domain fusion method and a neural-network-based image fusion method;
the image harmony processing method in step 8 includes, but is not limited to, a neural network method and an image harmonization method.
Example 2
This embodiment further illustrates the method with scene video construction in an animation task.
The embodiment takes the construction of a scene in which "a character walks on a grassland" in an animation video as an example. An initial image and a text description serve as the input information; an ink-wash-style abstract model library is used; a generative adversarial network performs the video generation; a neural network with an encoder-decoder structure performs the image segmentation, the foreground and background marking and the image harmony processing; and an image fusion method based on a deep neural network fuses each frame image of the video.
Step s1, performing feature conversion on a first original hand-drawn character picture and the text describing that picture, as input information, to obtain feature representation information for the character information;
Step s2, matching the character feature representation information obtained in step s1 against the ink-wash-style abstract model library to generate a representation abstract model view based on the character information;
Step s3, executing, by the generative adversarial network algorithm model, the video generation operation on the character representation abstract model view to generate a first representation video for the character information;
Step s4, performing feature conversion on a hand-drawn grassland sketch and the text describing it to obtain feature representation information for the grassland information; matching the grassland feature representation information against the ink-wash-style abstract model library to generate a representation abstract model view based on the grassland information; inputting the grassland representation abstract model view into the video generation algorithm model and executing the video generation operation to generate a second representation video of an ink-wash-style grassland;
Step s5, performing image segmentation on each frame image of the first representation video (the character) from step s3 and of the second representation video (the grassland) from step s4 to obtain an image segmentation result for each frame of each video;
Step s6, adding foreground or background marks to the segmented image blocks in each frame image from step s5 to obtain a character scene image set and a grassland scene image set;
Step s7, fusing the foreground and background of the marked character scene image set and grassland scene image set frame by frame to generate a fused image set;
Step s8, performing harmony processing on each frame image in the fused image set generated in step s7 to generate a harmonized fused image set and obtain an ink-wash-style video of the character walking on the grassland.
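The choices made in this embodiment can be summarized as a small configuration object. The sketch below is only a way of recording what the text states; the class names `InputSpec` and `Example2Config` and their field names are hypothetical and do not correspond to any interface described in the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InputSpec:
    subject: str              # what this input describes
    modalities: List[str]     # kinds of input information supplied


@dataclass
class Example2Config:
    style_library: str = "ink-wash"
    video_generator: str = "generative adversarial network"
    segmentation_and_harmony: str = "encoder-decoder neural network"
    fusion: str = "deep-neural-network image fusion"
    inputs: List[InputSpec] = field(default_factory=lambda: [
        InputSpec("character", ["hand-drawn picture", "text"]),
        InputSpec("grassland", ["hand-drawn picture", "text"]),
    ])


config = Example2Config()
subjects = [spec.subject for spec in config.inputs]
```

Each entry in `inputs` gives rise to one representation video (steps s1 to s4), and the remaining fields name the method families used in steps s3 and s5 to s8.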
Example 3
As shown in FIG. 2, the video construction system provided in this embodiment comprises: a first feature extraction device, an abstract model matching device, a representation video generation device, an imaging processing device, a fusion device and a computing device;
the first feature extraction device performs feature conversion on each of the plurality of input information to obtain feature representation information for each input information;
the abstract model matching device matches the feature representation information of each input information against an abstract model library to generate a representation abstract model view for each input information;
the representation video generation device inputs the representation abstract model view of each input information into a video generation algorithm model to generate the corresponding representation video;
the imaging processing device performs imaging processing on each representation video to obtain a scene image set for each representation video;
the fusion device fuses the scene image sets of the representation videos to generate a fused image set;
the computing device executes a harmony algorithm on the fused image set to generate a harmonized fused image set, which is output as the constructed video for all the input information.
The system further comprises a memory module, a table look-up matching module, a parallel operation acceleration module and a main control processor module;
the memory module comprises a storage medium and is used for transferring and temporarily storing external input information, and for temporarily storing the weights of the intelligent algorithms, the feature representation information, the abstract feature library, the abstract model views, the representation videos, the image segmentation results, the scene image sets, the fused image set and the constructed video;
the table look-up matching module comprises a matching operation unit and an address mapping unit, and is used for searching the abstract feature library in the memory for a match to the feature representation information and outputting the abstract model view for that feature representation information;
the parallel operation acceleration module comprises at least one parallel operation processing unit and is used for accelerating the feature extraction from the input information, the video generation operation, the image segmentation processing and the harmony processing of the image set;
the main control processor module comprises a control logic module and a processor module, and is used for controlling data transmission among the memory module, the table look-up matching module and the parallel operation acceleration module, executing the fusion computation for the foreground and the background, and performing the non-parallel operations of the video construction method.
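The split between accelerated and non-parallel work can be mimicked in software. The sketch below is a loose analogy only: the source describes hardware modules, whereas the example uses a Python thread pool, and `per_input_branch`, `fuse` and `build` are hypothetical stand-ins that operate on plain numbers rather than video frames.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def per_input_branch(raw: str) -> List[int]:
    """Stand-in for the per-input work the acceleration module would speed up
    (feature extraction, video generation, segmentation): three fake 'frames'."""
    return [len(raw) * t for t in range(3)]


def fuse(branches: List[List[int]]) -> List[int]:
    """Stand-in for the fusion computation run by the main control processor:
    combine the branches frame by frame."""
    return [sum(values) for values in zip(*branches)]


def build(inputs: List[str], workers: int = 2) -> List[int]:
    # The independent per-input branches run concurrently ...
    with ThreadPoolExecutor(max_workers=workers) as pool:
        branches = list(pool.map(per_input_branch, inputs))
    # ... and the fusion step runs once, sequentially, over all of them.
    return fuse(branches)


result = build(["walk", "grass"])
```

Because `pool.map` returns results in input order, the fusion step always sees the branches in the same order as the inputs, regardless of which branch finishes first.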
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A video construction method, comprising the steps of: S1, respectively carrying out feature conversion on multiple kinds of input information describing the same video to obtain feature representation information of each input information; S2, respectively matching the feature representation information of each input information with an abstract model library to generate a representation abstract model view based on each input information; S3, respectively inputting the representation abstract model views of all input information into a video generation algorithm model to generate corresponding representation videos; S4, performing imaging processing on each representation video respectively to obtain a group of fused image sets; the imaging processing of the representation video of the input information in S4 comprises the following steps: S41, performing image segmentation processing on each frame image of the representation video of each input information respectively, wherein each frame image yields a plurality of segmented image blocks; S42, adding foreground or background marks to each segmented image block, each input information yielding one scene image set; S43, fusing the foreground and the background of each scene image set frame by frame to generate a group of fused image sets; and S5, performing harmony processing on the fused image set to generate a harmonized fused image set, and taking the harmonized fused image set as the constructed video output of all input information.
2. The video construction method according to claim 1, wherein the input information in S1 is one or more of hand-drawn abstract, voice, text and image.
3. The video construction method according to claim 1, wherein the feature characterization information in S1 includes: text representation information, semantic representation information and feature map representation information.
4. The method of claim 1, wherein the abstract model library in S2 includes but is not limited to abstract model information for pose, gesture, mouth, expression or scene, and its information representation forms include but are not limited to vector data, coordinate set, point cloud data.
5. The video construction method according to claim 1, wherein the foreground and background fusion processing method includes but is not limited to: a spatial domain fusion method, a transform domain fusion method, and a neural network-based image fusion method.
6. The video construction method according to claim 1, wherein the method for adding a mark to the foreground or the background includes, but is not limited to, an image segmentation mark method and an image semantic identification mark method.
7. The video construction method according to claim 1, wherein the harmony processing method in S5 is an image harmonization method based on a neural network.
8. A video construction system for use in the video construction method of any one of claims 1 to 7, comprising: a first feature extraction device, an abstract model matching device, a representation video generation device, an imaging processing device, a fusion device and a computing device; the first feature extraction device is used for respectively carrying out feature conversion on the plurality of input information to obtain feature representation information of each input information; the abstract model matching device is used for respectively matching the feature representation information of each input information with an abstract model library to generate a representation abstract model view based on each input information; the representation video generation device respectively inputs the representation abstract model views of all input information into a video generation algorithm model to generate corresponding representation videos; the imaging processing device respectively carries out imaging processing on each representation video to obtain a scene image set of each representation video; the fusion device fuses the scene image sets of the representation videos to generate a fused image set; and the computing device executes a harmony algorithm on the fused image set to generate a harmonized fused image set, and the harmonized fused image set is used as the constructed video output of all input information.
9. A video construction system according to claim 8, wherein the input information is one or more of hand-drawn abstract, speech, text, image.
CN202110175132.8A 2021-02-09 2021-02-09 Video construction method and system Active CN112995537B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110175132.8A CN112995537B (en) 2021-02-09 2021-02-09 Video construction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110175132.8A CN112995537B (en) 2021-02-09 2021-02-09 Video construction method and system

Publications (2)

Publication Number Publication Date
CN112995537A CN112995537A (en) 2021-06-18
CN112995537B true CN112995537B (en) 2023-02-24

Family

ID=76347959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110175132.8A Active CN112995537B (en) 2021-02-09 2021-02-09 Video construction method and system

Country Status (1)

Country Link
CN (1) CN112995537B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986186A (en) * 2018-08-14 2018-12-11 山东师范大学 The method and system of text conversion video
CN109522908A (en) * 2018-11-16 2019-03-26 董静 Image significance detection method based on area label fusion
CN110717054A (en) * 2019-09-16 2020-01-21 清华大学 Method and system for generating video by crossing modal characters based on dual learning
CN111669515A (en) * 2020-05-30 2020-09-15 华为技术有限公司 Video generation method and related device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7512537B2 (en) * 2005-03-22 2009-03-31 Microsoft Corporation NLP tool to dynamically create movies/animated scenes
US20120130717A1 (en) * 2010-11-19 2012-05-24 Microsoft Corporation Real-time Animation for an Expressive Avatar
EP3161829B1 (en) * 2014-06-30 2019-12-04 Mario Amura Audio/video editing device, movie production method starting from still images and audio tracks and associated computer program
JP6546611B2 (en) * 2017-02-03 2019-07-17 日本電信電話株式会社 Image processing apparatus, image processing method and image processing program
CN107948519B (en) * 2017-11-30 2020-03-27 Oppo广东移动通信有限公司 Image processing method, device and equipment

Also Published As

Publication number Publication date
CN112995537A (en) 2021-06-18

Similar Documents

Publication Publication Date Title
US11783491B2 (en) Object tracking method and apparatus, storage medium, and electronic device
CN110021051B (en) Human image generation method based on generation of confrontation network through text guidance
CN109086683B (en) Human hand posture regression method and system based on point cloud semantic enhancement
CN114550177B (en) Image processing method, text recognition method and device
JP2023541532A (en) Text detection model training method and apparatus, text detection method and apparatus, electronic equipment, storage medium, and computer program
CN111709497B (en) Information processing method and device and computer readable storage medium
CN113704531A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN109712108B (en) Visual positioning method for generating network based on diversity discrimination candidate frame
Ren et al. Visual semantic segmentation based on few/zero-shot learning: An overview
CN113761153A (en) Question and answer processing method and device based on picture, readable medium and electronic equipment
CN115393872B (en) Method, device and equipment for training text classification model and storage medium
CN117078790B (en) Image generation method, device, computer equipment and storage medium
CN110427864B (en) Image processing method and device and electronic equipment
CN114610677B (en) Determination method and related device of conversion model
CN117218246A (en) Training method and device for image generation model, electronic equipment and storage medium
CN117011875A (en) Method, device, equipment, medium and program product for generating multimedia page
US20230072445A1 (en) Self-supervised video representation learning by exploring spatiotemporal continuity
CN114241167A (en) Template-free virtual clothes changing method and device from video to video
WO2024066549A1 (en) Data processing method and related device
CN112995537B (en) Video construction method and system
CN116939288A (en) Video generation method and device and computer equipment
CN111382301A (en) Three-dimensional model generation method and system based on generation countermeasure network
CN116468886A (en) Scene sketch semantic segmentation method and device based on strokes
CN115775300A (en) Reconstruction method of human body model, training method and device of human body reconstruction model
CN113052156B (en) Optical character recognition method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant