CN110968736B - Video generation method and device, electronic equipment and storage medium - Google Patents

Video generation method and device, electronic equipment and storage medium

Info

Publication number
CN110968736B
Authority
CN
China
Prior art keywords
video
scene
information
text
user
Prior art date
Legal status
Active
Application number
CN201911228480.6A
Other languages
Chinese (zh)
Other versions
CN110968736A (en)
Inventor
刘炫鹏 (Liu Xuanpeng)
Current Assignee
Shenzhen Zhuiyi Technology Co Ltd
Original Assignee
Shenzhen Zhuiyi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Zhuiyi Technology Co Ltd filed Critical Shenzhen Zhuiyi Technology Co Ltd
Priority to CN201911228480.6A priority Critical patent/CN110968736B/en
Publication of CN110968736A publication Critical patent/CN110968736A/en
Priority to PCT/CN2020/116452 priority patent/WO2021109678A1/en
Application granted granted Critical
Publication of CN110968736B publication Critical patent/CN110968736B/en

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F 16/00: Information retrieval; database structures therefor; file system structures therefor
            • G06F 16/70: Information retrieval of video data
              • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
                • G06F 16/7867: Retrieval using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
                • G06F 16/783: Retrieval using metadata automatically derived from the content
                  • G06F 16/7837: Retrieval using objects detected or recognised in the video content
                    • G06F 16/784: Retrieval where the detected or recognised objects are people
                  • G06F 16/7844: Retrieval using original textual content or text extracted from visual content or a transcript of audio data
        • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
              • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
                • G06V 40/168: Feature extraction; face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a video generation method and device, an electronic device, and a storage medium. The method includes the following steps: acquiring interaction information input by a user; acquiring a scene video according to the interaction information, the scene video containing a character to be matched; acquiring the user's face information and extracting the corresponding facial features as target facial features; replacing the facial features of the character to be matched in the scene video with the target facial features to generate a video to be played; and outputting the video to be played. Information is thus presented to the user flexibly in video form, and replacing the facial features of a specific character in the video with the target facial features strengthens the user's sense of immersion and improves the user's experience of obtaining information.

Description

Video generation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of electronic device technologies, and in particular, to a video generation method and apparatus, an electronic device, and a storage medium.
Background
With the development of science and technology, people's lives have become increasingly rich, and there are more and more convenient ways to obtain information carried in text. Text that previously could only be read can now also be obtained by audio.
However, although audio allows the user to obtain text information conveniently without reading, it is monotonous and uninteresting, and it is difficult for the user to grasp the specifics of the text content and its surrounding scene, which degrades the user's experience of obtaining information.
Disclosure of Invention
The application provides a video generation method, a video generation device, an electronic device and a storage medium. Information is presented to the user in video form, so that the user can obtain the information through hearing and vision at the same time and the information can be displayed more vividly; the user's face is reproduced in the video, which strengthens the user's sense of immersion in the information and improves the user's experience of obtaining it.
In a first aspect, an embodiment of the present application provides a video generation method, the method including: acquiring interaction information input by a user; acquiring a scene video according to the interaction information, the scene video containing a character to be matched; acquiring the user's face information and extracting the corresponding facial features as target facial features; replacing the facial features of the character to be matched in the scene video with the target facial features to generate a video to be played; and outputting the video to be played.
Optionally, acquiring the scene video according to the interaction information includes: performing semantic understanding on the interaction information to obtain semantic information of the interaction information; searching for related video text information according to the semantic information; and generating a scene video according to the video text information.
Optionally, generating a scene video according to the video text information includes: cutting the video text information according to scenes to obtain at least one section of scene text; performing semantic understanding on the at least one section of scene text, and respectively generating a sub-scene video corresponding to each section of scene text; if one sub-scene video is generated, taking the sub-scene video as the scene video; and if a plurality of sub-scene videos are generated, synthesizing the plurality of sub-scene videos into the scene video.
Optionally, performing semantic understanding on the at least one section of scene text and respectively generating a sub-scene video corresponding to each section of scene text includes: extracting semantic features from the scene text, wherein the semantic features include a person, a place and an event; converting the scene text into voice information; and generating, according to the semantic features and the voice information, a sub-scene video of the person performing the event at the place.
Optionally, obtaining a scene video according to the interaction information includes: performing semantic understanding on the interactive information to acquire semantic information of the interactive information; and searching related video files according to the semantic information to serve as the scene video.
Optionally, replacing the facial features of the character to be matched in the scene video with the target facial features to obtain the video to be played includes: performing semantic understanding on the scene video, obtaining the protagonist of the whole scene video, and taking the protagonist as the character to be matched in the scene video; and replacing the facial features of the character to be matched with the target facial features.
Optionally, replacing the facial features of the character to be matched in the scene video with the target facial features to obtain the video to be played includes: displaying all the characters in the scene video to instruct the user to select a designated character from all the characters; acquiring the designated character selected by the user, and taking the designated character as the character to be matched in the scene video; and replacing the facial features of the character to be matched with the target facial features.
In a second aspect, an embodiment of the present application provides a video generating apparatus, including: the information input module is used for acquiring interactive information input by a user; the scene video acquisition module is used for acquiring a scene video according to the interaction information, wherein the scene video comprises a character to be matched; the face acquisition module is used for acquiring face information of a user and extracting corresponding face features as target face features; the video generation module is used for replacing the facial features of the characters to be matched in the scene video with the target facial features to generate a video to be played; and the output module is used for outputting the video to be played.
Optionally, the scene video obtaining module further includes: the understanding unit is used for carrying out semantic understanding on the interactive information and acquiring semantic information of the interactive information; the video generating unit is used for searching related video text information according to the semantic information; and generating a scene video according to the video text information.
Optionally, the video generating unit further includes: the cutting subunit is used for cutting the video text information according to scenes to obtain at least one segment of scene text; the generating subunit is used for performing semantic understanding on the at least one section of scene text and respectively generating a sub-scene video corresponding to each section of scene text; the synthesizing subunit is used for taking one sub-scene video as the scene video if the sub-scene video is generated; and if a plurality of sub-scene videos are generated, synthesizing the plurality of sub-scene videos into the scene video.
Optionally, the generating subunit is further configured to extract semantic features from the scene text, where the semantic features include people, places, and events; converting the scene text into voice information; and generating a sub-scene video for executing the event at the place by the person according to the semantic features and the voice information.
Optionally, the scene video obtaining module is further configured to perform semantic understanding on the interaction information, and obtain semantic information of the interaction information; and searching related video files according to the semantic information to serve as the scene video.
Optionally, the video generation module further includes: a determining unit, configured to perform semantic understanding on the scene video, obtain the protagonist of the whole scene video, and take the protagonist as the character to be matched in the scene video; and a replacing unit, configured to replace the facial features of the character to be matched with the target facial features.
Optionally, the video generating module further includes: the display unit is used for displaying all the characters in the scene video so as to instruct a user to select a specified character from all the characters; acquiring an appointed figure selected by a user, and taking the appointed figure as a figure to be matched in the scene video; and the replacing unit is used for replacing the facial features of the person to be matched with the target facial features.
In a third aspect, an embodiment of the present application provides an electronic device, including one or more processors; a memory electrically connected to the one or more processors; and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the method described above as applied to an electronic device.
In a fourth aspect, the present application provides a computer-readable storage medium having a program code stored therein, wherein the program code performs the above method when running.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
According to the video generation method and device, the electronic device and the storage medium, interaction information input by a user is acquired; a scene video is acquired according to the interaction information, the scene video containing a character to be matched; the user's face information is acquired and the corresponding facial features are extracted as target facial features; the facial features of the character to be matched in the scene video are replaced with the target facial features to generate a video to be played; and the video to be played is output. Information is thus presented to the user flexibly in video form, and replacing the facial features of a specific character in the video with the target facial features strengthens the user's sense of immersion and improves the user's experience of obtaining information.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 shows a flowchart of a video generation method according to an embodiment of the present application.
Fig. 2 is a schematic diagram illustrating replacement of facial features of a person to be matched according to an embodiment of the present application.
Fig. 3 shows a flowchart of a video generation method according to another embodiment of the present application.
Fig. 4 shows a flowchart of step S240 in a video generation method according to an embodiment of the present application.
Fig. 5 shows a flowchart of a video generation method according to another embodiment of the present application.
Fig. 6 shows a flowchart of a video generation method according to still another embodiment of the present application.
Fig. 7 is a functional block diagram of a video generating apparatus according to an embodiment of the present application.
Fig. 8 shows a block diagram of an electronic device for executing a video generation method according to an embodiment of the present application.
Fig. 9 illustrates a storage medium provided in an embodiment of the present application and used for storing or carrying program codes for implementing a video generation method according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
With the development of society and technology, people can acquire information and knowledge in various ways, for example by reading text, listening to audio or watching video. However, reading text or listening to audio is monotonous, and users usually feel bored when reading or listening for a long time, resulting in a poor user experience. Video is a better form of expression and can convey information to the user through sound and pictures; however, because the person on screen is not the user, the resulting sense of immersion is weak and the user experience is still poor.
The inventor found in research that when information is acquired through a video, the user's face can be reproduced on a certain character in the video, so that the user's sense of immersion is strengthened, the information in the video is absorbed better, and the user's experience is enhanced.
The inventor therefore proposes the video generation method, video generation device, electronic device and storage medium of the embodiments of the present application. The information content is presented through a video while the user's face is reproduced on a character of the video, strengthening the user's sense of immersion and improving the user experience.
The following will describe embodiments of the present application in detail.
Referring to fig. 1, an embodiment of the present application provides a video generation method, which can be applied to an electronic device. The electronic device may be any of various electronic devices having a display screen, a camera, an audio output function and data input, including but not limited to a smart phone, a tablet computer, a laptop computer, a desktop computer, a wearable electronic device, and the like. Specifically, data input may be voice input based on a voice module provided on the electronic device or character input based on a character input module. The method may include:
step S110: and acquiring the interactive information input by the user.
In this embodiment, the interaction information input by the user may be acquired through a plurality of information input modules integrated in the electronic device or a plurality of information input devices connected to the electronic device.
In some embodiments, the interaction information includes, but is not limited to, voice information, text information, image information, motion information, and the like. The voice information may include audio information of a language class, such as Chinese or English audio, and audio information of a non-language class, such as music; the text information may include text of a character class, such as Chinese or English, and text of a non-character class, such as special symbols and emoticons; the image information may include still image information such as still pictures and photographs, as well as moving image information such as animated pictures and video images; the motion information may include user motion information such as gestures, body motions and facial expressions, as well as terminal motion information such as the position, posture and motion state (e.g. shaking or rotating) of the terminal device.
It can be understood that information collection can be performed through different types of information input modules on the terminal device corresponding to different types of interaction information. For example, voice information of a user may be collected through an audio input device such as a microphone, text information input by the user may be collected through a touch screen or a physical key, image information may be collected through a camera, and motion information may be collected through an optical sensor, a gravity sensor, or the like.
Different types of interaction information can correspond to the same request. For example, when a user wants to input the request "I want to listen to the story of Aladdin", the user may input the corresponding audio by voice, upload a picture related to Aladdin, or input the corresponding text. It can be understood that, for the same request, only one type of interaction information may be input, or multiple types may be input simultaneously, making the user's intention clearer and easier for the electronic device to recognize.
In this embodiment, different types of interaction information are acquired through multiple channels, so that the various interaction modes of a user can be responded to freely, without being limited to traditional mechanical human-computer interaction; multi-modal human-computer interaction is realized and more interaction scenarios are covered.
Step S120: and acquiring a scene video according to the interactive information, wherein the scene video comprises the character to be matched.
After the interactive information input by the user is obtained, semantic understanding can be performed on the interactive information, and the semantic information of the interactive information is obtained, so that the interactive information of the user can be accurately understood.
The scene video may be video information related to interaction information input by the user and acquired by the electronic device for the interaction information.
As one embodiment, a video related to the semantic information may be searched for according to the semantic information. For example, if the interaction information input by the user is "I want to listen to the story of Aladdin", the scene video corresponding to the interaction information may be a film or television work about Aladdin, and the like.
As another embodiment, video text information related to the semantic information may be searched for according to the semantic information. For example, if the interaction information input by the user is "I want to listen to the story of Aladdin", a story text about Aladdin is searched for, and the corresponding scene video is generated according to the story text.
Specifically, the obtained video text information may be cut according to scenes to obtain one or more segments of scene text; semantic understanding may be performed on each segment of scene text to obtain the characters, places and events in it, and the scene text may be converted into voice information; a sub-scene video is then generated for each segment of scene text according to these semantic features and the voice information. If one sub-scene video is generated, the sub-scene video is used as the scene video; if a plurality of sub-scene videos are generated, the sub-scene videos are spliced into the scene video.
Step S130: the face information of the user is obtained and corresponding face features are extracted to serve as target face features.
The user's face information is obtained, and the facial features are extracted from it. The face information may be a face image or a video containing a face. In the embodiments of the present application, the facial features may be a set of feature points describing all or part of the morphology of the face; the feature point set records the position information and depth information of each feature point of the face in space, and a partial or complete image of the face can be reconstructed from the facial features. In some embodiments, the acquired face image or face video may be input into a feature extraction model to obtain the facial features. It can be understood that the facial features may be the features of the five sense organs, such as the eyebrows, eyes, nose, mouth and ears.
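As a purely illustrative sketch (not the feature extraction model actually used in this application), the following Python code shows one way a face image could be turned into a set of feature points and a compact descriptor; it assumes the open-source face_recognition library is installed, and the 2D landmarks it returns carry no depth information.

    # Illustrative only: extract "target facial features" from a user photo.
    # Assumes the open-source face_recognition library; the real embodiment
    # may use a different feature extraction model.
    import face_recognition

    def extract_target_face_features(image_path):
        image = face_recognition.load_image_file(image_path)     # RGB ndarray
        landmarks = face_recognition.face_landmarks(image)        # feature points per face
        descriptors = face_recognition.face_encodings(image)      # 128-d descriptor per face
        if not landmarks:
            raise ValueError("no face found in the supplied image")
        # Treat the first detected face as the user's face.
        return {"feature_points": landmarks[0], "descriptor": descriptors[0]}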
The user's face information may be a face image collected by a camera of the electronic device, or a face image provided by the user. When the face image is collected by the camera, the camera of the electronic device may be started to collect the face image after the electronic device acquires the interaction information input by the user. The facial features may be extracted from the acquired face image or video on the electronic device side; alternatively, the acquired face image or video may be sent to a server over a network or the like, and the server extracts the facial features. The target facial features are defined as the facial features extracted from the acquired face information.
Step S140: and replacing the facial features of the characters to be matched in the scene video with the target facial features to generate a video to be played.
After the scene video corresponding to the interactive information and the target face features are obtained, the target face features can replace the face features of the people to be matched in the scene video to generate a video to be played.
The character to be matched is the character whose face needs to be replaced in the acquired scene video. In some embodiments, the replacement of facial feature points may be performed for a character designated by the user. In some embodiments, semantic understanding may be performed on the scene video, the protagonist of the whole scene video may be obtained, and the facial features of the protagonist are replaced. The target facial features are reproduced on the face of the character to be matched in the scene video to obtain the video to be played.
When the facial features of the character to be matched in the scene video are replaced, the scene video can be split into frames; each frame of the scene video is processed, and whether the character to be matched exists in the frame is detected. If the character to be matched exists in a frame, the facial features of the character to be matched are located to determine a replacement area, and the replacement area is replaced with the target facial features. In this way, in the frames of the scene video in which the character to be matched appears, the facial features of the character to be matched are replaced with the target facial features, while other characters and scenery in the scene video may be left unprocessed, keeping the original images.
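The per-frame processing described above can be sketched in Python as follows. This is an illustration only: detect_matched_person, locate_face_region and blend_target_face are hypothetical helpers standing in for the detection, localisation and replacement steps, and OpenCV is used only for video reading and writing.

    # Illustrative frame-by-frame replacement loop (not the actual implementation).
    import cv2

    def replace_faces(scene_video_path, target_face, out_path="to_play.mp4"):
        cap = cv2.VideoCapture(scene_video_path)
        fps = cap.get(cv2.CAP_PROP_FPS)
        size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
                int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
        writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if detect_matched_person(frame):           # hypothetical: is the character in this frame?
                region = locate_face_region(frame)     # hypothetical: replacement area
                frame = blend_target_face(frame, region, target_face)  # hypothetical
            writer.write(frame)                        # other frames are kept unchanged
        cap.release()
        writer.release()
        return out_path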
When the facial features of the people to be matched are replaced by the target facial features, the facial features of the people to be matched can be positioned to obtain the area to be replaced, and the facial features in the area to be replaced are replaced by the target facial features. Referring to FIG. 2, a schematic diagram of facial feature replacement is shown. 141 is a person to be matched in a scene video, 142 is a replacement area obtained by positioning the facial features of the person to be matched, 143 is the acquired target facial features, and 144 is the person obtained by replacing the facial features of the person to be matched with the target facial features.
Step S150: and outputting the video to be played.
The video to be played is output and can be played on the electronic device, presenting vivid video content to the user through the combination of sound and pictures. Since the user's facial features are reproduced on a character in the video to be played, the user's sense of immersion in the video content is improved.
As an implementation manner, after the electronic device acquires the interaction information, the electronic device may locally identify the interaction information, and acquire a scene video according to the interaction information. And acquiring face information, extracting corresponding target face features, and replacing the face features of the people to be matched in the scene video to obtain the video to be played.
As another implementation, when the electronic device has established a communication connection with a server, the electronic device may forward the interaction information input by the user to the server after obtaining it. The server obtains the corresponding scene video by performing semantic understanding on the interaction information; the electronic device sends the acquired face information to the server; the server extracts the facial features to obtain the target facial features, replaces the facial features of the character to be matched in the scene video with the target facial features to obtain the video to be played, and sends the video to be played to the electronic device for playing. The local computing and storage pressure on the electronic device can thereby be reduced.
It should be understood that the order of step S120 and step S130 is not limited: step S120 and step S130 may be performed simultaneously after the interaction information is acquired; step S130 may be performed first, acquiring the user's face information and extracting the target facial features after the interaction information is obtained; or step S120 may be performed first, acquiring the scene video according to the interaction information. In actual implementation this may be set as needed and is not specifically limited here.
According to the video generation method provided by the embodiments of the present application, interaction information input by a user is acquired; a scene video is acquired according to the interaction information, the scene video containing a character to be matched; the user's face information is acquired and the corresponding facial features are extracted as target facial features; the facial features of the character to be matched in the scene video are replaced with the target facial features to generate a video to be played; and the video to be played is output. Information is thus flexibly presented to the user by combining voice and pictures, the user's face is reproduced on a character of the video, the interaction becomes more intuitive, the user's sense of immersion in the information is strengthened, and the user's experience of obtaining information is improved.
Referring to fig. 3, another embodiment of the present application provides a video generating method, where on the basis of the foregoing embodiment, this embodiment mainly describes a process of generating a scene video according to video text information, and the method may include:
step S210: and acquiring the interactive information input by the user.
In this embodiment, the specific description of step S210 may refer to step S110 in the previous embodiment, and this embodiment is not described again.
Step S220: and performing semantic understanding on the interactive information to acquire semantic information of the interactive information.
In this embodiment, for different types of the interactive information, the interactive information may be input into an identification model corresponding to the type of the interactive information, and the interactive information is identified based on the identification model to obtain corresponding semantic information.
As an implementation manner, when the interactive information input by the user is voice information, the interactive information may be recognized based on a voice recognition model to obtain corresponding semantic information; when the interactive information is text information, corresponding semantic information can be obtained based on the character recognition model; when the interactive information is image information, the interactive information can be identified based on an image identification model to obtain corresponding semantic information; when the interaction information is the action information, the interaction information can be recognized based on a body language recognition model, a terminal posture recognition model or a gesture recognition model, and corresponding semantic information is obtained.
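Routing each type of interaction information to a recognition model of the corresponding type could look like the following Python sketch; the *_model objects are hypothetical placeholders for a speech recognition model, a character recognition model, an image recognition model and a gesture/body-language recognition model, and are not part of any named library.

    # Illustrative dispatch of interaction information to per-type recognition models.
    def understand(interaction):
        kind, payload = interaction["type"], interaction["data"]
        if kind == "voice":
            return speech_model.recognize(payload)       # hypothetical speech model
        if kind == "text":
            return text_model.parse(payload)             # hypothetical character model
        if kind == "image":
            return image_model.describe(payload)         # hypothetical image model
        if kind == "motion":
            return gesture_model.interpret(payload)      # hypothetical gesture model
        raise ValueError("unsupported interaction type: " + str(kind))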
Step S230: and searching related video text information according to the semantic information.
After the semantic information corresponding to the interaction information is acquired, the real intention of the user is known, a more accurate search can be carried out, and the related video text information is searched for according to the semantic information. The video text information is the text describing the entire video content; for example, if the video is Aladdin, the video text information is the story "Aladdin and the Magic Lamp".
In one embodiment, semantic information is obtained through semantic understanding of the interaction information, and the related video text information can be searched for on the network according to the semantic information. For example, the interaction information input by the user is "I want to listen to the Aladdin story". Through semantic understanding it is known that the user wants to listen to the story of Aladdin, so video text information related to Aladdin, namely the story text of "Aladdin and the Magic Lamp", can be searched for.
As an embodiment, a text database may be established in advance, and the text database stores a plurality of labeled video text information, where the labeled content may be a scene, a person, a paragraph, and the like. After the semantic information is acquired, the corresponding video text information can be searched in the database according to the semantic information. It is understood that the labeling of the video text information can be performed according to actual requirements, and is not limited herein.
Step S240: and generating a scene video according to the video text information.
After the video text information is acquired, a corresponding scene video may be generated according to the video text information, which may specifically include the following steps, which may refer to the method flowchart illustrated in fig. 4.
Step S241: and cutting the video text information according to scenes to obtain at least one section of scene text.
Generally, a plurality of scenes are involved in the video text information, and then the video text information can be cut according to the scenes to obtain corresponding scene texts.
As an embodiment, the video text information may be cut by manually labeling the video text information in advance, where the labeled content may be scene information, character information, time information, and the like. The manual labeling can be performed according to actual requirements, and is not limited herein. After the labeling is completed, the labeled video text information can be stored in a database, and the labeled video text information can be obtained by querying the database. The video text information can be cut according to the labeling information in the video text information to obtain one or more segments of scene texts. If the video text information has only one scene, only one section of scene text can be obtained, and if a plurality of scenes are involved, a plurality of sections of scene texts can be obtained.
For example, the obtained tagged video text information includes two scenes, one of which is a street and the other is a room. And cutting the video text information to obtain two sections of scene texts. Further, the position information of the scene text in the video text information can be added to the scene text, so as to determine the occurrence order of the scene.
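A minimal Python sketch of this scene-based cutting is shown below. It assumes, purely for illustration, that the manually labelled video text information is available as a list of paragraph records of the form {"scene": ..., "paragraph": ..., "text": ...}; this is not a format specified by the application.

    # Illustrative cutting of pre-annotated video text into scene texts,
    # keeping each scene's paragraph position for later ordering.
    def cut_into_scene_texts(annotated_paragraphs):
        scene_texts = []
        current = None
        for para in annotated_paragraphs:
            if current is None or para["scene"] != current["scene"]:
                current = {"scene": para["scene"],
                           "position": para["paragraph"],   # order of appearance
                           "text": para["text"]}
                scene_texts.append(current)
            else:
                current["text"] += "\n" + para["text"]
        return scene_texts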
In an embodiment, the video text information is cut, and the video text information may be input into a first deep learning model for cutting. It can be understood that the first deep learning model can be trained through a large amount of data to realize the cutting of the video text information according to the scene, so as to obtain at least one scene text after the video text information is cut according to the scene.
Step S242: and performing semantic understanding on the at least one section of scene text, and respectively generating a sub-scene video corresponding to each section of scene text.
After the video text information is cut according to scenes, at least one segment of scene text can be acquired. If a section of scene text is obtained after cutting, performing semantic understanding on the section of scene text to generate a sub-scene video corresponding to the section of scene text; and if a plurality of scene texts are obtained, performing semantic understanding on each section of scene text respectively to generate a sub-scene video corresponding to each section of scene text respectively.
Specifically, semantic understanding may be performed on the scene text, and semantic features are extracted from the scene text, where the semantic features include people, places, and events; converting the scene text into voice information; and generating a sub-scene video for executing the event at the place by the person according to the semantic features and the voice information.
The audio in the sub-scene video may be the voice information converted from the scene text; the picture content in the sub-scene video may be obtained according to the person, event, place and other information in the semantic features.
As an embodiment, an image database may be established in advance, and a corresponding tag may be added to each image in the image database, so that image information corresponding to a person may be acquired according to the person, an action corresponding to an event may be acquired according to the event, a scene corresponding to the location may be acquired according to the location, and the acquired images may be superimposed and synthesized, so that a screen content of the event executed at the location by the person may be obtained.
In one embodiment, the corresponding picture content is searched for on the network according to the person, the event and the place, and the found pictures are superimposed and synthesized to obtain the picture content of the person performing the event at the place.
For example, the scene text is "Aladdin comes to the cave entrance; because the top step is too far above the ground, he cannot climb up, and he asks the magician to pull him up by the hand". Semantic understanding is performed on this scene text, and the corresponding semantic features are extracted: the characters are Aladdin and the magician, the place is the cave entrance, and the event is Aladdin asking the magician to pull him up.
Images of the characters Aladdin and the magician can be obtained, together with the action of reaching out a hand to ask to be pulled up and the scenery of the cave entrance; these are superimposed and synthesized to generate the picture content of Aladdin asking the magician to pull him up at the cave entrance. The scene text is converted into voice information, and the picture content and the voice information are synthesized to generate the sub-scene video.
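A high-level sketch of generating one sub-scene video from a segment of scene text might look as follows. Here extract_semantic_features, text_to_speech, compose_frames and mux are hypothetical helpers standing in for semantic feature extraction, speech synthesis, picture composition (person + action + place) and audio/video multiplexing, and image_db stands for the pre-built, tagged image database mentioned above.

    # Illustrative sub-scene video generation from one segment of scene text.
    def build_sub_scene_video(scene_text, image_db):
        feats = extract_semantic_features(scene_text)   # hypothetical: {"person", "place", "event"}
        audio = text_to_speech(scene_text)              # hypothetical: scene text -> voice information
        person_img = image_db.lookup(tag=feats["person"])
        action_img = image_db.lookup(tag=feats["event"])
        place_img = image_db.lookup(tag=feats["place"])
        # Superimpose person and action onto the place to form the picture content.
        frames = compose_frames(place_img, person_img, action_img,
                                duration=audio.duration)  # hypothetical
        return mux(frames, audio)                         # hypothetical: one sub-scene video clip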
In one embodiment, when the scene text is converted into voice information, if the user's face information has already been obtained, the face information may be recognized to identify information such as the gender and age of the person, and the timbre of the voice information may be matched to that person. For example, if the recognized face is that of a ten-year-old girl, the timbre of the voice information can be made sweeter, so as to be closer to the user's identity and give the user a better sense of immersion when hearing the voice information.
Step S243: and if a sub-scene video is generated, taking the sub-scene video as the scene video.
If only one section of scene text is obtained after the video text information is cut, generating a sub-scene video corresponding to the scene text, and taking the sub-scene video as the scene video.
Step S244: and if a plurality of sub-scene videos are generated, synthesizing the plurality of sub-scene videos into the scene video.
And if the video text information is cut to obtain a plurality of sections of scene texts, generating a plurality of corresponding sub-scene videos according to each section of scene text. And synthesizing the plurality of sub-scene videos into the scene video according to the occurrence sequence of the video text information.
As one implementation, when a sub-scene video is generated, the position information of the corresponding scene text within the video text information is added to the sub-scene video. The position information may be the paragraph information of the scene text in the video text information; for example, if the scene text is the 12th paragraph of the video text information, a position label of "paragraph 12" is added when the sub-scene video corresponding to that scene text is generated.
It can be understood that when a scene text is labeled manually, corresponding paragraph information is labeled at the same time, and when a corresponding sub-scene video is generated through the scene text, the paragraph information of the scene text can be acquired as a position label and added into the sub-scene video.
When the plurality of sub-scene videos are synthesized into the scene video, the position label in each sub-scene video can be obtained, and the sub-scene videos are spliced in the order of the position labels to obtain the scene video. For example, three sub-scene videos are generated: a first sub-scene video, a second sub-scene video and a third sub-scene video. If the position label of the first sub-scene video is paragraph 1, that of the second sub-scene video is paragraph 12, and that of the third sub-scene video is paragraph 6, the order of appearance can be determined from the position labels as first, third, second, and the three sub-scene videos are spliced in that order to obtain the scene video.
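Splicing by position label can be sketched as below; it assumes, for illustration only, that each sub-scene clip carries a position attribute holding its paragraph number, and it uses moviepy merely as one possible way to concatenate clips.

    # Illustrative splicing of sub-scene videos in the order of their position labels.
    from moviepy.editor import concatenate_videoclips

    def splice_scene_video(sub_scene_clips):
        ordered = sorted(sub_scene_clips, key=lambda clip: clip.position)
        return concatenate_videoclips(ordered)   # e.g. paragraphs 1, 6, 12 -> playback order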
It can be understood that a scene video generated according to the video text information may include a plurality of persons, and one of the persons may be a person to be matched, so as to replace the facial features of the person to be matched.
Step S250: the face information of the user is obtained and corresponding face features are extracted to serve as target face features.
Step S260: and replacing the facial features of the characters to be matched in the scene video with the target facial features to generate a video to be played.
Step S270: and outputting the video to be played.
The steps S250 to S270 refer to corresponding parts of the foregoing embodiments, and are not described herein again.
The embodiment of the present application provides a video generation method in which video text information is obtained from the interaction information and cut according to scenes to obtain at least one segment of scene text; semantic understanding is performed on the at least one segment of scene text, and a sub-scene video corresponding to each segment of scene text is generated; if one sub-scene video is generated, the sub-scene video is used as the scene video; and if a plurality of sub-scene videos are generated, the plurality of sub-scene videos are synthesized into the scene video. The video text information can thereby be converted into a corresponding scene video, presenting vivid information content to the user.
Referring to fig. 5, another embodiment of the present application provides a video generation method, where on the basis of the foregoing embodiment, this embodiment mainly describes a process of obtaining a scene video according to interaction information, and the method may include:
step S310: and acquiring the interactive information input by the user.
Step S320: and performing semantic understanding on the interactive information to acquire semantic information of the interactive information.
The steps S310 to S320 can refer to the foregoing embodiments, and are not described herein.
Step S330: and searching related video files according to the semantic information to serve as the scene video.
After the semantic information corresponding to the interaction information is acquired, a related video file can be searched for directly according to the semantic information and used as the scene video. For example, the user's interaction information is "how to cook braised pork"; through semantic understanding it is known that the user wants to learn how to cook braised pork, so a video tutorial on cooking braised pork is searched for, and the found video tutorial is used as the scene video.
When searching for a related video tutorial, multiple video tutorials may be found, and the video tutorial with the highest play count or comment count may be used as the scene video according to the play counts and comment counts of the videos. It can be understood that how the scene video is selected from the found video tutorials can be set according to actual requirements and is not limited here.
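Selecting the scene video from several search results by play count (falling back to comment count) could be as simple as the following sketch; the play_count and comment_count fields are assumed here for illustration.

    # Illustrative selection of the scene video from search results.
    def pick_scene_video(search_results):
        return max(search_results,
                   key=lambda v: (v.get("play_count", 0), v.get("comment_count", 0)))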
It can be understood that, when searching according to the semantic information, the search may be performed in a special database, or the network search may be performed through the network, and the setting may be performed according to actual requirements, which is not limited herein.
Step S340: the face information of the user is obtained and corresponding face features are extracted to serve as target face features.
Step S350: and replacing the facial features of the characters to be matched in the scene video with the target facial features to generate a video to be played.
Step S360: and outputting the video to be played.
Steps S340 to S360 refer to corresponding parts of the foregoing embodiments, and are not described herein again.
The embodiment of the present application provides a video generation method: interaction information input by a user is acquired; semantic understanding is performed on the interaction information to obtain its semantic information; a related video file is searched for according to the semantic information and used as the scene video; the user's face information is acquired and the corresponding facial features are extracted as target facial features; the facial features of the character to be matched in the scene video are replaced with the target facial features to generate a video to be played; and the video to be played is output. A related video can be found through the semantic information, so that the information is flexibly presented to the user in video form; the facial features of the character to be matched in the video are replaced, strengthening the user's sense of immersion and improving the user's experience of obtaining information.
Referring to fig. 6, another embodiment of the present application provides a video generation method, where in this embodiment, on the basis of the foregoing embodiment, a process of determining a person to be matched in a scene video is mainly described, and the method specifically includes:
step S410: and acquiring the interactive information input by the user.
Step S420: and acquiring a scene video according to the interactive information.
Step S430: and determining the character to be matched in the scene video.
The scene video acquired according to the interaction information may contain a plurality of characters. Among the plurality of characters, one character can be selected as the character to be matched, whose facial features will be replaced.
As one implementation, semantic understanding may be performed on the acquired scene video, the protagonist of the whole scene video may be obtained, and the protagonist is taken as the character to be matched for the subsequent facial feature replacement. For example, if the acquired scene video is a video about Aladdin, semantic understanding can be performed on the scene video; once it is known that the protagonist of the scene video is Aladdin, Aladdin can be taken as the character to be matched.
Specifically, when semantic understanding is performed on the scene video, the number of appearances and the duration of each character in the scene video may be counted, and the character that appears most is taken as the protagonist of the scene video. For example, the characters appearing in a scene video include character A, character B and character C. Character A appears twice, for 50 s the first time and 10 s the second time; character B appears once, for 10 s; character C appears once, for 1 s. Combining the number and duration of the appearances, character A can be determined to be the protagonist of the scene video, and character A can then be taken as the character to be matched in the scene video.
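The protagonist selection in this example can be sketched as follows, assuming (for illustration) that the appearance statistics are given as a mapping from character name to a list of appearance durations in seconds.

    # Illustrative choice of the protagonist (character to be matched) by
    # number of appearances, then total appearance duration.
    def pick_protagonist(appearances):
        return max(appearances,
                   key=lambda name: (len(appearances[name]), sum(appearances[name])))

    # pick_protagonist({"A": [50, 10], "B": [10], "C": [1]})  ->  "A"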
As an embodiment, the characters appearing in the scene video may be acquired, the characters appearing in the scene video may be displayed to instruct the user to select a designated character from the displayed characters, and the designated character selected by the user may be acquired as the character to be matched in the scene video.
Step S440: the face information of the user is obtained and corresponding face features are extracted to serve as target face features.
Step S450: and replacing the facial features of the characters to be matched in the scene video with the target facial features to generate a video to be played.
Step S460: and outputting the video to be played.
The steps S440 to S460 refer to corresponding parts of the foregoing embodiments, and are not described herein again.
Referring to fig. 7, a video generating apparatus 500 provided in an embodiment of the present application is shown and applied to an electronic device, where the apparatus 500 includes an information input module 510, a scene video acquiring module 520, a face acquiring module 530, a video generating module 540, and an output module 550.
The information input module 510 is configured to acquire interaction information input by a user; the scene video acquisition module 520 is configured to acquire a scene video according to the interaction information, the scene video containing a character to be matched; the face acquisition module 530 is configured to acquire the user's face information and extract the corresponding facial features as target facial features; the video generation module 540 is configured to replace the facial features of the character to be matched in the scene video with the target facial features to generate a video to be played; and the output module 550 is configured to output the video to be played.
The scene video acquiring module 520 further includes: the understanding unit is used for carrying out semantic understanding on the interactive information and acquiring semantic information of the interactive information; the video generating unit is used for searching related video text information according to the semantic information; and generating a scene video according to the video text information.
The video generation unit further includes: the cutting subunit is used for cutting the video text information according to scenes to obtain at least one segment of scene text; the generating subunit is used for performing semantic understanding on the at least one section of scene text and respectively generating a sub-scene video corresponding to each section of scene text; the synthesizing subunit is used for taking one sub-scene video as the scene video if the sub-scene video is generated; and if a plurality of sub-scene videos are generated, synthesizing the plurality of sub-scene videos into the scene video.
The generating subunit is further configured to extract semantic features from the scene text, where the semantic features include a person, a place and an event; convert the scene text into voice information; and generate, according to the semantic features and the voice information, a sub-scene video of the person performing the event at the place.
The scene video obtaining module 520 is further configured to perform semantic understanding on the interaction information to obtain semantic information of the interaction information; and searching related video files according to the semantic information to serve as the scene video.
The video generation module 540 further includes: a determining unit, configured to perform semantic understanding on the scene video, obtain the protagonist of the whole scene video, and take the protagonist as the character to be matched in the scene video; and a replacing unit, configured to replace the facial features of the character to be matched with the target facial features.
The video generation module 540 further comprises: the display unit is used for displaying all the characters in the scene video so as to instruct a user to select a specified character from all the characters; acquiring an appointed figure selected by a user, and taking the appointed figure as a figure to be matched in the scene video; and the replacing unit is used for replacing the facial features of the person to be matched with the target facial features.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In summary, the apparatus acquires interaction information input by a user; acquires a scene video according to the interaction information, the scene video containing a character to be matched; acquires the user's face information and extracts the corresponding facial features as target facial features; replaces the facial features of the character to be matched in the scene video with the target facial features to generate a video to be played; and outputs the video to be played. Information is thus flexibly presented to the user in video form, the facial features of a specific character in the video are replaced with the target facial features, the user's sense of immersion is strengthened, and the user's experience of obtaining information is improved.
In the several embodiments provided in the present application, the coupling, direct coupling, or communication connection between the modules shown or discussed may be implemented through some interfaces; the indirect coupling or communication connection between the devices or modules may be in electrical, mechanical, or other forms.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
Referring to fig. 8, a block diagram of an electronic device according to an embodiment of the present application is shown. The electronic device 600 may be a smart phone, a tablet computer, an electronic book, or another electronic device capable of running an application. The electronic device 600 in the present application may include one or more of the following components: a processor 610, a memory 620, and one or more applications, wherein the one or more applications may be stored in the memory 620 and configured to be executed by the one or more processors 610, the one or more applications being configured to perform the methods described in the foregoing method embodiments.
The processor 610 may include one or more processing cores. The processor 610 connects the various parts of the entire electronic device 600 through various interfaces and lines, and performs the various functions of the electronic device 600 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 620 and invoking data stored in the memory 620. Optionally, the processor 610 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 610 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem is used to handle wireless communication. It can be understood that the modem may also not be integrated into the processor 610 and instead be implemented by a separate communication chip.
The memory 620 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 620 may be used to store instructions, programs, code sets, or instruction sets. The memory 620 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, and an image playing function), instructions for implementing the foregoing method embodiments, and the like. The stored data area may store data created by the electronic device 600 during use (such as a phone book, audio and video data, and chat log data).
Referring to fig. 9, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable storage medium 700 has stored therein program code that can be called by a processor to execute the methods described in the above-described method embodiments.
The computer-readable storage medium 700 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read-Only Memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 700 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 700 has storage space for program code 710 for performing any of the method steps in the methods described above. The program code can be read from or written to one or more computer program products. The program code 710 may, for example, be compressed in a suitable form.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (8)

1. A method of video generation, the method comprising:
acquiring interactive information input by a user;
acquiring a scene video according to the interactive information, wherein the scene video comprises characters to be matched, and the method comprises the following steps:
performing semantic understanding on the interactive information to acquire semantic information of the interactive information;
searching relevant video text information according to the semantic information, wherein the video text information is text information describing the whole video content;
cutting the video text information according to scenes to obtain at least one section of scene text;
performing semantic understanding on the at least one section of scene text, and respectively generating a sub-scene video corresponding to each section of scene text;
if a sub-scene video is generated, taking the sub-scene video as the scene video;
if a plurality of sub-scene videos are generated, synthesizing the sub-scene videos into the scene video, wherein the scene video comprises voice information and video pictures;
acquiring face information of a user and extracting corresponding face features as target face features;
replacing the facial features of the characters to be matched in the scene video with the target facial features to generate a video to be played;
and outputting the video to be played.
2. The method according to claim 1, wherein the semantically understanding the at least one scene text segment to generate a sub-scene video corresponding to each scene text segment respectively comprises:
extracting semantic features from the scene text, wherein the semantic features comprise people, places and events;
converting the scene text into voice information;
and generating a sub-scene video for executing the event at the place by the person according to the semantic features and the voice information.
3. The method of claim 1, wherein the obtaining the scene video according to the interaction information comprises:
performing semantic understanding on the interactive information to acquire semantic information of the interactive information;
and searching related video files according to the semantic information to serve as the scene video.
4. The method according to any one of claims 1 to 3, wherein the replacing the facial features of the character to be matched in the scene video with the target facial features to obtain a video to be played comprises:
performing semantic understanding on the scene video, acquiring a principal role of the whole scene video, and taking the principal role as a character to be matched in the scene video;
and replacing the facial features of the people to be matched with the target facial features.
5. The method according to any one of claims 1 to 3, wherein the replacing the facial features of the character to be matched in the scene video with the target facial features to obtain a video to be played comprises:
displaying all the characters in the scene video to instruct a user to select a designated character from all the characters;
acquiring the designated character selected by the user, and taking the designated character as the character to be matched in the scene video;
and replacing the facial features of the people to be matched with the target facial features.
6. A video generation apparatus, characterized in that the apparatus comprises:
the information input module is used for acquiring interactive information input by a user;
the scene video acquisition module is used for acquiring a scene video according to the interaction information, wherein the scene video comprises a character to be matched;
the scene video acquisition module comprises: the understanding unit is used for carrying out semantic understanding on the interactive information and acquiring semantic information of the interactive information; the video generating unit is used for searching relevant video text information according to the semantic information, wherein the video text information is text information describing the whole video content; the cutting subunit is used for cutting the video text information according to scenes to obtain at least one segment of scene text; the generating subunit is used for performing semantic understanding on the at least one section of scene text and respectively generating a sub-scene video corresponding to each section of scene text; the synthesizing subunit is used for taking one sub-scene video as the scene video if the sub-scene video is generated; if a plurality of sub-scene videos are generated, synthesizing the sub-scene videos into the scene video, wherein the scene video comprises voice information and video pictures;
the face acquisition module is used for acquiring face information of a user and extracting corresponding face features as target face features;
the video generation module is used for replacing the facial features of the characters to be matched in the scene video with the target facial features to generate a video to be played;
and the output module is used for outputting the video to be played.
7. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory electrically connected with the one or more processors;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-5.
8. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 5.
CN201911228480.6A 2019-12-04 2019-12-04 Video generation method and device, electronic equipment and storage medium Active CN110968736B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911228480.6A CN110968736B (en) 2019-12-04 2019-12-04 Video generation method and device, electronic equipment and storage medium
PCT/CN2020/116452 WO2021109678A1 (en) 2019-12-04 2020-09-21 Video generation method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911228480.6A CN110968736B (en) 2019-12-04 2019-12-04 Video generation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110968736A CN110968736A (en) 2020-04-07
CN110968736B true CN110968736B (en) 2021-02-02

Family

ID=70032959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911228480.6A Active CN110968736B (en) 2019-12-04 2019-12-04 Video generation method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110968736B (en)
WO (1) WO2021109678A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968736B (en) * 2019-12-04 2021-02-02 深圳追一科技有限公司 Video generation method and device, electronic equipment and storage medium
CN111831854A (en) * 2020-06-03 2020-10-27 北京百度网讯科技有限公司 Video tag generation method and device, electronic equipment and storage medium
CN112004163A (en) * 2020-08-31 2020-11-27 北京市商汤科技开发有限公司 Video generation method and device, electronic equipment and storage medium
CN112533069A (en) * 2020-11-25 2021-03-19 拉扎斯网络科技(上海)有限公司 Processing method and device for synthesizing multimedia data
CN113709548B (en) * 2021-08-09 2023-08-25 北京达佳互联信息技术有限公司 Image-based multimedia data synthesis method, device, equipment and storage medium
CN113965802A (en) * 2021-10-22 2022-01-21 深圳市兆驰股份有限公司 Immersive video interaction method, device, equipment and storage medium
CN114220051B (en) * 2021-12-10 2023-07-28 马上消费金融股份有限公司 Video processing method, application program testing method and electronic equipment
CN114222077A (en) * 2021-12-14 2022-03-22 惠州视维新技术有限公司 Video processing method and device, storage medium and electronic equipment
CN114445896B (en) * 2022-01-28 2024-04-05 北京百度网讯科技有限公司 Method and device for evaluating confidence of content of person statement in video
CN114827752B (en) * 2022-04-25 2023-07-25 中国平安人寿保险股份有限公司 Video generation method, video generation system, electronic device and storage medium
CN114968523A (en) * 2022-05-24 2022-08-30 北京新唐思创教育科技有限公司 Character transmission method and device among different scenes, electronic equipment and storage medium
CN116389853B (en) * 2023-03-29 2024-02-06 阿里巴巴(中国)有限公司 Video generation method
CN117635784B (en) * 2023-12-19 2024-04-19 世优(北京)科技有限公司 Automatic three-dimensional digital human face animation generation system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101896803A (en) * 2007-12-12 2010-11-24 诺基亚公司 Methods, apparatuses, and computer program products for semantic media conversion from source data to audio/video data
CN102750366A (en) * 2012-06-18 2012-10-24 海信集团有限公司 Video search system and method based on natural interactive import and video search server
WO2016054042A1 (en) * 2014-09-29 2016-04-07 Amazon Technologies, Inc. Virtual world generation engine
CN108111779A (en) * 2017-11-21 2018-06-01 深圳市朗形数字科技有限公司 A kind of method and terminal device of video processing
CN109819313A (en) * 2019-01-10 2019-05-28 腾讯科技(深圳)有限公司 Method for processing video frequency, device and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101807393B (en) * 2010-03-12 2012-12-19 青岛海信电器股份有限公司 KTV system, implement method thereof and TV set
CN105118082B (en) * 2015-07-30 2019-05-28 科大讯飞股份有限公司 Individualized video generation method and system
US10474877B2 (en) * 2015-09-22 2019-11-12 Google Llc Automated effects generation for animated content
CN110286756A (en) * 2019-06-13 2019-09-27 深圳追一科技有限公司 Method for processing video frequency, device, system, terminal device and storage medium
CN110266994B (en) * 2019-06-26 2021-03-26 广东小天才科技有限公司 Video call method, video call device and terminal
CN110968736B (en) * 2019-12-04 2021-02-02 深圳追一科技有限公司 Video generation method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN110968736A (en) 2020-04-07
WO2021109678A1 (en) 2021-06-10

Similar Documents

Publication Publication Date Title
CN110968736B (en) Video generation method and device, electronic equipment and storage medium
CN109688463B (en) Clip video generation method and device, terminal equipment and storage medium
CN109729426B (en) Method and device for generating video cover image
WO2022001593A1 (en) Video generation method and apparatus, storage medium and computer device
CN110868635B (en) Video processing method and device, electronic equipment and storage medium
CN110557678B (en) Video processing method, device and equipment
JP2021192222A (en) Video image interactive method and apparatus, electronic device, computer readable storage medium, and computer program
CN111241340B (en) Video tag determining method, device, terminal and storage medium
CN111930994A (en) Video editing processing method and device, electronic equipment and storage medium
CN112560605B (en) Interaction method, device, terminal, server and storage medium
CN110781328A (en) Video generation method, system, device and storage medium based on voice recognition
CN110931042A (en) Simultaneous interpretation method and device, electronic equipment and storage medium
EP4300431A1 (en) Action processing method and apparatus for virtual object, and storage medium
CN113132780A (en) Video synthesis method and device, electronic equipment and readable storage medium
CN113806570A (en) Image generation method and generation device, electronic device and storage medium
CN112188267A (en) Video playing method, device and equipment and computer storage medium
CN114513706B (en) Video generation method and device, computer equipment and storage medium
US10120539B2 (en) Method and device for setting user interface
US20150221114A1 (en) Information processing apparatus, information processing method, and program
US10915778B2 (en) User interface framework for multi-selection and operation of non-consecutive segmented information
CN115209233B (en) Video playing method, related device and equipment
JP2008083672A (en) Method of displaying expressional image
CN113438532B (en) Video processing method, video playing method, video processing device, video playing device, electronic equipment and storage medium
JP2019105751A (en) Display control apparatus, program, display system, display control method and display data
CN113673277B (en) Method and device for acquiring online drawing content and intelligent screen equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant