CN114401431A - Virtual human explanation video generation method and related device - Google Patents

Virtual human explanation video generation method and related device

Info

Publication number
CN114401431A
Authority
CN
China
Prior art keywords
video
virtual human
target document
database
explanation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210061976.4A
Other languages
Chinese (zh)
Other versions
CN114401431B (en)
Inventor
涂必超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202210061976.4A
Publication of CN114401431A
Application granted
Publication of CN114401431B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/8106 Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the application discloses a virtual human explanation video generation method and a related device, wherein the virtual human explanation video generation method comprises the following steps: receiving question information input by a user; acquiring a target document related to the question information from a database; generating an animation video based on the target document, wherein the animation video comprises voice audio; acquiring a character image and a standard character model from the database, and forming a virtual human based on the character image and the standard character model; and fusing the virtual human into the animation video to form a virtual human explanation video. In the embodiment of the application, the virtual human explanation video is generated to answer the question information raised by the user, thereby resolving the user's question; through the virtual human, the explanation video simulates a real person giving a lecture, which makes it easier for the user to understand the content the video is meant to present.

Description

Virtual human explanation video generation method and related device
Technical Field
The invention relates to the technical field of data conversion, in particular to a virtual human explanation video generation method and a related device.
Background
In the prior art, when a user has a question, a search engine is used to find a related document. The retrieved document often contains a large amount of text, and reading that text to understand the document's content is tedious: some users lack the patience to read the document carefully, while others easily lose focus while working through it, so their understanding falls short.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a virtual human explanation video generation method and a related device, which resolve a user's question by generating a virtual human explanation video that answers the question information raised by the user.
In a first aspect, an embodiment of the present application provides a method for generating a virtual human explanation video, including:
receiving question information input by a user;
acquiring a target document related to the question information from a database;
generating an animation video based on the target document, wherein the animation video comprises voice audio;
acquiring a character image and a standard character model from the database, and forming a virtual human based on the character image and the standard character model;
and fusing the virtual human into the animation video to form a virtual human explanation video.
In a possible implementation manner, the obtaining, from a database, a target document related to the question information includes:
interpreting the question information to obtain the meaning content of the question information;
extracting keywords from the meaning content;
and acquiring the target document corresponding to the meaning content from the database based on the keyword.
In a possible implementation manner, the obtaining, from a database, a target document corresponding to the meaning content based on the keyword includes:
inputting the keywords into a database to query to obtain a pre-stored document set associated with the keywords;
and screening out target documents consistent with the meaning content from the pre-stored document set.
In one possible implementation, the generating an animation video based on the target document includes:
acquiring a field of a target document;
generating the voice audio based on the field;
acquiring a video template matched with the field from a database;
and inserting the field and the voice audio into the video template to form the animation video.
In one possible implementation, the inserting the field and the voice audio into the video template to form the animation video includes:
decoding the video template to obtain a plurality of video frames, wherein the video frames are provided with caption frames capable of being inserted into fields;
aligning the starting time point and the ending time point of the voice audio with the starting video frame and the ending video frame of the video template respectively to determine the corresponding relation between the voice audio and the plurality of video frames;
splitting the field to form a plurality of subfields, and determining correspondence of the plurality of subfields to a plurality of video frames in the video template;
and inserting each subfield into a subtitle frame in the video frame corresponding to the subfield to form the animation video.
In one possible implementation, the forming a virtual human based on the character image and a standard character model includes:
collecting characteristic parameters of the character image and standard character model parameters;
and forming the virtual human on the basis of the characteristic parameters and the standard character model parameters.
In one possible implementation, the voice audio includes a plurality of pronunciations, and the fusing the virtual human into the animation video to form a virtual human explanation video includes:
determining mouth shapes of the virtual human corresponding to the pronunciations according to the pronunciations of the voice audio;
determining lip movement tracks of the virtual human on the basis of mouth shapes of the virtual human corresponding to the pronunciations;
and synchronizing the motion trail of the lip of the virtual human with the voice audio to form a virtual human explanation video.
In a second aspect, an embodiment of the present application provides a virtual human explanation video generating apparatus, where the virtual human explanation video generating apparatus includes:
the receiving module is used for receiving question information input by a user;
the acquisition module is used for acquiring a target document related to the question information from a database;
the animation video generation module is used for generating an animation video based on the target document, and the animation video comprises voice audio;
the virtual human forming module is used for acquiring the character image and the standard character model from the database and forming a virtual human based on the character image and the standard character model;
and the fusion module is used for fusing the virtual human into the animation video so as to form a virtual human explanation video.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a storage and a processor, where the storage is used to store computer instructions, and the processor is used to call the computer instructions to execute the method described above.
In a fourth aspect, embodiments of the present application provide a computer storage medium storing computer instructions that, when executed by a processor, implement a method as described above.
In the embodiment provided by the application, after receiving question information from a user, the virtual human explanation video generation apparatus queries a database for a target document related to the question information, generates an animation video based on the target document, and fuses a generated virtual human into the animation video to produce the virtual human explanation video. The virtual human explanation video answers the question information raised by the user and resolves the user's doubt; at the same time, it helps the user understand the meaning of the target document's content, improving the efficiency with which the user understands the target document.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic flowchart of a method for generating a virtual human explanation video according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a virtual human explanation video generation apparatus according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.
The terms "including" and "having," and any variations thereof, in the description and claims of this application and the drawings described above, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In the prior art, when a user has a question, a search engine is used to find a related document. The retrieved document often contains a large amount of text, and reading that text to understand the document's content is tedious: some users lack the patience to read the document carefully, while others easily lose focus while working through it, so their understanding falls short.
Referring to fig. 1, the embodiment of the present application discloses a method for generating a virtual human explanation video, which includes, but is not limited to, steps S1-S5.
S1, receiving the question information input by the user.
The executing body of the method can be the virtual human explanation video generation apparatus 100, and the virtual human explanation video generation apparatus 100 can be an intelligent device such as a computer or a mobile phone.
In the embodiment provided by the present application, the question information may be a question spoken by the user to the virtual human explanation video generation apparatus 100, or a question manually entered by the user into the virtual human explanation video generation apparatus 100.
Correspondingly, the virtual human explanation video generation apparatus 100 may receive a question uttered by the user's voice as well as question information manually input by the user, for example, "what insurance is suitable for purchase with low back and leg pain".
S2, acquiring the target document related to the question information from the database.
In the embodiment provided by the present application, after receiving the question information, the virtual human explanation video generation apparatus 100 interprets the question information, identifies its meaning content, and then queries the database for the target document associated with the question, where the content in the target document is used to answer the question information. For example, when the question information is "what is the role of buying insurance", the content in the target document may be "insurance can help individuals or organizations reduce economic losses, enhance their risk management awareness, and ensure timely recovery and risk transfer when they suffer losses".
In the embodiment provided by the present application, the database stores a plurality of documents in advance, and when the virtual human explanation video generation apparatus 100 receives the question information, the virtual human explanation video generation apparatus 100 acquires a target document capable of answering the question information from the database.
S3, generating an animation video based on the target document, wherein the animation video comprises voice audio.
In the embodiment provided by the application, the target document may have a field and an image, and when generating the animation video based on the target document, the image in the target document may be used as a video frame, the voice audio may be generated according to the field in the target document, and the voice audio and the video may be combined to form the animation video.
In generating an animation video based on the target document, voice audio may be generated based on the fields of the target document, a video template may be selected from a database, and the voice audio may be inserted into the video template to form the animation video.
In the embodiment provided by the present application, the animation video has the voice audio corresponding to the field of the target document, and after the virtual human explanation video is generated, the virtual human explanation video generation apparatus 100 can play the voice audio to answer the user's question, which is convenient for the user to understand.
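As an illustration of the field-to-speech step, the following is a minimal sketch assuming the off-the-shelf pyttsx3 engine; the library choice, file name, and rate value are illustrative assumptions, since the embodiment describes concatenating per-character pronunciations rather than naming a particular tool.

```python
# Minimal sketch of the "field -> voice audio" step (S3), assuming the
# third-party pyttsx3 text-to-speech engine; whether Chinese is voiced
# correctly depends on the voices installed on the host system.
import pyttsx3

field_text = "保险可以帮助个人或机构降低经济损失风险。"

engine = pyttsx3.init()
engine.setProperty("rate", 180)                    # speaking speed (words/minute)
engine.save_to_file(field_text, "voice_audio.wav")
engine.runAndWait()                                # blocks until the file is written
```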
S4, obtaining the character image and the standard character model from the database, and forming the virtual human based on the character image and the standard character model.
In the embodiment provided by the application, the character image and the standard character model are prestored in the database, and the virtual human can be constructed from the character image and the standard character model.
The character image is a two-dimensional image, and the standard character model can be a three-dimensional character model. When the virtual human is constructed, the characteristic parameters of the two-dimensional image and the standard character model parameters can be obtained, and the virtual human is generated from these parameters.
S5, fusing the virtual human into the animation video to form a virtual human explanation video.
In the embodiment provided by the application, the virtual human is inserted into the animation video, so that when the resulting virtual human explanation video is played, the virtual human appears in the video picture.
Specifically, after the virtual human is inserted into the animation video, the movements and expressions of the virtual human can be driven in synchrony with the played voice while the video plays, so as to simulate the virtual human speaking.
In the embodiment provided by the application, after receiving the question information of the user, the virtual human explanation video generation apparatus 100 queries the database for the target document related to the question information, generates an animation video based on the target document, and fuses the generated virtual human into the animation video to produce the virtual human explanation video. The virtual human explanation video answers the question information raised by the user and resolves the user's doubt; at the same time, it helps the user understand the meaning of the target document's content, improving the efficiency with which the user understands the target document.
The obtaining of the target document related to the question information from the database includes:
interpreting the question information to obtain the meaning content of the question information;
extracting keywords from the meaning content;
and acquiring the target document corresponding to the meaning content from the database based on the keyword.
In the embodiment provided by the present application, the question information input by the user may be expressed colloquially, so the virtual human explanation video generation apparatus 100 needs to understand the question information, reorganize the question, and then extract the keywords.
For example, when the question information output by the user is "what insurance is suitable for purchase with low back and leg pain", the virtual human explanation video generation apparatus 100 may extract keywords such as "low back and leg pain" and "insurance", and obtain a target document whose meaning content matches the question information from a database by using the keywords.
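As a sketch of this keyword-extraction step, the snippet below assumes the third-party jieba library (the embodiment does not name an extractor); extract_tags ranks candidate words by TF-IDF weight.

```python
# Illustrative keyword extraction for the question information, assuming the
# jieba Chinese NLP library; the exact tags returned depend on jieba's
# built-in dictionary and TF-IDF statistics.
import jieba.analyse

question = "腰腿疼适合购买什么保险"  # "what insurance suits low back and leg pain"
keywords = jieba.analyse.extract_tags(question, topK=3)
print(keywords)  # e.g. ['腰腿疼', '保险', '购买'], depending on the dictionary
```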
The acquiring of the target document corresponding to the meaning content from the database based on the keyword comprises:
inputting the keywords into a database to query to obtain a pre-stored document set associated with the keywords;
and screening out target documents consistent with the meaning content from the pre-stored document set.
In this embodiment of the application, various documents are pre-stored in the database. The virtual human explanation video generation apparatus 100 can use the keywords "low back and leg pain" and "insurance" to query the database for pre-stored documents about them; since the number of queried pre-stored documents may be large, when the target document is obtained, the one most consistent with the question information is screened out from the pre-stored documents.
In the embodiment provided by the application, when the target document most consistent with the question information is screened out from the pre-stored documents, the pre-stored documents associated with the keywords can be queried, the number of keyword occurrences in each pre-stored document counted, the pre-stored documents ranked accordingly, and the pre-stored document containing the most keywords determined as the target document.
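The count-and-rank screening just described can be sketched in a few lines; the in-memory document list stands in for the database, and the scoring rule (total keyword occurrences) follows the ranking scheme above.

```python
# Illustrative keyword-count ranking over pre-stored documents; a real
# deployment would issue the keyword query against an actual document store.
PRESTORED_DOCS = [
    "Vehicle insurance claims are usually settled within thirty days ...",
    "Health insurance products suitable for low back and leg pain ...",
    "Insurance helps individuals or organizations reduce economic losses ...",
]

def select_target_document(keywords, docs=PRESTORED_DOCS):
    """Return the pre-stored document containing the most keyword occurrences."""
    def score(doc):
        return sum(doc.lower().count(kw.lower()) for kw in keywords)
    associated = [d for d in docs if score(d) > 0]  # documents tied to the keywords
    return max(associated, key=score, default=None)

print(select_target_document(["low back and leg pain", "insurance"]))
```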
In the embodiment provided by the application, when an animation video is generated based on the target document, a field of the target document is obtained. The field may include a plurality of characters; its language may be Chinese, English, and so on, and the field may also mix characters of several languages.
In the embodiment provided by the application, the field of the target document is converted into voice audio. Specifically, the pronunciation of each character in the field is obtained by reading the meaning content of the field in the target document, and the pronunciations of the characters are concatenated to form the voice audio.
In the embodiment provided by the application, taking a Chinese field as an example, the characters in the field may include polyphones. When the pronunciation audio is generated, the reading is judged from the overall meaning of the subfield in which the polyphone appears; by identifying the content meaning of the subfield, the pronunciation of each character in it can be determined.
When the field mixes several languages, taking a field containing Chinese characters and English words as an example, the Chinese characters are pronounced in Chinese and the English words are pronounced in English when the voice audio is generated from the field.
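A sketch of the per-character pronunciation lookup for such a mixed-language field is shown below, assuming the third-party pypinyin library (the embodiment names no tool); pypinyin resolves many polyphones from the surrounding phrase, which matches the subfield-based disambiguation described above, and passes non-Chinese text through unchanged for an English-capable voice.

```python
# Illustrative pronunciation lookup with polyphone handling, assuming pypinyin.
from pypinyin import Style, pinyin

field = "重庆的重要保险政策 covers low back pain"
readings = pinyin(field, style=Style.TONE, errors="default")
# '重' is read 'chóng' in 重庆 but 'zhòng' in 重要; the English words pass
# through untouched so they can be voiced with English pronunciation.
print([r[0] for r in readings])
```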
In the embodiment provided by the application, a video template can be obtained from a database, and the duration of the video template is greater than or equal to the duration of the voice audio.
In the embodiment provided by the present application, when the duration of the video template is greater than the duration of the voice audio, the video template may be clipped, so that the duration of the voice audio is the same as the duration of the video template.
In the embodiment provided by the present application, a field of the target document may be inserted into a video template to serve as a subtitle of the video template, and then the voice audio is inserted into the video template to form the animation video, wherein when the animation video is played, a presentation process of the field serving as the subtitle is synchronized with a playing process of the voice audio.
When the field is inserted into a video template to be used as a subtitle of the video template, the field can be split into a plurality of sub-fields, and the plurality of sub-fields can be displayed one by one when the animation video is played.
In the embodiment provided by the present application, when the animation video is formed, the video template may be decoded to obtain a plurality of video frames, where the video frames have a subtitle box with an insertable field.
In the embodiment provided by the present application, the video frame has an image frame into which an image can be inserted. When the animation video is formed, the virtual human explanation video generation apparatus 100 can analyze a field of the target document to identify the content meaning of the target document, query the database for an image related to that content meaning, and insert the image into the image frame.
Specifically, the field has a plurality of subfields, and each subfield corresponds to at least one video frame. When an image is inserted into an image frame, the content meaning of each subfield is analyzed and an image is queried from the database according to that meaning. When a subfield corresponds to a plurality of video frames, the virtual human explanation video generation apparatus 100 can query the database for a plurality of consecutive images related to the subfield's content meaning and insert them into the image frames of the corresponding video frames. The number of consecutive images queried from the database may be the same as or different from the number of video frames corresponding to the subfield: when the number of consecutive images is greater than the number of video frames, several images may be inserted into the image frame of one video frame at the same time, and when it is smaller, no image may be inserted into some of the subfield's video frames.
In a possible implementation manner, when the animation video is formed, the background of each video frame in the video template may be removed; an image queried from the database is then inserted into the image frame of each video frame and used as the new background of the video frame.
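One way to realize this background-replacement variant is a chroma-key composite, sketched below with OpenCV; the green-screen assumption, frame rate, and file names are illustrative, as the embodiment does not specify how the background is removed.

```python
# Illustrative background replacement: decode the template, mask out an
# (assumed) green-screen background per frame, and composite a database image
# behind the remaining foreground.
import cv2

cap = cv2.VideoCapture("template.mp4")        # decode the video template
background = cv2.imread("queried_image.jpg")  # image queried from the database
writer = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    if writer is None:
        h, w = frame.shape[:2]
        background = cv2.resize(background, (w, h))
        fourcc = cv2.VideoWriter_fourcc(*"mp4v")
        writer = cv2.VideoWriter("animation.mp4", fourcc, 25.0, (w, h))
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (35, 60, 60), (85, 255, 255))  # strongly green pixels
    frame[mask > 0] = background[mask > 0]    # swap in the queried image
    writer.write(frame)

cap.release()
if writer is not None:
    writer.release()
```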
In an embodiment provided by the present application, the voice audio has a start time point and an end time point, and the video template likewise has a start time point and an end time point. When the voice audio is inserted into the animation video, the start time point and the end time point of the voice audio are aligned with the start video frame and the end video frame of the video template respectively, and the correspondence between the voice audio and the plurality of video frames can then be determined.
In the embodiments provided in the present application, the size of the subtitle box of each video frame may be predetermined, and the size of the text in a subfield may be set manually, so the number of characters of a subfield that can be inserted into the subtitle box is limited. For example, when a subfield has 20 characters and the subtitle box of its corresponding video frame is limited to 15 characters, the subfield is split into two subfields, each of which must contain no more than 15 characters. Specifically, the 20-character subfield may be split into two subfields of 10 characters each, or into a first subfield of 15 characters and a second subfield of 5 characters. When a subfield is split in this way, its original meaning is not changed.
Specifically, when a field in the target document is split into a plurality of subfields, the audio is correspondingly split into a plurality of sub-audio, and when a start time point and an end time point of the audio are respectively aligned with a start video frame and an end video frame of the video template, a video frame corresponding to each subfield can be determined, and a video frame corresponding to each sub-audio can be determined at the same time.
After determining the corresponding relation between each subfield and the video frame in the video template, inserting each subfield into the caption frame in the video frame corresponding to each subfield.
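The splitting and alignment logic of the preceding paragraphs can be sketched as follows; the 15-character limit, the punctuation-first splitting rule, and the proportional frame assignment (a stand-in for aligning against per-subfield audio durations) are illustrative assumptions.

```python
def split_field(field, limit=15):
    """Split a field at sentence punctuation, then hard-wrap any piece that
    still exceeds the caption-box character limit (meaning is preserved by
    preferring punctuation boundaries first)."""
    subfields = []
    for piece in field.replace("。", "。|").split("|"):
        piece = piece.strip()
        while len(piece) > limit:
            subfields.append(piece[:limit])
            piece = piece[limit:]
        if piece:
            subfields.append(piece)
    return subfields

def assign_frames(subfields, total_frames):
    """Give each subfield a contiguous run of video frames proportional to its
    length, standing in for alignment against the sub-audio durations."""
    total_chars = sum(len(s) for s in subfields)
    start, plan = 0, []
    for s in subfields:
        end = min(start + max(1, round(total_frames * len(s) / total_chars)),
                  total_frames)
        plan.append((s, range(start, end)))
        start = end
    return plan

field = "保险可以帮助个人或机构降低经济损失风险。增强个人或机构的风险管理意识。"
for sub, frames in assign_frames(split_field(field), total_frames=250):
    print(sub, frames)  # the 20-character sentence splits into 15 + 5 characters
```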
In the embodiment of the present application, the virtual human may be formed by collecting the characteristic parameters of the character image and the standard character model parameters, and then constructing the virtual human from those two parameter sets.
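As a sketch of how the two parameter sets could be combined, the morphable-model formulation below (a neutral mesh from the standard character model plus shape offsets weighted by parameters regressed from the character image) is one plausible reading; the embodiment does not specify the mathematical form.

```python
# Illustrative parameter combination: standard-model mesh plus image-derived
# shape weights. All shapes and values here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
neutral = rng.normal(size=(5000, 3))           # standard model: base mesh vertices
shape_basis = rng.normal(size=(10, 5000, 3))   # standard model: shape offsets

# characteristic parameters as would be regressed from the 2D character image
weights = np.array([0.3, -0.1, 0.0, 0.5, 0.2, 0.0, 0.0, -0.4, 0.1, 0.0])

avatar_mesh = neutral + np.tensordot(weights, shape_basis, axes=1)
print(avatar_mesh.shape)  # (5000, 3): vertices of the personalised virtual human
```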
In the embodiment provided by the application, the voice audio comprises a plurality of pronunciations, and when the virtual human is fused into the animation video, the movement trajectory of the virtual human is synchronized with the voice audio so as to simulate the virtual human speaking.
Specifically, for the virtual human to simulate speaking, its lips need a movement trajectory. In the embodiment provided by the application, the mouth shape of the virtual human corresponding to each pronunciation is determined according to the pronunciations of the voice audio.
The lip movement trajectory of the virtual human is then determined on the basis of these mouth shapes, so that the movement trajectory of the virtual human is synchronized with the voice audio.
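The mouth-shape-to-trajectory step might look like the sketch below, where each pronunciation maps to a coarse mouth shape (viseme) and the timed sequence of shapes forms the lip trajectory played back in sync with the audio; the viseme table and timestamps are illustrative assumptions.

```python
# Illustrative lip-sync keyframe track built from timed pronunciations.
VISEME_TABLE = {  # leading sound -> coarse mouth shape (assumed mapping)
    "a": "wide_open", "o": "rounded", "i": "spread",
    "u": "pursed", "b": "closed", "m": "closed",
}

def lip_trajectory(pronunciations):
    """pronunciations: (syllable, start_sec, end_sec) tuples from the TTS step."""
    track = []
    for syllable, start, end in pronunciations:
        shape = VISEME_TABLE.get(syllable[0], "neutral")
        track.append({"time": start, "shape": shape})
        track.append({"time": end, "shape": "neutral"})  # relax between syllables
    return track

timed_audio = [("bao", 0.00, 0.25), ("xian", 0.25, 0.55), ("an", 0.55, 0.80)]
for keyframe in lip_trajectory(timed_audio):
    print(keyframe)
```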
Referring to fig. 2, an embodiment of the present application further provides a virtual human explanation video generating apparatus 100, where the virtual human explanation video generating apparatus 100 includes:
a receiving module 110, configured to receive question information input by a user;
an obtaining module 120, configured to obtain a target document related to the question information from a database;
an animation video generation module 130, configured to generate an animation video based on the target document, where the animation video includes a voice audio;
a virtual human forming module 140 for acquiring the character image and the standard character model from the database, and forming a virtual human based on the character image and the standard character model;
and the fusion module 150 is used for fusing the virtual human into the animation video to form a virtual human explanation video.
For the concepts, explanations, details and other steps related to the technical solutions provided in the embodiments of the present application, please refer to the description of the method or the contents of the method steps executed by the apparatus in other embodiments, which are not described herein again.
Referring to fig. 3, an electronic device provided in the embodiments of the present application may include a processor 210, a storage 220, and a communication interface 230. The processor 210, the storage 220, and the communication interface 230 are connected by a bus 240; the storage 220 is used for storing instructions, and the processor 210 is used for executing the instructions stored by the storage 220.
The processor 210 is used to execute the instructions stored in the storage 220 to control the communication interface 230 to receive and transmit signals, and to complete the steps of the above-mentioned method. The storage 220 may be integrated in the processor 210, or may be provided separately from the processor 210.
In one possible implementation, the function of the communication interface 230 may be implemented by a transceiver circuit or a dedicated chip for transceiving. Processor 210 may be considered to be implemented by a dedicated processing chip, processing circuit, processor, or a general-purpose chip.
Embodiments of the present application also provide a computer storage medium, which stores computer instructions, and when the computer instructions are executed by a processor, the method described above is implemented.
In another possible implementation manner, the apparatus provided by the embodiment of the present application may be implemented by using a general-purpose computer. Program code that implements the functions of the processor 210 and the communication interface 230 is stored in the storage 220, and a general-purpose processor implements the functions of the processor 210 and the communication interface 230 by executing the code in the storage 220.
For the concepts, explanations, details and other steps related to the technical solutions provided in the embodiments of the present application, please refer to the description of the method or the contents of the method steps executed by the apparatus in other embodiments, which are not described herein again.
As another implementation of the present embodiment, a computer program product is provided that contains instructions that, when executed, perform the method in the above-described method embodiments.
Those skilled in the art will appreciate that in an actual terminal or server, there may be multiple processors and storage. The storage may also be referred to as a storage medium or a storage device, and the like, which is not limited in this application.
It should be understood that, in the embodiment of the present application, the processor may be a Central Processing Unit (CPU), and the processor may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like.
It should also be understood that references to a memory in embodiments of the present application may be either volatile memory or non-volatile memory, or may include both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (memory module) is integrated in the processor.
It should be noted that the storage described herein is intended to include, but is not limited to, these and any other suitable types of storage.
In addition to the data bus, the bus may include a power bus, a control bus, a status signal bus, and the like. However, for clarity of illustration, the various buses are all labeled as the bus in the figures.
It should also be understood that reference herein to first, second, third, fourth, and various numerical designations is made only for ease of description and should not be used to limit the scope of the present application.
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor. The software module may be located in a random access memory, a flash memory, a read-only memory, a programmable read-only memory or an electrically erasable programmable memory, a register, etc. storage media that are well known in the art. The storage medium is located in a storage, and the processor reads information in the storage and completes the steps of the method in combination with hardware of the processor. To avoid repetition, it is not described in detail here.
In the embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various Illustrative Logical Blocks (ILBs) and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), among others.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A virtual human explanation video generation method is characterized by comprising the following steps:
receiving question information input by a user;
acquiring a target document related to the question information from a database;
generating an animation video based on the target document, wherein the animation video comprises voice audio;
acquiring a character image and a standard character model from the database, and forming a virtual human based on the character image and the standard character model;
and fusing the virtual human into the animation video to form a virtual human explanation video.
2. A virtual human explanation video generation method as claimed in claim 1, wherein said obtaining a target document related to said question information from a database comprises:
interpreting the question information to obtain the meaning content of the question information;
extracting keywords from the meaning content;
and acquiring the target document corresponding to the meaning content from the database based on the keyword.
3. The virtual human explanation video generation method as claimed in claim 2, wherein the obtaining of the target document corresponding to the meaning content from the database based on the keyword comprises:
inputting the keywords into a database to query to obtain a pre-stored document set associated with the keywords;
and screening out target documents consistent with the meaning content from the pre-stored document set.
4. The virtual human explanation video generation method according to any one of claims 1 to 3, wherein the generating an animation video based on the target document comprises:
acquiring a field of a target document;
generating the voice audio based on the field;
acquiring a video template matched with the field from a database;
and inserting the field and the voice audio into the video template to form the animation video.
5. The virtual human explanation video generation method according to claim 4, wherein the inserting the field and the voice audio into the video template to form the animation video comprises:
decoding the video template to obtain a plurality of video frames, wherein the video frames are provided with caption frames capable of being inserted into fields;
aligning the starting time point and the ending time point of the voice audio with the starting video frame and the ending video frame of the video template respectively to determine the corresponding relation between the voice audio and the plurality of video frames;
splitting the field to form a plurality of subfields, and determining correspondence of the plurality of subfields to a plurality of video frames in the video template;
and inserting each subfield into a subtitle frame in the video frame corresponding to the subfield to form the animation video.
6. The virtual human explanation video generation method according to claim 4, wherein the forming a virtual human based on the character image and the standard character model comprises:
collecting characteristic parameters of the character image and standard character model parameters;
and forming the virtual human on the basis of the characteristic parameters and the standard character model parameters.
7. The virtual human explanation video generation method according to claim 6, wherein the voice audio includes a plurality of pronunciations, and the fusing the virtual human into the animation video to form a virtual human explanation video comprises:
determining mouth shapes of the virtual human corresponding to the pronunciations according to the pronunciations of the voice audio;
determining lip movement tracks of the virtual human on the basis of mouth shapes of the virtual human corresponding to the pronunciations;
and synchronizing the motion trail of the lip of the virtual human with the voice audio to form a virtual human explanation video.
8. A virtual human explanation video generation apparatus, characterized by comprising:
the receiving module is used for receiving question information input by a user;
the acquisition module is used for acquiring a target document related to the question information from a database;
the animation video generation module is used for generating an animation video based on the target document, and the animation video comprises voice audio;
the virtual human forming module is used for acquiring the character image and the standard character model from the database and forming a virtual human based on the character image and the standard character model;
and the fusion module is used for fusing the virtual human into the animation video so as to form a virtual human explanation video.
9. An electronic device, comprising storage to store computer instructions and a processor to invoke the computer instructions to perform the method of any of claims 1-7.
10. A computer storage medium storing computer instructions which, when executed by a processor, implement the method of any one of claims 1 to 7.
CN202210061976.4A 2022-01-19 2022-01-19 Virtual person explanation video generation method and related device Active CN114401431B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210061976.4A CN114401431B (en) 2022-01-19 2022-01-19 Virtual person explanation video generation method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210061976.4A CN114401431B (en) 2022-01-19 2022-01-19 Virtual person explanation video generation method and related device

Publications (2)

Publication Number Publication Date
CN114401431A (en) 2022-04-26
CN114401431B CN114401431B (en) 2024-04-09

Family

ID=81231643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210061976.4A Active CN114401431B (en) 2022-01-19 2022-01-19 Virtual person explanation video generation method and related device

Country Status (1)

Country Link
CN (1) CN114401431B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258340A (en) * 2013-04-17 2013-08-21 中国科学技术大学 Pronunciation method of three-dimensional visual Chinese mandarin pronunciation dictionary with pronunciation being rich in emotion expression ability
CN104731959A (en) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Video abstraction generating method, device and system based on text webpage content
CN110876024A (en) * 2018-08-31 2020-03-10 百度在线网络技术(北京)有限公司 Method and device for determining lip action of avatar
CN109377539A (en) * 2018-11-06 2019-02-22 北京百度网讯科技有限公司 Method and apparatus for generating animation
CN110381266A (en) * 2019-07-31 2019-10-25 百度在线网络技术(北京)有限公司 A kind of video generation method, device and terminal
JP2020005309A (en) * 2019-09-19 2020-01-09 株式会社オープンエイト Moving image editing server and program
CN110866968A (en) * 2019-10-18 2020-03-06 平安科技(深圳)有限公司 Method for generating virtual character video based on neural network and related equipment
JP2020065307A (en) * 2020-01-31 2020-04-23 株式会社オープンエイト Server, program, and moving image distribution system
JP2020096373A (en) * 2020-03-05 2020-06-18 株式会社オープンエイト Server, program, and video distribution system
CN112328742A (en) * 2020-11-03 2021-02-05 平安科技(深圳)有限公司 Training method and device based on artificial intelligence, computer equipment and storage medium
CN112785667A (en) * 2021-01-25 2021-05-11 北京有竹居网络技术有限公司 Video generation method, device, medium and electronic equipment
CN113160366A (en) * 2021-03-22 2021-07-23 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) 3D face animation synthesis method and system
CN113781610A (en) * 2021-06-28 2021-12-10 武汉大学 Virtual face generation method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115515002A (en) * 2022-09-22 2022-12-23 深圳市木愚科技有限公司 Intelligent admire class generation method and device based on virtual digital person and storage medium
CN115761114A (en) * 2022-10-28 2023-03-07 如你所视(北京)科技有限公司 Video generation method and device and computer readable storage medium
CN115761114B (en) * 2022-10-28 2024-04-30 如你所视(北京)科技有限公司 Video generation method, device and computer readable storage medium
CN115767202A (en) * 2022-11-10 2023-03-07 兴业银行股份有限公司 Lip language synchronous optimization method and system for virtual character video generation
CN116520982A (en) * 2023-04-18 2023-08-01 广州市宇境科技有限公司 Virtual character switching method and system based on multi-mode data
CN116520982B (en) * 2023-04-18 2023-12-15 云南骏宇国际文化博览股份有限公司 Virtual character switching method and system based on multi-mode data
CN117221465A (en) * 2023-09-20 2023-12-12 北京约来健康科技有限公司 Digital video content synthesis method and system
CN117221465B (en) * 2023-09-20 2024-04-16 北京约来健康科技有限公司 Digital video content synthesis method and system

Also Published As

Publication number Publication date
CN114401431B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN110647636B (en) Interaction method, interaction device, terminal equipment and storage medium
CN114401431A (en) Virtual human explanation video generation method and related device
CN110517689B (en) Voice data processing method, device and storage medium
US10192544B2 (en) Method and system for constructing a language model
CN109461437B (en) Verification content generation method and related device for lip language identification
CN109979450B (en) Information processing method and device and electronic equipment
CN110602516A (en) Information interaction method and device based on live video and electronic equipment
CN110910903B (en) Speech emotion recognition method, device, equipment and computer readable storage medium
CN109545183A (en) Text handling method, device, electronic equipment and storage medium
CN109256133A (en) A kind of voice interactive method, device, equipment and storage medium
CN114390220B (en) Animation video generation method and related device
CN116821290A (en) Multitasking dialogue-oriented large language model training method and interaction method
CN117453871A (en) Interaction method, device, computer equipment and storage medium
CN109065019B (en) Intelligent robot-oriented story data processing method and system
CN114064943A (en) Conference management method, conference management device, storage medium and electronic equipment
CN111160051B (en) Data processing method, device, electronic equipment and storage medium
CN113542797A (en) Interaction method and device in video playing and computer readable storage medium
CN111914115B (en) Sound information processing method and device and electronic equipment
CN109241331B (en) Intelligent robot-oriented story data processing method
CN116895087A (en) Face five sense organs screening method and device and face five sense organs screening system
CN111523343A (en) Reading interaction method, device, equipment, server and storage medium
CN114267324A (en) Voice generation method, device, equipment and storage medium
CN114037946A (en) Video classification method and device, electronic equipment and medium
CN114595314A (en) Emotion-fused conversation response method, emotion-fused conversation response device, terminal and storage device
CN115408500A (en) Question-answer consistency evaluation method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant