CN114401431B - Virtual person explanation video generation method and related device

Virtual person explanation video generation method and related device

Info

Publication number
CN114401431B
Authority
CN
China
Prior art keywords
video
virtual person
field
database
voice audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210061976.4A
Other languages
Chinese (zh)
Other versions
CN114401431A (en)
Inventor
涂必超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202210061976.4A priority Critical patent/CN114401431B/en
Publication of CN114401431A publication Critical patent/CN114401431A/en
Application granted granted Critical
Publication of CN114401431B publication Critical patent/CN114401431B/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/8106 Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the application discloses a virtual person explanation video generation method and a related device. The virtual person explanation video generation method comprises the following steps: receiving questioning information input by a user; acquiring a target document related to the questioning information from a database; generating an animation video based on the target document, wherein the animation video comprises voice audio; acquiring a character image and a standard character model from the database, and forming a virtual person based on the character image and the standard character model; and fusing the virtual person into the animation video to form a virtual person explanation video. The method answers the questioning information initiated by the user by generating a virtual person explanation video, thereby resolving the user's question; because the virtual person explanation video simulates, through the virtual person, the situation of a real person giving a lecture, the user can conveniently understand the meaning of the content the video presents.

Description

Virtual person explanation video generation method and related device
Technical Field
The invention relates to the technical field of data conversion, in particular to a virtual person explanation video generation method and a related device.
Background
In the prior art, when a user has a question, related documents are searched for through a search engine. The retrieved documents contain large amounts of text, and the process of reading that text to understand the meaning of the document's content is quite tedious. Some users cannot settle down to read the text in a document, and some users are easily distracted while trying to understand a document and therefore fail to understand it fully. In summary, when a user resolves a question by searching for related documents to obtain an answer, the documents have significant limitations: plain text does not convey the meaning of a document's content vividly, and the efficiency with which the user understands that content is very low.
Disclosure of Invention
The technical problem to be solved by the embodiments of the invention is to provide a virtual person explanation video generation method and a related device, which resolve the questioning information initiated by a user by generating a virtual person explanation video, thereby answering the user's question. Because the virtual person explanation video simulates, through the virtual person, the situation of a real person giving a lecture, the user can conveniently understand the meaning of the content the video presents.
In a first aspect, an embodiment of the present application provides a method for generating a virtual person explanation video, including:
receiving questioning information input by a user;
acquiring a target document related to the questioning information from a database;
generating an animation video based on the target document, wherein the animation video comprises voice audio;
acquiring a character image and a standard character model from the database, and forming a virtual person based on the character image and the standard character model;
and fusing the virtual person into the animation video to form a virtual person explanation video.
In one possible implementation manner, the obtaining, from a database, the target document related to the questioning information includes:
reading the questioning information to obtain the meaning content of the questioning information;
extracting keywords from the meaning content;
and acquiring the target document conforming to the meaning content from the database based on the keywords.
In one possible implementation manner, the obtaining, based on the keyword, a target document matching the meaning content from a database includes:
inputting the keywords into a database to inquire a pre-stored document set associated with the keywords;
and screening target documents consistent with the meaning content from the pre-stored document set.
In one possible implementation, the generating an animated video based on the target document includes:
acquiring a field of the target document;
generating the voice audio based on the field;
acquiring a video template matching the field from the database;
and inserting the field and the voice audio into the video template to form the animation video.
In one possible implementation, the inserting the field and the voice audio into the video template to form the animation video includes:
decoding the video template to obtain a plurality of video frames, wherein each video frame has a subtitle frame into which a field can be inserted;
aligning the starting time point and the ending time point of the voice audio with the starting video frame and the ending video frame of the video template respectively so as to determine the corresponding relation between the voice audio and a plurality of video frames;
splitting the fields to form a plurality of subfields, and determining the corresponding relation between the subfields and a plurality of video frames in the video template;
and inserting each sub-field into a subtitle frame in the corresponding video frame respectively to form the animation video.
In one possible implementation, the forming a virtual person based on the character image and a standard character model includes:
collecting characteristic parameters of the character image and standard character model parameters;
and forming the virtual person based on the characteristic parameters and the standard character model parameters.
In one possible implementation, the voice audio includes a plurality of pronunciations, and the fusing the virtual person into the animation video to form a virtual person explanation video includes:
determining the mouth shape of the virtual person corresponding to each pronunciation according to each pronunciation of the voice audio;
determining the lip movement track of the virtual person based on the mouth shape of the virtual person corresponding to each pronunciation;
and synchronizing the motion trail of the lip of the virtual person with the voice audio to form a virtual person explanation video.
In a second aspect, an embodiment of the present application provides a virtual person interpretation video generating apparatus, including:
the receiving module is used for receiving the questioning information input by the user;
the acquisition module is used for acquiring target documents related to the questioning information from a database;
the animation video generation module is used for generating animation videos based on the target document, wherein the animation videos comprise voice audios;
the virtual person forming module is used for acquiring a character image and a standard character model from the database and forming a virtual person based on the character image and the standard character model;
and the fusion module is used for fusing the virtual person into the animation video to form a virtual person explanation video.
In a third aspect, embodiments of the present application provide an electronic device comprising a memory for storing computer instructions and a processor for invoking the computer instructions to perform the method as described above.
In a fourth aspect, embodiments of the present application provide a computer storage medium storing computer instructions that when executed by a processor implement a method as described above.
In the embodiment provided by the application, after receiving the user's questioning information, the virtual person explanation video generating device queries the database for the target document related to the questioning information, generates an animation video based on the target document, and fuses the generated virtual person into the animation video to generate a virtual person explanation video. The virtual person explanation video answers the questioning information presented by the user, thereby resolving the user's doubt; at the same time, it helps the user understand the meaning of the target document's content and improves the efficiency with which the user understands the target document.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a method for generating virtual person explanation video according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a virtual person explanation video generating device according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application.
The terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the foregoing drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In the prior art, when a user has a question, related documents are searched for through a search engine. The retrieved documents contain large amounts of text, and the process of reading that text to understand the meaning of the document's content is quite tedious. Some users cannot settle down to read the text in a document, and some users are easily distracted while trying to understand a document and therefore fail to understand it fully. In summary, when a user resolves a question by searching for related documents to obtain an answer, the documents have significant limitations: plain text does not convey the meaning of a document's content vividly, and the efficiency with which the user understands that content is very low.
Referring to fig. 1, an embodiment of the application discloses a method for generating a virtual person explanation video, which includes, but is not limited to, steps S1-S5.
S1, receiving question information input by a user.
The execution subject of the method may be the virtual person explanation video generating apparatus 100, and the virtual person explanation video generating apparatus 100 may be an intelligent device such as a computer or a mobile phone.
In the embodiment provided in the present application, the questioning information may be a question sent by the user to the virtual person explanation video generating apparatus 100 through voice, or a question manually input by the user into the virtual person explanation video generating apparatus 100.
Correspondingly, the virtual person explanation video generating apparatus 100 may receive a question sent by a user through voice, and may also receive questioning information manually input by the user. For example, the questioning information may be "what insurance is suitable for someone with lumbago and skelalgia".
S2, acquiring target documents related to the questioning information from a database.
In the embodiment provided in this application, after receiving the questioning information, the virtual person explanation video generating apparatus 100 interprets the questioning information, identifies the meaning content of the questioning information, and then queries the database for the target document associated with the question, where the content in the target document is used for answering the questioning information. For example, when the questioning information is "what is the benefit of buying insurance", the content of the target document may be "insurance can help individuals or institutions reduce economic losses, enhance their risk management awareness, and ensure timely compensation and risk transfer when a loss occurs".
In the embodiment provided in the application, a plurality of documents are stored in the database in advance, and when the virtual person explanation video generating apparatus 100 receives the questioning information, it obtains from the database a target document capable of answering the questioning information.
S3, generating an animation video based on the target document, wherein the animation video comprises voice audio.
In the embodiment provided by the application, the target document may contain fields and images. When the animation video is generated based on the target document, the images in the target document can be used as video frames, voice audio is generated from the fields in the target document, and the voice audio and the video are combined to form the animation video.
When generating the animation video based on the target document, voice audio may be generated based on the fields of the target document, a video template may be selected from the database, and the voice audio may be inserted into the video template to form the animation video.
In the embodiment provided in the application, the animation video has voice audio, and the voice audio corresponds to a field of the target document. After the virtual person explanation video is generated, the virtual person explanation video generating device 100 can play the voice audio to answer the user's question, so that the user can conveniently understand the answer.
S4, acquiring a character image and a standard character model from a database, and forming a virtual person based on the character image and the standard character model.
In the embodiment provided by the application, the database pre-stores a character image and a standard character model, and a virtual person can be constructed from the character image and the standard character model.
The character image is a two-dimensional image, and the standard character model may be a three-dimensional character model. When constructing the virtual person, the characteristic parameters of the two-dimensional image and the standard character model parameters can be obtained, and the virtual person is generated according to the characteristic parameters of the two-dimensional image and the standard character model parameters.
And S5, fusing the virtual person into the animation video to form a virtual person explanation video.
In the embodiment provided by the application, the virtual person is inserted into the animation video, and when the formed virtual person explanation video is played, the virtual person appears in the video picture.
Specifically, after the virtual person is inserted into the animation video, when the animation video is played, the motion and expression of the virtual person can be driven to be synchronous with the played voice so as to simulate the situation of the virtual person speaking.
In the embodiment provided by the application, after receiving the user's questioning information, the virtual person explanation video generating device 100 queries the database for the target document related to the questioning information, generates an animation video based on the target document, and fuses the generated virtual person into the animation video to generate a virtual person explanation video. The virtual person explanation video answers the questioning information presented by the user, thereby resolving the user's doubt; at the same time, it helps the user understand the meaning of the target document's content and improves the efficiency with which the user understands the target document.
The obtaining the target document related to the questioning information from the database comprises the following steps:
reading the questioning information to obtain the meaning content of the questioning information;
extracting keywords from the meaning content;
and acquiring the target document conforming to the meaning content from the database based on the keywords.
In the embodiment provided in the application, the questioning information input by the user may be colloquial, and the virtual person explanation video generating apparatus 100 needs to understand the questioning information, reorganize the question, and then extract the keywords.
For example, when the questioning information input by the user is "what insurance is suitable for someone with lumbago and skelalgia", the virtual person explanation video generating apparatus 100 may extract keywords such as "lumbago and skelalgia" and "insurance", and obtain from the database, through the keywords, a target document whose meaning content matches the questioning information.
The obtaining the target document conforming to the meaning content from the database based on the keywords comprises the following steps:
inputting the keywords into a database, and inquiring to obtain a pre-stored document set associated with the keywords;
and screening target documents consistent with the meaning content from the pre-stored document set.
In this embodiment of the present application, various pre-stored documents are stored in the database in advance. Using the keywords "lumbago and skelalgia" and "insurance", the virtual person explanation video generating apparatus 100 can query the database for the pre-stored documents related to "lumbago and skelalgia" and "insurance". The number of queried pre-stored documents may be relatively large, so when obtaining the target document, the pre-stored document that best matches the questioning information is selected from the pre-stored documents.
In the embodiment provided by the application, when screening the pre-stored documents for the target document that best matches the questioning information, the pre-stored documents associated with the keywords can be queried according to the keywords, the number of keyword occurrences in each pre-stored document is counted, the pre-stored documents are ranked by this count, and the pre-stored document with the largest number of keyword occurrences is determined as the target document, as in the sketch below.
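Purely as an illustration of this ranking rule, the following minimal Python sketch (not the application's implementation; the function name and the sample documents are hypothetical) counts keyword occurrences in each pre-stored document and selects the one with the most hits as the target document:

```python
from typing import List

def select_target_document(keywords: List[str], prestored_docs: List[str]) -> str:
    """Pick the pre-stored document with the most keyword occurrences."""
    def keyword_count(doc: str) -> int:
        # Total occurrences of all keywords, counted case-insensitively.
        text = doc.lower()
        return sum(text.count(kw.lower()) for kw in keywords)
    # Ranking by keyword count and taking the maximum implements the
    # selection rule described above.
    return max(prestored_docs, key=keyword_count)

# Usage with the running example:
docs = [
    "Insurance for lumbago and skelalgia: which insurance covers back pain ...",
    "General overview of life insurance products ...",
]
print(select_target_document(["lumbago and skelalgia", "insurance"], docs))
```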
In the embodiment provided by the application, when the animation video is generated based on the target document, the field of the target document is acquired. The field may comprise a plurality of characters; the language of the field may be Chinese, English, and so on, and the field may also contain a mixture of characters from several languages.
In the embodiment provided by the application, the fields of the target document are converted into voice audio. Specifically, the meaning content of the fields in the target document is read to obtain the pronunciation of each word in the field, and the pronunciations of the words in the field are concatenated to form the voice audio.
In the embodiment provided by the application, taking a Chinese field as an example, the characters in the field may include polyphonic characters (characters with more than one pronunciation). Specifically, when the voice audio is generated, the pronunciation is judged according to the overall meaning of the subfield in which the polyphonic character is located; by identifying the meaning of the subfield's content, the pronunciation of each character in the subfield can be determined.
When the field includes characters from several languages, taking a field containing Chinese characters and English words as an example, when the voice audio is generated based on the field, the Chinese characters are pronounced in Chinese and the English words are pronounced in English, as in the sketch below.
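To make the per-character pronunciation lookup and concatenation concrete, here is a minimal sketch. The pronunciation table is a hypothetical stand-in; a real front end would choose among a polyphonic character's readings using the subfield's meaning, as described above:

```python
import re

# Hypothetical pronunciation table: Chinese characters map to pinyin syllables.
# A real system would resolve polyphonic characters from the subfield's meaning.
PRONUNCIATIONS = {"保": "bao3", "险": "xian3"}

def field_to_pronunciations(field: str) -> list:
    """Return the pronunciation sequence for a mixed Chinese/English field."""
    tokens = re.findall(r"[A-Za-z]+|\S", field)  # English words, or single chars
    result = []
    for tok in tokens:
        if tok.isascii() and tok.isalpha():
            result.append(tok)  # English word: read with its English pronunciation
        else:
            result.append(PRONUNCIATIONS.get(tok, tok))  # Chinese char: pinyin
    return result

# The pronunciations would then be concatenated (synthesized) into voice audio.
print(field_to_pronunciations("保险 insurance"))  # ['bao3', 'xian3', 'insurance']
```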
In the embodiment provided by the application, the video template can be obtained from the database, and the duration of the video template is greater than or equal to the duration of the voice audio.
In the embodiment provided by the application, when the duration of the video template is longer than the duration of the voice audio, the video template can be clipped, so that the duration of the voice audio is the same as the duration of the video template.
In the embodiment provided by the application, the field of the target document can be inserted into a video template to serve as the subtitles of the video template, and the voice audio is then inserted into the video template to form the animation video. When the animation video is played, the display of the field as subtitles is synchronized with the playback of the voice audio.
When the field is inserted into a video template to serve as a subtitle of the video template, the field can be split into a plurality of subfields, and when the animation video is played, the subfields can be displayed one by one.
In the embodiment provided by the application, when the animation video is formed, the video template can be decoded to obtain a plurality of video frames, wherein the video frames have subtitle frames into which fields can be inserted.
In the embodiment provided in the application, the video frame has an image frame into which an image can be inserted. When the animation video is formed, the virtual person explanation video generating apparatus 100 may analyze the fields of the target document to identify the content meaning of the target document, query the database for images related to that content meaning, and insert the images into the image frames.
Specifically, the field has a plurality of subfields, and each subfield corresponds to at least one video frame. When an image is inserted into an image frame, the content meaning of each subfield is parsed and images are queried from the database according to that content meaning. When a subfield corresponds to a plurality of video frames, the virtual person explanation video generating apparatus 100 may query the database for a plurality of consecutive images associated with the content meaning of the subfield and insert these images into the image frames of the video frames respectively; the number of consecutive images queried from the database may be the same as or different from the number of video frames corresponding to the subfield. When the number of consecutive images is larger than the number of video frames corresponding to the subfield, several images can be inserted into the image frame of one video frame at the same time; when the number of consecutive images is smaller than the number of video frames corresponding to the subfield, some of the subfield's video frames receive no image. The sketch below illustrates these rules.
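The frame-versus-image count handling above can be written down as a simple distribution function; this is a sketch of the stated rules only, with hypothetical names, not the application's implementation:

```python
def assign_images_to_frames(images: list, num_frames: int) -> list:
    """Distribute queried images across a subfield's video frames.
    More images than frames: some frames receive several images at once.
    Fewer images than frames: trailing frames receive no image."""
    frames = [[] for _ in range(num_frames)]
    if len(images) >= num_frames:
        for i, img in enumerate(images):
            # Spread images evenly; several may share one frame.
            frames[min(i * num_frames // len(images), num_frames - 1)].append(img)
    else:
        for i, img in enumerate(images):
            frames[i].append(img)  # leading frames get one image each
    return frames

print(assign_images_to_frames(["a", "b", "c"], 2))  # [['a', 'b'], ['c']]
print(assign_images_to_frames(["a"], 3))            # [['a'], [], []]
```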
In one possible implementation, when the animated video is formed, the background of each video frame in the video template may be removed, and then an image obtained by querying from a database is inserted into an image frame of each video frame, and the image is used as the background of the video frame.
In the embodiment provided by the application, the voice audio has a start time point and an end time point, and the video template has a start time point and an end time point. When the voice audio is inserted into the animation video, the start time point and the end time point of the voice audio are aligned with the start video frame and the end video frame of the video template respectively; at this point, the correspondence between the voice audio and the plurality of video frames can be determined.
In the embodiment provided in this application, the size of the subtitle frame of each video frame may be preset, and the text size in the subfield may be set manually, so the number of characters that can be inserted into a subtitle frame is limited. For example, when a subfield has 20 characters and the subtitle frame of the video frame corresponding to that subfield is limited to 15 characters, the subfield can be split into two subfields, each with no more than 15 characters: the 20-character subfield may be split into two subfields of 10 characters each, or into a first subfield of 15 characters and a second subfield of 5 characters. When the subfield is split into two subfields, the original meaning of the subfield is not changed; a sketch of such splitting follows.
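The character-limit example can be sketched as follows; the fixed-length cut is a simplification, since the application additionally requires split points that do not change the subfield's original meaning:

```python
def split_subfield(subfield: str, max_chars: int = 15) -> list:
    """Split a subfield into pieces of at most max_chars characters.
    Naive fixed-length cutting; a meaning-preserving splitter would
    prefer punctuation or phrase boundaries instead."""
    return [subfield[i:i + max_chars] for i in range(0, len(subfield), max_chars)]

text20 = "a" * 20                  # a 20-character subfield
print(split_subfield(text20))      # pieces of 15 and 5 characters
print(split_subfield(text20, 10))  # two pieces of 10, the other split in the example
```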
Specifically, when the fields in the target document are split into a plurality of subfields, the voice audio is correspondingly split into a plurality of sub-voice audios. When the start time point and the end time point of the voice audio are aligned with the start video frame and the end video frame of the video template respectively, the video frames corresponding to each subfield can be determined, and the video frames corresponding to each sub-voice audio can be determined.
After the correspondence between each subfield and the video frames in the video template is determined, each subfield is inserted into the subtitle frame of its corresponding video frames, as illustrated in the sketch below.
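One plausible way to realize this correspondence, assumed here purely for illustration, is to distribute the template's video frames across subfields in proportion to the duration of each sub-voice audio:

```python
def frames_for_subfields(durations: list, total_frames: int) -> list:
    """Map each sub-voice audio's duration to a (start, end) video frame range,
    assuming the audio's start/end are aligned with the first/last frame."""
    total = sum(durations)
    ranges, start = [], 0
    for i, d in enumerate(durations):
        if i == len(durations) - 1:
            end = total_frames  # the last subfield absorbs rounding error
        else:
            end = start + round(d / total * total_frames)
        ranges.append((start, end))  # frame indices [start, end)
        start = end
    return ranges

# Three sub-voice audios of 1.0 s, 2.0 s and 1.0 s over a 100-frame template:
print(frames_for_subfields([1.0, 2.0, 1.0], 100))  # [(0, 25), (25, 75), (75, 100)]
```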
In the embodiment of the application, when the virtual person is formed, the characteristic parameters of the character image and the standard character model parameters are collected, and the virtual person is then formed based on the characteristic parameters and the standard character model parameters.
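The application does not specify how the two parameter sets are combined; purely as a hypothetical illustration, the character image's characteristic parameters might override the matching entries of the standard character model's parameters:

```python
def form_virtual_person(feature_params: dict, standard_model_params: dict) -> dict:
    """Illustrative only: start from the standard character model's parameters
    and override the entries measured from the 2D character image."""
    virtual_person = dict(standard_model_params)  # 3D model as the base
    virtual_person.update(feature_params)         # image features take priority
    return virtual_person

standard = {"height": 1.70, "face_width": 0.14, "eye_distance": 0.060}
from_image = {"face_width": 0.15, "eye_distance": 0.062}  # measured from the image
print(form_virtual_person(from_image, standard))
```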
In the embodiment provided by the application, the voice audio comprises a plurality of pronunciations, and when the virtual person is fused to the animation video, the motion trail of the virtual person is synchronized with the voice audio so as to simulate the talking situation of the virtual person.
Specifically, when the virtual person simulates the situation of speaking, the lips of the virtual person need to have a motion track. In the embodiment provided by the application, the mouth shape of the virtual person corresponding to each pronunciation is determined according to each pronunciation of the voice audio.
The lip movement track of the virtual person is then determined based on the mouth shape of the virtual person corresponding to each pronunciation, so that the motion track of the virtual person is synchronous with the voice audio.
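A common way to realize such synchronization, assumed here for illustration, is a pronunciation-to-mouth-shape (viseme) lookup that produces a time-stamped track; the mapping table and timings below are hypothetical:

```python
# Hypothetical pronunciation-to-mouth-shape (viseme) table; a production system
# would use a phoneme-level mapping designed for the virtual person's face rig.
VISEMES = {"bao3": "closed-then-open", "xian3": "wide", "rest": "neutral"}

def lip_track(pronunciations: list, timestamps: list) -> list:
    """Build a time-stamped mouth-shape track so the virtual person's lip
    motion can be synchronized with the voice audio, as described above."""
    return [
        (t, VISEMES.get(p, VISEMES["rest"]))  # unknown sounds fall back to neutral
        for p, t in zip(pronunciations, timestamps)
    ]

print(lip_track(["bao3", "xian3"], [0.00, 0.35]))
# [(0.0, 'closed-then-open'), (0.35, 'wide')]
```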
Referring to fig. 2, the embodiment of the present application further provides a virtual person explanation video generating apparatus 100, where the virtual person explanation video generating apparatus 100 includes:
a receiving module 110, configured to receive question information input by a user;
an obtaining module 120, configured to obtain, from a database, a target document related to the question information;
an animated video generating module 130, configured to generate an animated video based on the target document, where the animated video includes voice audio;
a virtual person forming module 140 for acquiring a character image and a standard character model from a database and forming a virtual person based on the character image and the standard character model;
and the fusion module 150 is used for fusing the virtual person into the animation video to form a virtual person explanation video.
For the concepts, explanations, detailed descriptions, and other steps related to the apparatus in the technical solutions provided in the embodiments of the present application, refer to the foregoing description of the methods or of the method steps performed by the apparatus in other embodiments; details are not repeated here.
Referring to fig. 3, an electronic device according to an embodiment of the present application may include a processor 210, a memory 220, and a communication interface 230. The processor 210, the memory 220 and the communication interface 230 are connected by a bus 240; the memory 220 is configured to store instructions, and the processor 210 is configured to execute the instructions stored by the memory 220.
The processor 210 is configured to execute the instructions stored in the memory 220 to control the communication interface 230 to receive and transmit signals, thereby completing the steps in the method described above. The memory 220 may be integrated into the processor 210 or may be provided separately from the processor 210.
In one possible implementation, the functions of the communication interface 230 may be considered to be implemented by a transceiver circuit or a dedicated chip for transceiving. Processor 210 may be considered to be implemented by a dedicated processing chip, a processing circuit, a processor, or a general-purpose chip.
Embodiments of the present application also provide a computer storage medium storing computer instructions that when executed by a processor implement the above-described method.
In another possible implementation manner, a manner of using a general purpose computer may be considered to implement the apparatus provided in the embodiments of the present application. I.e. program code implementing the functions of the processor 210, the communication interface 230 is stored in the memory 220, and the general purpose processor implements the functions of the processor 210, the communication interface 230 by executing the code in the memory 220.
For the concepts, explanations, detailed descriptions, and other steps related to the apparatus in the technical solutions provided in the embodiments of the present application, refer to the foregoing description of the methods or of the method steps performed by the apparatus in other embodiments; details are not repeated here.
As another implementation of this embodiment, a computer program product is provided that contains instructions that, when executed, perform the method of the method embodiment described above.
Those skilled in the art will appreciate that there may be multiple processors and memories in an actual terminal or server. The storage may also be referred to as a storage medium or storage device, and embodiments of the present application are not limited in this regard.
It should be appreciated that in embodiments of the present application, the processor may be a central processing unit (Central Processing Unit, CPU for short), another general purpose processor, a digital signal processor (Digital Signal Processing, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field programmable gate array (Field-Programmable Gate Array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.
It should also be understood that the memory referred to in embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM for short), a Programmable ROM (PROM for short), an Erasable Programmable ROM (EPROM for short), an Electrically Erasable Programmable ROM (Electrically EPROM, EEPROM for short), or a flash memory. The volatile memory may be Random Access Memory (RAM for short), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate Synchronous DRAM (DDR SDRAM), Enhanced Synchronous DRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
Note that when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, the memory (storage module) is integrated into the processor.
It should be noted that the memories described herein are intended to include, without being limited to, these and any other suitable types of memory.
The bus may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus; however, for clarity of illustration, the various buses are all labeled as the bus in the figure.
It should also be understood that the first, second, third, fourth, and various numerical numbers referred to herein are merely descriptive convenience and are not intended to limit the scope of the present application.
It should be understood that the term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution. The software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads information from the memory and, in combination with its hardware, performs the steps of the method described above. To avoid repetition, a detailed description is not provided herein.
In various embodiments of the present application, the sequence number of each process does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative logical blocks (illustrative logical block, abbreviated ILBs) and steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), etc.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. A virtual person explanation video generation method, characterized by comprising the following steps:
receiving questioning information input by a user;
acquiring a target document related to the questioning information from a database;
acquiring a field of a target document; reading the meaning content of the field in the target document to obtain the pronunciation of each word in the field, and concatenating the pronunciation of each word in the field to form voice audio; obtaining a video template matched with the field from a database; decoding the video template to obtain a plurality of video frames, wherein the video frames are provided with subtitle frames into which fields can be inserted; aligning the starting time point and the ending time point of the voice audio with the starting video frame and the ending video frame of the video template respectively so as to determine the corresponding relation between the voice audio and a plurality of video frames; splitting the fields to form a plurality of subfields, and determining the corresponding relation between the subfields and a plurality of video frames in the video template; inserting each sub-field into a subtitle frame in a video frame corresponding to the sub-field, and then inserting the voice audio into the video template, so as to form an animation video;
acquiring a character image and a standard character model from the database, and acquiring characteristic parameters of the character image and standard character model parameters; forming a virtual person based on the characteristic parameters and the standard character model parameters;
determining the mouth shape of the virtual person corresponding to each pronunciation according to each pronunciation of the voice audio; determining the lip movement track of the virtual person based on the mouth shape of the virtual person corresponding to each pronunciation; and synchronizing the motion trail of the lip of the virtual person with the voice audio, and fusing the virtual person into the animation video to form a virtual person explanation video for solving the questioning information.
2. The method for generating a virtual person explanation video as claimed in claim 1, wherein said obtaining a target document associated with said questioning information from a database comprises:
reading the questioning information to obtain the meaning content of the questioning information;
extracting keywords from the meaning content;
and acquiring the target document conforming to the meaning content from the database based on the keywords.
3. The method for generating a virtual person explanation video according to claim 2, wherein the obtaining a target document corresponding to the meaning content from a database based on the keyword comprises:
inputting the keywords into a database to inquire a pre-stored document set associated with the keywords;
and screening target documents consistent with the meaning content from the pre-stored document set.
4. A virtual person explanation video generating apparatus, comprising:
the receiving module is used for receiving the questioning information input by the user;
the acquisition module is used for acquiring target documents related to the questioning information from a database;
the animation video generation module is used for acquiring the fields of the target document; reading the meaning content of the field in the target document to obtain the pronunciation of each word in the field, and concatenating the pronunciation of each word in the field to form voice audio; obtaining a video template matched with the field from a database; decoding the video template to obtain a plurality of video frames, wherein the video frames are provided with subtitle frames into which fields can be inserted; aligning the starting time point and the ending time point of the voice audio with the starting video frame and the ending video frame of the video template respectively so as to determine the corresponding relation between the voice audio and a plurality of video frames; splitting the fields to form a plurality of subfields, and determining the corresponding relation between the subfields and a plurality of video frames in the video template; inserting each sub-field into a subtitle frame in a video frame corresponding to the sub-field, and then inserting the voice audio into the video template, so as to form an animation video;
the virtual person forming module is used for acquiring a character image and a standard character model from the database and acquiring characteristic parameters of the character image and standard character model parameters; forming a virtual person based on the characteristic parameters and the standard character model parameters;
the fusion module is used for determining the mouth shape of the virtual person corresponding to each pronunciation according to each pronunciation of the voice audio; determining the lip movement track of the virtual person based on the mouth shape of the virtual person corresponding to each pronunciation; and synchronizing the motion trail of the lip of the virtual person with the voice audio, and fusing the virtual person into the animation video to form a virtual person explanation video for solving the questioning information.
5. An electronic device comprising a memory for storing computer instructions and a processor for invoking the computer instructions to perform the method of any of claims 1-3.
6. A computer storage medium storing computer instructions which, when executed by a processor, implement the method of any one of claims 1-3.
CN202210061976.4A 2022-01-19 2022-01-19 Virtual person explanation video generation method and related device Active CN114401431B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210061976.4A CN114401431B (en) 2022-01-19 2022-01-19 Virtual person explanation video generation method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210061976.4A CN114401431B (en) 2022-01-19 2022-01-19 Virtual person explanation video generation method and related device

Publications (2)

Publication Number Publication Date
CN114401431A CN114401431A (en) 2022-04-26
CN114401431B true CN114401431B (en) 2024-04-09

Family

ID=81231643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210061976.4A Active CN114401431B (en) 2022-01-19 2022-01-19 Virtual person explanation video generation method and related device

Country Status (1)

Country Link
CN (1) CN114401431B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115515002A (en) * 2022-09-22 2022-12-23 深圳市木愚科技有限公司 Intelligent MOOC generation method and device based on virtual digital person and storage medium
CN115761114B (en) * 2022-10-28 2024-04-30 如你所视(北京)科技有限公司 Video generation method, device and computer readable storage medium
CN116520982B (en) * 2023-04-18 2023-12-15 云南骏宇国际文化博览股份有限公司 Virtual character switching method and system based on multi-mode data
CN117221465B (en) * 2023-09-20 2024-04-16 北京约来健康科技有限公司 Digital video content synthesis method and system

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258340A (en) * 2013-04-17 2013-08-21 中国科学技术大学 Pronunciation method of three-dimensional visual Chinese mandarin pronunciation dictionary with pronunciation being rich in emotion expression ability
CN104731959A (en) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Video abstraction generating method, device and system based on text webpage content
CN109377539A (en) * 2018-11-06 2019-02-22 北京百度网讯科技有限公司 Method and apparatus for generating animation
CN110381266A (en) * 2019-07-31 2019-10-25 百度在线网络技术(北京)有限公司 A kind of video generation method, device and terminal
JP2020005309A (en) * 2019-09-19 2020-01-09 株式会社オープンエイト Moving image editing server and program
CN110866968A (en) * 2019-10-18 2020-03-06 平安科技(深圳)有限公司 Method for generating virtual character video based on neural network and related equipment
CN110876024A (en) * 2018-08-31 2020-03-10 百度在线网络技术(北京)有限公司 Method and device for determining lip action of avatar
JP2020065307A (en) * 2020-01-31 2020-04-23 株式会社オープンエイト Server, program, and moving image distribution system
JP2020096373A (en) * 2020-03-05 2020-06-18 株式会社オープンエイト Server, program, and video distribution system
CN112328742A (en) * 2020-11-03 2021-02-05 平安科技(深圳)有限公司 Training method and device based on artificial intelligence, computer equipment and storage medium
CN112785667A (en) * 2021-01-25 2021-05-11 北京有竹居网络技术有限公司 Video generation method, device, medium and electronic equipment
CN113160366A (en) * 2021-03-22 2021-07-23 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) 3D face animation synthesis method and system
CN113781610A (en) * 2021-06-28 2021-12-10 武汉大学 Virtual face generation method

Also Published As

Publication number Publication date
CN114401431A (en) 2022-04-26

Similar Documents

Publication Publication Date Title
CN114401431B (en) Virtual person explanation video generation method and related device
CN109195007B (en) Video generation method, device, server and computer readable storage medium
CN114390220B (en) Animation video generation method and related device
CN117056471A (en) Knowledge base construction method and question-answer dialogue method and system based on generation type large language model
CN109979450B (en) Information processing method and device and electronic equipment
CN109558513A (en) A kind of content recommendation method, device, terminal and storage medium
CN111666006B (en) Method and device for drawing question and answer, drawing question and answer system and readable storage medium
CN108846378A (en) Sign Language Recognition processing method and processing device
CN111107442A (en) Method and device for acquiring audio and video files, server and storage medium
CN112287168A (en) Method and apparatus for generating video
US11929100B2 (en) Video generation method, apparatus, electronic device, storage medium and program product
CN107729491B (en) Method, device and equipment for improving accuracy rate of question answer search
CN110489674B (en) Page processing method, device and equipment
CN116702749A (en) Multimedia content analysis method, device, equipment and storage medium
CN114064943A (en) Conference management method, conference management device, storage medium and electronic equipment
CN111859970B (en) Method, apparatus, device and medium for processing information
CN114443938A (en) Multimedia information processing method and device, storage medium and processor
CN113542797A (en) Interaction method and device in video playing and computer readable storage medium
WO2021102754A1 (en) Data processing method and device and storage medium
CN116662495A (en) Question-answering processing method, and method and device for training question-answering processing model
CN112784527B (en) Document merging method and device and electronic equipment
CN111160051B (en) Data processing method, device, electronic equipment and storage medium
CN114913042A (en) Teaching courseware generation method and device, electronic equipment and storage medium
CN108280118A (en) Text, which is broadcast, reads method, apparatus and client, server and storage medium
CN114037946A (en) Video classification method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant