CN110493613B - Video lip synchronization synthesis method and system - Google Patents

Video lip synchronization synthesis method and system

Info

Publication number
CN110493613B
CN110493613B
Authority
CN
China
Prior art keywords
lip
video
codes
cloud server
prototype
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910758080.XA
Other languages
Chinese (zh)
Other versions
CN110493613A (en)
Inventor
郭志扬
乔健
吴鹏程
陈起航
朱西锋
丁航
陆佳莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Aoxin Technology Co Ltd
Original Assignee
Jiangsu Aoxin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Aoxin Technology Co Ltd filed Critical Jiangsu Aoxin Technology Co Ltd
Priority to CN201910758080.XA priority Critical patent/CN110493613B/en
Publication of CN110493613A publication Critical patent/CN110493613A/en
Application granted granted Critical
Publication of CN110493613B publication Critical patent/CN110493613B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/2368Multiplexing of audio and video streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/242Synchronization processes, e.g. processing of PCR [Program Clock References]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a video lip-synchronization synthesis method and system, belonging to the technical field of lip synchronization. The method specifically comprises the following steps: a cloud server receives a pronunciation manuscript through a terminal device and splits the manuscript into sentences at punctuation marks; the cloud server arranges and combines the lip shapes of each split sentence and matches them against prototype videos; the successfully matched prototype videos of each sentence are spliced to form a synthesized video; the playing time of the synthesized video is calculated; and the cloud server sets the speech rate for the received pronunciation manuscript according to that time, so that the pronunciation duration equals the text playing duration. The invention assigns different codes to the different lip shapes formed when the characters of the pronunciation manuscript are pronounced, then selects and synthesizes the prototype videos corresponding to those lip shapes, so that the lip shapes stay consistent with the sounds while the character's picture is played with the voice, increasing the sense of realism.

Description

Video lip synchronization synthesis method and system
Technical Field
The invention belongs to the technical field of sound lip synchronization, and particularly relates to a video sound lip synchronization synthesis method and system.
Background
To strengthen communication with customers and prospective customers and provide them with better products and technical services, many merchants and organizations maintain their own customer-service and after-sales technical-service departments. The staff of these departments carry a heavy daily workload of online and offline customer communication, much of it repetitive question answering and guidance, and they cannot serve users online or on duty 24 hours a day, so a virtual real-person robot meets this need: a large number of real-person videos and answering voices are stored and played on a display screen to give corresponding feedback to customers' questions.
However, because the lip shape of the person in the video and the answering voice are synthesized in post-production, the lips are not synchronized with the voice: the words the customer hears do not match the lip movements, the effect of face-to-face communication with a real customer-service agent is lost, and the customer may become psychologically resistant to the service.
Disclosure of Invention
To solve the technical problems in the background art, the present invention provides a video lip-synchronization synthesis method and system that gives viewers a sense of realism.
The invention is realized by the following technical scheme: in a video lip-synchronization synthesis method, prototype video files of various lip shapes suitable for a virtual robot are stored in a cloud server;
the video lip synchronization synthesis method specifically comprises the following steps:
Step 1: the cloud server receives the pronunciation manuscript through the terminal device and splits the manuscript into sentences at punctuation marks;
Step 2: the cloud server arranges and combines the lip shapes of each split sentence and matches them against the prototype videos;
Step 3: splicing the successfully matched prototype videos of each sentence to form a synthesized video;
Step 4: calculating the playing time of the synthesized video formed in step 3;
Step 5: the cloud server sets the speech rate according to the time from step 4 so that the pronunciation duration equals the text playing duration, and sends the text to the voice gateway, which converts it into a sound file and sends the file back to the cloud server;
Step 6: combining the synthesized video from step 3 with the sound from step 5 to form the final synthesized video;
Step 7: playing the final synthesized video from step 6 on a specified terminal and exiting the system.
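The sentence splitting of step 1 can be sketched as follows. This is an illustrative reconstruction, not code from the patent; the function name and the exact punctuation set are assumptions.

```python
import re

# Illustrative sketch of step 1 (not from the patent): split the
# pronunciation manuscript into sentences at punctuation marks.
def split_manuscript(manuscript: str) -> list[str]:
    # Split at common Chinese and Western sentence punctuation
    # (the exact set used by the patent is not specified).
    parts = re.split(r"[。！？；，.!?;,]", manuscript)
    # Drop empty fragments produced by trailing punctuation.
    return [p.strip() for p in parts if p.strip()]
```

For example, `split_manuscript("你好。欢迎光临！")` yields `["你好", "欢迎光临"]`, one entry per sentence to be matched in step 2.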
In a further embodiment, step 2 specifically includes the following steps:
Step 2.1: convert each Chinese character in the sentence into pinyin and assign a lip code to each pronounced unit: for a consonant (initial), the code is 1 if the lips are slightly open and 2 if widely open; for a simple vowel (final), 3 if slightly open and 4 if widely open; for a compound vowel, 5 if slightly open and 6 if widely open; this yields the sentence's lip permutation code string;
Step 2.2: search the prototype video library for a prototype video whose lip permutation code is identical or close, where the lip code of the last word of the sentence must match exactly;
Step 2.3: if one is found, go to step 3;
Step 2.4: if the prototype video library contains no identical or close lip permutation code, split the sentence's code a limited number of times until every segment matches a prototype video with an identical or close code (the lip code of the last word of the sentence must still match exactly), splice those prototype videos into the sentence video, and go to step 3;
Step 2.5: if no prototype video with an identical or close code can be found even after the limited splitting, report that a prototype video supplementing this lip permutation code should be added, mark the match as failed, and exit.
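The six-way coding of step 2.1 can be sketched as a small function. The grouping into consonants, simple vowels, and compound vowels follows the description above; the function names and the tuple encoding of a sentence are assumptions for illustration.

```python
# Sketch of step 2.1's lip coding (illustrative; names are assumptions).
# Each pronounced unit is classified by kind -- "consonant" (initial),
# "vowel" (simple final), or "compound" (compound final) -- and by
# whether the lips are widely open, giving codes 1-6.
def lip_code(kind: str, wide: bool) -> int:
    base = {"consonant": 1, "vowel": 3, "compound": 5}[kind]
    return base + (1 if wide else 0)

def sentence_lip_code(units: list[tuple[str, bool]]) -> str:
    # Concatenate per-unit codes into the sentence's lip permutation code.
    return "".join(str(lip_code(kind, wide)) for kind, wide in units)
```

For instance, a sentence whose units are a slightly open consonant followed by a widely open vowel gets the permutation code "14", which is then looked up in the prototype video library (step 2.2).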
A video lip-synchronization synthesis system using the above method comprises: a robot terminal, used for receiving the customer's question voice and sending the synthesized video;
and a cloud server, used for receiving the question voice sent by the robot terminal through the Internet and feeding back a corresponding synthesized video to the robot terminal through the Internet according to that question voice, the robot terminal playing the synthesized video;
in a further embodiment, the cloud server comprises: the device comprises a processor, a recording unit, a touch display unit, a communication unit and a lip arrangement unit, wherein the processor is respectively connected with the recording unit, the touch display unit, the communication unit and the lip arrangement unit;
the recording unit is used for acquiring the question voice of the client; the touch display unit is used for customer operation and video playing; the communication unit is used for carrying out data transmission with the cloud server; the lip shape arrangement unit is used for corresponding to the arrangement combination of different lip shapes of each sentence of characters and endowing each prototype video file with different lip shape arrangement combination codes, and the lip shape arrangement codes comprise: when consonants are sounded, lip codes are set to 1 when lip is slightly opened, and are set to 2 when lip is greatly opened, when vowels are sounded, lip codes are set to 3 when lip is slightly opened, and are set to 4 when lip is greatly opened, and when vowels are sounded, lip codes are set to 5 when lip is slightly opened, and are set to 6 when lip is greatly opened.
In a further embodiment, the cloud server comprises:
a receiving and pushing module, used for receiving the data sent by the robot terminal and sending data to the robot terminal;
a voice conversion module, used for converting the question voice received from the cloud server through the Internet into question text and feeding the text back to the cloud server, and likewise for converting the pronunciation manuscript received from the cloud server through the Internet into the answer voice and feeding it back through the Internet;
a matching module, used for matching the question text against the question bank in the cloud server to find the corresponding answer voice or answer video;
and a storage module, used for storing the customer's question voice, the answer voice, the pronunciation manuscript, the synthesized video, and keywords.
The beneficial effects of the invention are: different codes are set for the different lip shapes formed when the characters of the pronunciation manuscript are pronounced, and the prototype videos corresponding to those lip shapes are selected and synthesized, so that the lip shapes stay consistent with the sound while the character's picture is played with the voice, improving realism.
Drawings
Fig. 1 is a flow chart of a video lip synchronization synthesizing method.
Fig. 2 is a block flow diagram of step 2 in fig. 1.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Although the steps in the present invention are arranged by using reference numbers, the order of the steps is not limited, and the relative order of the steps can be adjusted unless the order of the steps is explicitly stated or other steps are required for the execution of a certain step. It is to be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
The applicant addresses the following problem in the existing service industry: because the lip shape of the person in the video and the answering voice are synthesized in post-production, the lips are not synchronized with the voice, the words the customer hears do not match the lip movements, the effect of face-to-face communication with real customer service is lost, and the customer may become psychologically resistant to the service.
Therefore, to solve the above technical problems, the applicant designed a real-person online help-desk service system, together with a video lip-synchronization synthesis method and system that improve its realism.
First, various lip-shaped prototype video files suitable for the virtual robot are stored in the cloud server.
As shown in fig. 1, the video lip-synchronization synthesis method specifically includes the following steps:
Step 1: the cloud server receives the pronunciation manuscript through the terminal device and splits the manuscript into sentences at punctuation marks;
Step 2: the cloud server arranges and combines the lip shapes of each split sentence and matches them against the prototype videos;
Step 3: splicing the successfully matched prototype videos of each sentence to form a synthesized video;
Step 4: calculating the playing time of the synthesized video formed in step 3;
Step 5: the cloud server sets the speech rate according to the time from step 4 so that the pronunciation duration equals the text playing duration, and sends the text to the voice gateway, which converts it into a sound file and sends the file back to the cloud server;
Step 6: combining the synthesized video from step 3 with the sound from step 5 to form the final synthesized video;
Step 7: playing the final synthesized video from step 6 on a specified terminal and exiting the system.
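Steps 4 and 5 amount to fitting the audio duration to the spliced video's playing time. A minimal sketch follows; the function name and the equal-time-per-character model are assumptions, since the patent does not specify how the rate is computed.

```python
def speech_rate_for(text: str, video_seconds: float) -> float:
    """Return a speaking rate, in characters per second, at which the
    voice gateway could render `text` so that the pronunciation duration
    equals the video playing duration (steps 4-5). Assumes every
    character takes equal time, which is a simplification."""
    if video_seconds <= 0:
        raise ValueError("video duration must be positive")
    return len(text) / video_seconds
```

For a four-character sentence whose spliced video plays for 2 seconds, the gateway would be asked to speak at 2 characters per second, so sound and picture end together.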
As shown in fig. 2, step 2 specifically includes the following steps:
Step 2.1: convert each Chinese character in the sentence into pinyin and assign a lip code to each pronounced unit: for a consonant (initial), the code is 1 if the lips are slightly open and 2 if widely open; for a simple vowel (final), 3 if slightly open and 4 if widely open; for a compound vowel, 5 if slightly open and 6 if widely open; this yields the sentence's lip permutation code string;
Step 2.2: search the prototype video library for a prototype video whose lip permutation code is identical or close, where the lip code of the last word of the sentence must match exactly;
Step 2.3: if one is found, go to step 3;
Step 2.4: if the prototype video library contains no identical or close lip permutation code, split the sentence's code a limited number of times until every segment matches a prototype video with an identical or close code (the lip code of the last word of the sentence must still match exactly), splice those prototype videos into the sentence video, and go to step 3;
Step 2.5: if no prototype video with an identical or close code can be found even after the limited splitting, report that a prototype video supplementing this lip permutation code should be added, mark the match as failed, and exit.
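The "limited splitting" of step 2.4 can be read as segmenting the sentence's lip permutation code so that every segment exists in the prototype library. A greedy longest-prefix sketch is below; the library contents and names are invented for illustration, and the exact-match requirement on the last word's code is simplified away.

```python
# Greedy sketch of step 2.4 (illustrative): split the sentence's lip
# permutation code into segments, each present in the prototype video
# library. Returns None when no segmentation exists, corresponding to
# the failure reported in step 2.5.
def split_and_match(code: str, library: set[str]):
    segments = []
    i = 0
    while i < len(code):
        for j in range(len(code), i, -1):  # try the longest prefix first
            if code[i:j] in library:
                segments.append(code[i:j])
                i = j
                break
        else:
            return None  # no prefix of the remainder is in the library
    return segments
```

Note that a greedy search can miss segmentations that backtracking would find; the patent's "limited splitting" leaves the search strategy open, so this is only one plausible reading.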
A video lip-synchronization synthesis system, comprising: a robot terminal, used for receiving the customer's question voice and sending the synthesized video;
and a cloud server, used for receiving the question voice sent by the robot terminal through the Internet and feeding back a corresponding synthesized video to the robot terminal through the Internet according to that question voice, the robot terminal playing the synthesized video;
In a further embodiment, the robot terminal comprises: a processor, a recording unit, a touch display unit, a communication unit, and a lip arrangement unit, the processor being connected to the recording unit, the touch display unit, the communication unit, and the lip arrangement unit respectively;
The recording unit is used for acquiring the customer's question voice; the touch display unit is used for customer operation and video playing; the communication unit is used for data transmission with the cloud server; the lip arrangement unit maps each sentence of text to its permutation and combination of lip shapes and assigns each prototype video file a lip permutation code, where the lip codes are: for consonants, 1 if the lips are slightly open and 2 if widely open; for simple vowels, 3 if slightly open and 4 if widely open; and for compound vowels, 5 if slightly open and 6 if widely open.
The cloud server comprises: a receiving and pushing module for receiving the data sent by the robot terminal and sending data to the robot terminal; a voice conversion module for converting the question voice received from the cloud server through the Internet into question text and feeding the text back to the cloud server, and likewise for converting the pronunciation manuscript received from the cloud server through the Internet into the answer voice and feeding it back through the Internet; a matching module for matching the question text against the question bank in the cloud server to find the corresponding answer voice or answer video; and a storage module for storing the customer's question voice, the answer voice, the pronunciation manuscript, the synthesized video, and keywords.
Because the video is synthesized to match the answer voice, the answer voice and the synthesized video play simultaneously when the virtual robot performs, keeping sound and picture consistent: the pronunciation of the audio and the lip shapes in the picture agree to a high degree, improving the customer's viewing comfort.
Furthermore, it should be understood that although this description refers to embodiments, not every embodiment contains only a single technical solution; this manner of description is adopted only for clarity, and those skilled in the art should take the description as a whole, combining the embodiments as appropriate to form other embodiments they would understand.

Claims (3)

1. A video lip-synchronization synthesis method, characterized in that prototype video files of various lip shapes suitable for a virtual robot are stored in a cloud server;
the video lip synchronization synthesis method specifically comprises the following steps:
Step 1: the cloud server receives the pronunciation manuscript through the terminal device and splits the manuscript into sentences at punctuation marks;
Step 2: the cloud server arranges and combines the lip shapes of each split sentence and matches them against the prototype videos;
Step 3: splicing the successfully matched prototype videos of each sentence to form a synthesized video;
Step 4: calculating the playing time of the synthesized video formed in step 3;
Step 5: the cloud server sets the speech rate according to the time from step 4 so that the pronunciation duration equals the text playing duration, and sends the text to the voice gateway, which converts it into a sound file and sends the file back to the cloud server;
Step 6: combining the synthesized video from step 3 with the sound from step 5 to form the final synthesized video;
Step 7: playing the final synthesized video from step 6 on a specified terminal and exiting the system;
the step 2 specifically comprises the following steps:
step 2.1: converting each Chinese character in the sentence into pinyin, setting lip codes to be 1 when consonants pronounce without closing lips according to vowels of the pinyin, setting lip codes to be 2 when the consonants pronounce while closing lips, setting lip codes to be 3 when the consonants pronounce while closing lips, setting lip codes to be 4 when the lip codes are large, setting lip codes to be 5 when the consonants pronounce while closing lips according to vowels, and setting lip codes to be 6 when the lip codes are large, thereby obtaining a string of lip permutation codes of the sentence;
step 2.2: searching and acquiring a prototype video with identical or similar lip-shaped arrangement codes in a prototype video library, wherein the lip-shaped codes of the last word in a sentence are required to be identical;
step 2.3: if the finding is found, turning to the step 3;
step 2.4: if the lip shape arranged codes with close lip shapes do not exist in the prototype video library, the lip shape arranged codes are subjected to limited splitting until prototype videos with the same or close lip shapes are found in each segment after splitting, the lip shape codes of the last word of a sentence are required to be equal, the prototype videos are spliced into sentence videos, and the step 3 is carried out;
step 2.5: if the lip-shaped equivalent or similar prototype video cannot be found after the limited splitting, the report system adds the prototype video supplementing the lip-shaped arrangement code, the matching fails, and the report system exits.
2. A video lip-sync synthesizing system using a video lip-sync synthesizing method according to claim 1, comprising: the robot terminal is used for receiving the questioning voice of the client and sending the synthesized video;
the cloud server is used for receiving the question voice sent by the robot terminal through the Internet, feeding back a corresponding synthesized video to the robot terminal through the Internet according to the question voice, and playing the synthesized video by the robot terminal;
the cloud server comprises: the device comprises a processor, a recording unit, a touch display unit, a communication unit and a lip arrangement unit, wherein the processor is respectively connected with the recording unit, the touch display unit, the communication unit and the lip arrangement unit;
the recording unit is used for acquiring the question voice of the client; the touch display unit is used for customer operation and video playing; the communication unit is used for carrying out data transmission with the cloud server; the lip shape arrangement unit is used for corresponding to the arrangement combination of different lip shapes of each sentence of characters and endowing each prototype video file with different lip shape arrangement combination codes, and the lip shape arrangement codes comprise: when consonants are sounded, lip codes are set to 1 when lip is slightly opened, and are set to 2 when lip is greatly opened, when vowels are sounded, lip codes are set to 3 when lip is slightly opened, and are set to 4 when lip is greatly opened, and when vowels are sounded, lip codes are set to 5 when lip is slightly opened, and are set to 6 when lip is greatly opened.
3. The system of claim 2, wherein the cloud server comprises:
the receiving and pushing module is used for receiving the data sent by the robot terminal and sending the data to the robot terminal;
the voice conversion module is used for converting the question voice received from the cloud server through the Internet into question text and feeding the text back to the cloud server, and likewise for converting the pronunciation manuscript received from the cloud server through the Internet into the answer voice and feeding it back through the Internet;
the matching module is used for matching the question text against the question bank in the cloud server to find the corresponding answer voice or answer video;
and the storage module is used for storing the customer's question voice, the answer voice, the pronunciation manuscript, the synthesized video, and keywords.
CN201910758080.XA 2019-08-16 2019-08-16 Video lip synchronization synthesis method and system Active CN110493613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910758080.XA CN110493613B (en) 2019-08-16 2019-08-16 Video lip synchronization synthesis method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910758080.XA CN110493613B (en) 2019-08-16 2019-08-16 Video lip synchronization synthesis method and system

Publications (2)

Publication Number Publication Date
CN110493613A CN110493613A (en) 2019-11-22
CN110493613B (en) 2020-05-19

Family

ID=68551356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910758080.XA Active CN110493613B (en) 2019-08-16 2019-08-16 Video lip synchronization synthesis method and system

Country Status (1)

Country Link
CN (1) CN110493613B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325817B (en) * 2020-02-04 2023-07-18 清华珠三角研究院 Virtual character scene video generation method, terminal equipment and medium
CN111225237B (en) 2020-04-23 2020-08-21 腾讯科技(深圳)有限公司 Sound and picture matching method of video, related device and storage medium
CN113178206B (en) * 2021-04-22 2022-05-31 内蒙古大学 AI (Artificial intelligence) composite anchor generation method, electronic equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101482975A (en) * 2008-01-07 2009-07-15 丰达软件(苏州)有限公司 Method and apparatus for converting words into animation
CN101796812A (en) * 2006-03-31 2010-08-04 莱切技术国际公司 Lip synchronization system and method
CN106791539A (en) * 2016-12-26 2017-05-31 国家新闻出版广电总局电影数字节目管理中心 A kind of storage of film digital program and extracting method
CN108010531A (en) * 2017-12-14 2018-05-08 南京美桥信息科技有限公司 A kind of visible intelligent inquiry method and system
CN108038206A (en) * 2017-12-14 2018-05-15 南京美桥信息科技有限公司 A kind of visible intelligent method of servicing and system
CN108090170A (en) * 2017-12-14 2018-05-29 南京美桥信息科技有限公司 A kind of intelligence inquiry method for recognizing semantics and visible intelligent interrogation system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100396091C (en) * 2006-04-03 2008-06-18 北京和声创景音频技术有限公司 Commandos dubbing system and dubbing making method thereof
CN100476877C (en) * 2006-11-10 2009-04-08 中国科学院计算技术研究所 Generating method of cartoon face driven by voice and text together
CN107786889A (en) * 2017-11-13 2018-03-09 北海威德电子科技有限公司 Can synchronous sign language interpreter DTV
CN109308731B (en) * 2018-08-24 2023-04-25 浙江大学 Speech driving lip-shaped synchronous face video synthesis algorithm of cascade convolution LSTM
CN109637518B (en) * 2018-11-07 2022-05-24 北京搜狗科技发展有限公司 Virtual anchor implementation method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101796812A (en) * 2006-03-31 2010-08-04 莱切技术国际公司 Lip synchronization system and method
CN101482975A (en) * 2008-01-07 2009-07-15 丰达软件(苏州)有限公司 Method and apparatus for converting words into animation
CN106791539A (en) * 2016-12-26 2017-05-31 国家新闻出版广电总局电影数字节目管理中心 A kind of storage of film digital program and extracting method
CN108010531A (en) * 2017-12-14 2018-05-08 南京美桥信息科技有限公司 A kind of visible intelligent inquiry method and system
CN108038206A (en) * 2017-12-14 2018-05-15 南京美桥信息科技有限公司 A kind of visible intelligent method of servicing and system
CN108090170A (en) * 2017-12-14 2018-05-29 南京美桥信息科技有限公司 A kind of intelligence inquiry method for recognizing semantics and visible intelligent interrogation system

Also Published As

Publication number Publication date
CN110493613A (en) 2019-11-22

Similar Documents

Publication Publication Date Title
Crystal The language revolution
CN110493613B (en) Video lip synchronization synthesis method and system
CN110033659B (en) Remote teaching interaction method, server, terminal and system
US7913155B2 (en) Synchronizing method and system
CN110405791B (en) Method and system for simulating and learning speech by robot
US20180203830A1 (en) Synchronized consumption modes for e-books
WO2018108013A1 (en) Medium displaying method and terminal
CN104735480B (en) Method for sending information and system between mobile terminal and TV
CN111866529A (en) Method and system for hybrid use of virtual real person during video live broadcast
US7613613B2 (en) Method and system for converting text to lip-synchronized speech in real time
US11968433B2 (en) Systems and methods for generating synthetic videos based on audio contents
CN109326151A (en) Implementation method, client and server based on semantics-driven virtual image
CN114793300A (en) Virtual video customer service robot synthesis method and system based on generation countermeasure network
CN113850898A (en) Scene rendering method and device, storage medium and electronic equipment
CN112447073A (en) Explanation video generation method, explanation video display method and device
CN111160051B (en) Data processing method, device, electronic equipment and storage medium
US20160247500A1 (en) Content delivery system
Kadam et al. A Survey of Audio Synthesis and Lip-syncing for Synthetic Video Generation
KR20100115003A (en) Method for generating talking heads from text and system thereof
KR101675049B1 (en) Global communication system
CN109902311A (en) A kind of synchronous English of video signal and multilingual translation system
CN108174123A (en) Data processing method, apparatus and system
US20240153397A1 (en) Virtual meeting coaching with content-based evaluation
US20240153398A1 (en) Virtual meeting coaching with dynamically extracted content
CN111580614A (en) Wearable intelligent device and sign language learning method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant