CN106713818A - Speech processing system and method during video call - Google Patents
- Publication number
- CN106713818A (application CN201710093114.9A)
- Authority
- CN
- China
- Prior art keywords
- call
- module
- video
- online
- atmosphere
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention provides a speech processing system and method for video calls. Video call terminals are interconnected through a basic communication network, and the video call system includes an external online server that provides enhanced call functions. This online server comprises an online speech-to-text module and an online call atmosphere module; the online speech-to-text module comprises a speech recognition unit. Users communicate through the video call terminals. Either the terminal's local speech-to-text module or the speech recognition unit of the online speech-to-text module processes the other party's audio data, performs speech recognition, converts the speech into text, stores the text in a text-to-subtitle storage module, and overlays the recognized text on the terminal's video image for display. The terminal's local call atmosphere module or the online call atmosphere module of the external online server is then invoked: according to the recognized text, the overall atmosphere of the call is rendered into image and text effects, which are composited with the video image and rendered for display on the terminal.
Description
Technical field
The present invention relates to a speech processing system and method for video calls.
Background
With advances in technology, long-distance communication between people has evolved from letters and telegrams to voice calls and then to video telephony. A video phone must transmit video data and audio data simultaneously. Although the audio and video data are compressed, the data volume is still one to two orders of magnitude larger than that of a voice-only call. Video calling therefore places considerably higher demands on the underlying network and on the hardware configuration of the terminal.
A video call is simply the simultaneous transmission of voice and video, but technological progress allows a video call to carry more content, which improves the user experience and increases user stickiness.
Summary of the invention
The object of the invention is to provide a method and system for speech processing in video calls that add features to video calling, make video calls more engaging, and increase user stickiness of the video call function.
The invention is realized by the following technical solution:
A speech processing system for video calls, characterized in that it comprises a hardware driver and operating system module, a video call middleware module, a local speech-to-text module, a local call atmosphere module, a text-to-subtitle storage module, a text effects user settings module, a call atmosphere user settings module, and an external enhanced-call-function online server. The external enhanced-call-function online server comprises an online speech-to-text module and an online call atmosphere module; the online speech-to-text module comprises a speech recognition unit. The video call middleware module receives the audio-video data of the other party's video call and demultiplexes it to obtain video data and audio data. The local speech-to-text module or the online speech-to-text module passes the audio data to a speech-to-text interface to obtain the text content of the user's speech. The local call atmosphere module or the online call atmosphere module renders the overall atmosphere of the call into images, which are composited with the video image and rendered for display on the terminal.
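The demultiplexing step performed by the video call middleware module can be illustrated with a minimal sketch. The patent does not specify a container or packet format, so the `Packet` type and its `kind` tag are hypothetical and stand in for whatever the transport actually delivers.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet of an interleaved audio-video stream."""
    kind: str          # "audio" or "video" (assumed stream tag)
    timestamp_ms: int
    payload: bytes


def demultiplex(packets: Iterable[Packet]) -> Tuple[List[Packet], List[Packet]]:
    """Split an interleaved stream into (video, audio), preserving arrival order."""
    video: List[Packet] = []
    audio: List[Packet] = []
    for packet in packets:
        if packet.kind == "video":
            video.append(packet)
        elif packet.kind == "audio":
            audio.append(packet)
        else:
            raise ValueError(f"unknown stream kind: {packet.kind!r}")
    return video, audio
```

In this sketch the audio list would be handed to the speech-to-text module, while the video list goes on to compositing and display.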
The invention also provides a speech processing method for video calls, characterized in that it comprises the following steps:
S1: Video call terminals are interconnected through a basic communication network. An external enhanced-call-function online server is provided; it includes an online speech recognition server and an online call atmosphere server.
S2: Users communicate through the video call terminals. The video call middleware module receives the audio-video data of the other party's video call and demultiplexes it to obtain video data and audio data. The terminal's local speech-to-text module, or the speech recognition unit of the online speech-to-text module, performs speech recognition on the other party's audio data and converts it into text. The text is stored in the text-to-subtitle storage module, and the recognized text is overlaid on the terminal's video picture for display.
S3: The terminal's local call atmosphere module, or the online call atmosphere module of the external enhanced-call-function server, is invoked. According to the text recognized in S2, the overall atmosphere of the call is rendered into image and text effects, which are composited with the video image and rendered for display on the terminal.
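As a rough illustration of the storage part of S2, the sketch below keeps recognized text together with timing so that the right subtitle can be overlaid on a given video frame. The patent does not describe how the text-to-subtitle storage module is organized; the `Subtitle` and `SubtitleStore` names and the millisecond timing are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Subtitle:
    start_ms: int
    end_ms: int
    text: str


class SubtitleStore:
    """Hypothetical text-to-subtitle store: recognized text plus display interval."""

    def __init__(self) -> None:
        self._items: List[Subtitle] = []

    def add(self, start_ms: int, end_ms: int, text: str) -> None:
        """Store one recognized utterance with its display interval."""
        if end_ms <= start_ms:
            raise ValueError("end_ms must be greater than start_ms")
        self._items.append(Subtitle(start_ms, end_ms, text))

    def active_at(self, frame_ms: int) -> Optional[str]:
        """Return the most recently added subtitle covering frame_ms, if any."""
        for item in reversed(self._items):
            if item.start_ms <= frame_ms < item.end_ms:
                return item.text
        return None
```

A renderer could call `active_at` with each frame's timestamp and draw the returned text over the frame before display.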
Further, the user chooses, as needed, whether to invoke the local or the online call atmosphere module.
Further, multiple templates for overlaying text on the video picture are stored in advance and selected by the user.
Further, the data communication between video call terminals includes a user authentication process.
Further, the method also includes S4: when the online call atmosphere module of the external enhanced-call-function server is invoked in S3, the terminal transmits the audio-video data to the external enhanced-call-function online server. After processing by the online server, text data and atmosphere data are obtained and transmitted to the other party together with the terminal's audio-video data.
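Step S4 sends text data and atmosphere data to the other party together with the audio-video data. The patent does not define a wire format, so the following is only a sketch that assumes a JSON envelope with the binary audio-video payload encoded as Base64; the field names `av`, `text`, and `atmosphere` are hypothetical.

```python
import base64
import json
from typing import Dict, List


def bundle_for_peer(av_data: bytes, text: str, effects: List[str]) -> bytes:
    """Pack audio-video bytes, recognized text, and atmosphere effects into one message."""
    message = {
        "av": base64.b64encode(av_data).decode("ascii"),  # binary payload as Base64 text
        "text": text,
        "atmosphere": effects,
    }
    return json.dumps(message, ensure_ascii=False).encode("utf-8")


def unbundle(raw: bytes) -> Dict[str, object]:
    """Reverse bundle_for_peer on the receiving terminal."""
    message = json.loads(raw.decode("utf-8"))
    message["av"] = base64.b64decode(message["av"])
    return message
```

A real system would more likely carry the extra data in a side channel of the media stream; the envelope here only shows that the three kinds of data travel together.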
Compared with the prior art, the invention has the following advantages: it extends the functionality of video calling (speech-to-text), which increases user stickiness; and it enhances call atmosphere rendering (added effects for displayed text), which likewise increases user stickiness.
Brief description of the drawings
Fig. 1 is the overall structure diagram of the speech processing system for video calls.
Fig. 2 is a block diagram of the core modules of the speech processing system for video calls.
Fig. 3 is the operation sequence diagram of speech processing in a video call.
Detailed description of the embodiments
The invention is further explained below with reference to the accompanying drawings and specific embodiments.
Fig. 1 shows the overall structure of the speech processing system for video calls. The video call terminals are interconnected through a basic communication network (the Internet, etc.). The video call system includes online servers that provide external enhanced call functions, for example an online speech recognition server and an online call atmosphere server. The servers are divided by functional logic, not physically; that is, the online speech recognition server and the online call atmosphere server may reside on the same server host.
The video call terminals, the online speech recognition server, and the online call atmosphere server are connected through the basic communication network, and data communication between them is bidirectional. The data communication process may include any necessary user authentication.
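The patent leaves the user authentication process unspecified. One common pattern, shown here purely as an assumed example and not as the patent's method, is a challenge-response exchange over a pre-shared key using HMAC-SHA256 from the Python standard library.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> bytes:
    """Server side: generate a random one-time challenge."""
    return secrets.token_bytes(16)


def sign_challenge(shared_key: bytes, challenge: bytes) -> bytes:
    """Terminal side: prove knowledge of the shared key without sending it."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()


def verify_response(shared_key: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    expected = sign_challenge(shared_key, challenge)
    return hmac.compare_digest(expected, response)
```

The same exchange could run between two terminals or between a terminal and the online server; how keys are provisioned is outside the scope of this sketch.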
Fig. 2 shows the core modules of the speech processing system for video calls. The system comprises a hardware driver and operating system module, a video call middleware module, a local speech-to-text module, an online speech-to-text module, a text-to-subtitle storage module, a text effects user settings module, a local call atmosphere module, an online call atmosphere module, and a call atmosphere user settings module. The video call middleware module receives the audio-video data of the other party's video call and demultiplexes it to obtain video data and audio data. The local speech-to-text module or the online speech-to-text module passes the audio data to a speech-to-text interface to obtain the text content of the user's speech. The local call atmosphere module or the online call atmosphere module renders the overall atmosphere of the call into images, which are composited with the video image and rendered for display on the terminal.
The invention also provides a speech processing method for video calls, which comprises the following steps:
S1: Video call terminals are interconnected through a basic communication network. An external enhanced-call-function online server is provided; it includes an online speech recognition server and an online call atmosphere server.
S2: Users communicate through the video call terminals. The video call middleware module receives the audio-video data of the other party's video call and demultiplexes it to obtain video data and audio data. The terminal's local speech-to-text module, or the speech recognition unit of the online speech-to-text module, performs speech recognition on the other party's audio data and converts it into text. The text is stored in the text-to-subtitle storage module, and the recognized text is overlaid on the terminal's video picture for display.
S3: The terminal's local call atmosphere module, or the online call atmosphere module of the external enhanced-call-function server, is invoked. According to the text recognized in S2, the overall atmosphere of the call is rendered into image and text effects, which are composited with the video image and rendered for display on the terminal.
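How S3 derives an atmosphere from the recognized text is not detailed in the patent. A minimal sketch, assuming a simple keyword lookup, might look as follows; the keyword table and effect names are invented for illustration only.

```python
from typing import Dict, List

# Hypothetical keyword-to-effect table; the patent does not specify one.
EFFECTS_BY_KEYWORD: Dict[str, str] = {
    "happy birthday": "balloons",
    "congratulations": "confetti",
    "good night": "stars",
}


def pick_effects(recognized_text: str) -> List[str]:
    """Return the effects whose keyword occurs in the recognized text (case-insensitive)."""
    lowered = recognized_text.lower()
    return [effect for keyword, effect in EFFECTS_BY_KEYWORD.items() if keyword in lowered]
```

The returned effect names would then be handed to whatever renderer composites images and text effects onto the video picture.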
Fig. 3 shows the operation sequence of speech processing in a video call. After the audio-video data of the other party's video call is received, the speech-to-text and call atmosphere functions can be completed either on the server or on the video call terminal. The specific interaction process is shown in the operation sequence diagram.
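Because each function can run either on the terminal or on the server, an implementation needs some way to route a request according to the user's setting. The sketch below assumes both backends expose the same callable interface; the `Recognizer` alias and the mode strings `"local"` and `"online"` are hypothetical.

```python
from typing import Callable, Dict

# Assumed common interface: audio bytes in, recognized text out.
Recognizer = Callable[[bytes], str]


def make_dispatcher(local: Recognizer, online: Recognizer) -> Callable[[bytes, str], str]:
    """Build a function that routes audio to the backend named by the user's setting."""
    backends: Dict[str, Recognizer] = {"local": local, "online": online}

    def recognize(audio: bytes, mode: str) -> str:
        try:
            backend = backends[mode]
        except KeyError:
            raise ValueError(f"unknown mode: {mode!r}") from None
        return backend(audio)

    return recognize
```

The same routing idea applies to the call atmosphere modules, with the mode taken from the call atmosphere user settings module.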
Further, the user chooses, as needed, whether to invoke the local or the online call atmosphere module.
Further, multiple templates for overlaying text on the video picture are stored in advance and selected by the user.
Further, the data communication between video call terminals includes a user authentication process.
Further, the method also includes S4: when the online call atmosphere module of the external enhanced-call-function server is invoked in S3, the terminal transmits the audio-video data to the external enhanced-call-function online server. After processing by the online server, text data and atmosphere data are obtained and transmitted to the other party together with the terminal's audio-video data.
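The pre-stored overlay templates mentioned above could be modeled as small records that constrain how recognized text is laid out. The fields below (position, line width, line count) are assumptions, since the patent does not say what a template contains; line breaking uses `textwrap.wrap` from the standard library.

```python
import textwrap
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TextTemplate:
    """Hypothetical overlay template selected by the user."""
    name: str
    position: str            # e.g. "bottom" or "top"
    max_chars_per_line: int
    max_lines: int


TEMPLATES: Dict[str, TextTemplate] = {
    "classic": TextTemplate("classic", "bottom", 32, 2),
    "compact": TextTemplate("compact", "top", 12, 2),
}


def layout_caption(text: str, template: TextTemplate) -> List[str]:
    """Wrap text to the template width and keep only the most recent lines."""
    lines = textwrap.wrap(text, width=template.max_chars_per_line)
    return lines[-template.max_lines:]
```

Keeping the last lines rather than the first is a design choice for live captions, where the newest words matter most.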
The above are preferred embodiments of the invention. Any change made according to the technical solution of the invention falls within the protection scope of the invention, provided the resulting function does not go beyond the scope of the technical solution.
Claims (6)
1. A speech processing system for video calls, characterized in that it comprises a hardware driver and operating system module, a video call middleware module, a local speech-to-text module, a local call atmosphere module, a text-to-subtitle storage module, a text effects user settings module, a call atmosphere user settings module, and an external enhanced-call-function online server; the external enhanced-call-function online server comprises an online speech-to-text module and an online call atmosphere module; the online speech-to-text module comprises a speech recognition unit; the video call middleware module receives the audio-video data of the other party's video call and demultiplexes it to obtain video data and audio data; the local speech-to-text module or the online speech-to-text module passes the audio data to a speech-to-text interface to obtain the text content of the user's speech; the local call atmosphere module or the online call atmosphere module renders the overall atmosphere of the call into images, which are composited with the video image and rendered for display on the terminal.
2. A speech processing method for video calls, characterized in that it comprises the following steps:
S1: video call terminals are interconnected through a basic communication network; an external enhanced-call-function online server is provided; the external enhanced-call-function online server includes an online speech recognition server and an online call atmosphere server;
S2: users communicate through the video call terminals; the video call middleware module receives the audio-video data of the other party's video call and demultiplexes it to obtain video data and audio data; the terminal's local speech-to-text module, or the speech recognition unit of the online speech-to-text module, performs speech recognition on the other party's audio data and converts it into text; the text is stored in the text-to-subtitle storage module, and the recognized text is overlaid on the terminal's video picture for display;
S3: the terminal's local call atmosphere module, or the online call atmosphere module of the external enhanced-call-function server, is invoked; according to the text recognized in S2, the overall atmosphere of the call is rendered into image and text effects, which are composited with the video image and rendered for display on the terminal.
3. The speech processing method for video calls according to claim 2, characterized in that the user chooses, as needed, whether to invoke the local call atmosphere module or the online call atmosphere module of the external enhanced-call-function server.
4. The speech processing method for video calls according to claim 2, characterized in that multiple templates for overlaying text on the video picture are stored in advance and selected by the user.
5. The speech processing method for video calls according to claim 2, characterized in that the data communication between video call terminals includes a user authentication process, and the data communication between a video call terminal and the external enhanced-call-function online server includes a user authentication process.
6. The speech processing method for video calls according to claim 2, characterized in that it further comprises S4: when the online call atmosphere module of the external enhanced-call-function server is invoked in S3, the terminal transmits the audio-video data to the external enhanced-call-function online server; after processing by the online server, text data and atmosphere data are obtained and transmitted to the other party together with the terminal's audio-video data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710093114.9A CN106713818A (en) | 2017-02-21 | 2017-02-21 | Speech processing system and method during video call |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106713818A true CN106713818A (en) | 2017-05-24 |
Family
ID=58917095
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710093114.9A Pending CN106713818A (en) | 2017-02-21 | 2017-02-21 | Speech processing system and method during video call |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106713818A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110415706A (en) * | 2019-08-08 | 2019-11-05 | 常州市小先信息技术有限公司 | A kind of technology and its application of superimposed subtitle real-time in video calling |
CN112804440A (en) * | 2019-11-13 | 2021-05-14 | 北京小米移动软件有限公司 | Method, device and medium for processing image |
US11044287B1 (en) | 2020-11-13 | 2021-06-22 | Microsoft Technology Licensing, Llc | Caption assisted calling to maintain connection in challenging network conditions |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002165193A (en) * | 2000-11-24 | 2002-06-07 | Sharp Corp | Visual telephone system |
CN1741656A (en) * | 2004-08-27 | 2006-03-01 | 乐金电子(中国)研究开发中心有限公司 | Information service method for camera mobile telephone and apparatus thereof |
CN1747546A (en) * | 2004-09-07 | 2006-03-15 | 乐金电子(中国)研究开发中心有限公司 | Device and method for providing video effect as communication between mobile communication terminals |
CN104902212A (en) * | 2015-04-30 | 2015-09-09 | 努比亚技术有限公司 | Video communication method and apparatus |
CN105244023A (en) * | 2015-11-09 | 2016-01-13 | 上海语知义信息技术有限公司 | System and method for reminding teacher emotion in classroom teaching |
CN105260416A (en) * | 2015-09-25 | 2016-01-20 | 百度在线网络技术(北京)有限公司 | Voice recognition based searching method and apparatus |
CN105530521A (en) * | 2015-12-16 | 2016-04-27 | 广东欧珀移动通信有限公司 | Streaming media searching method, device and system |
-
2017
- 2017-02-21 CN CN201710093114.9A patent/CN106713818A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10810453B2 (en) | Apparatus and method for reproducing handwritten message by using handwriting data | |
CN1333385C (en) | Voice browser dialog enabler for a communication system | |
CN106713818A (en) | Speech processing system and method during video call | |
KR20060121679A (en) | Picture composing apparatus, commnunication terminal and picture communication system using the apparatus, and chatting server in the system | |
CN108141613A (en) | Utilize the method and system of the video coding of post processing instruction | |
CN112543347A (en) | Video super-resolution method and system based on machine vision coding and decoding | |
WO2014173286A1 (en) | Method and apparatus for implementing a network transaction | |
CN106375942A (en) | Method and device for transmission of data information | |
CN105718405B (en) | The method that the USB interface of mobile terminal and its processor is multiplexed | |
CN101551998A (en) | A group of voice interaction devices and method of voice interaction with human | |
CN107645598A (en) | A kind of message display method and electronic equipment | |
CN108763350A (en) | Text data processing method, device, storage medium and terminal | |
CN103345352A (en) | Mobile terminal image rotating system and mobile terminal image rotating method | |
KR101510144B1 (en) | System and method for advertisiing using background image | |
CN102364965A (en) | Refined display method of mobile phone communication information | |
CN106682899A (en) | Method for confirming online transaction safety through mobile phone and system thereof | |
CN101605166A (en) | A kind of image processing method and portable terminal based on portable terminal | |
US20230342579A1 (en) | Two-dimensional code generation method and related device | |
CN101119545B (en) | Encoding label based information processing system and information processing method | |
CN206649899U (en) | A kind of communicator for realizing real-time voice intertranslation | |
CN206249309U (en) | A kind of talkback unit for realizing real-time voice intertranslation | |
CN1964469A (en) | Mobile terminal | |
CN114945108A (en) | Method and device for assisting vision-impaired person in understanding picture | |
CN107231629A (en) | The account of mobile Internet application unbinds system for prompting and method with phone number | |
CN107870752A (en) | Wall method, terminal, video wall and system on terminal window |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventor after: Wu Wenhuan; Inventor before: Chen Tianwu |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170524 |