CN106713818A - Speech processing system and method during video call - Google Patents

Speech processing system and method during video call

Info

Publication number
CN106713818A
Authority
CN
China
Prior art keywords
call
module
video
online
atmosphere
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710093114.9A
Other languages
Chinese (zh)
Inventor
陈天武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Jiangxia University
Original Assignee
Fujian Jiangxia University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Jiangxia University
Priority to CN201710093114.9A
Publication of CN106713818A
Legal status: Pending


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working
    • H04N7/141: Systems for two-way working between two video terminals, e.g. videophone

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a speech processing system and method for video calls. Video call terminals are interconnected via a basic communication network. The video call involves an external call-enhancement online server, which comprises an online speech-to-text module and an online call atmosphere module; the online speech-to-text module comprises a speech recognition unit. Users communicate via the video call terminals. The terminal's local speech-to-text module, or the speech recognition unit of the online speech-to-text module, performs speech recognition on the other party's audio data, converts the speech into text, stores the text in a text-to-subtitle storage module, and overlays the recognized text on the terminal's video picture for display. The terminal's local call atmosphere module, or the online call atmosphere module of the external call-enhancement online server, is then invoked; according to the recognized text, the overall atmosphere of the call is rendered as image and text effects, which are composited with the video image and rendered for display on the terminal.

Description

Speech processing system and method for video calls
Technical field
The present invention relates to a speech processing system and method for video calls.
Background
With advances in technology, long-distance person-to-person communication has evolved from letters and telegrams to voice calls and then to video telephony. Video telephony must transmit video data and audio data simultaneously; although the audio and video data are compressed, the data volume is still one to two orders of magnitude larger than that of voice-only communication. Video calling therefore places considerably higher demands on the underlying network and on the hardware configuration of the terminal.
A video call simply transmits voice and video at the same time, but with technological progress a video call can carry more content, which improves the user experience of video calling and increases user stickiness.
Summary of the invention
The object of the present invention is to provide a method and system for speech processing in video calls that add features to video calling, make video calls more entertaining, and increase user stickiness of the video call function.
The present invention is realized by the following technical solution:
A speech processing system for video calls, characterized in that it comprises a hardware driver and operating system module, a video call middleware module, a local speech-to-text module, a local call atmosphere module, a text-to-subtitle storage module, a text effect user setting module, a call atmosphere user setting module, and an external call-enhancement online server. The external call-enhancement online server comprises an online speech-to-text module and an online call atmosphere module; the online speech-to-text module comprises a speech recognition unit. The video call middleware module receives the audio and video data of the other party's video call and demultiplexes it to obtain video data and audio data. The local speech-to-text module or the online speech-to-text module passes the audio data to the speech-to-text interface to obtain the text of the user's speech. The local call atmosphere module or the online call atmosphere module renders the overall atmosphere of the call as an image, which is composited with the video image and rendered for display on the terminal.
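The demultiplexing step performed by the video call middleware module can be sketched as follows. This is a minimal illustration that assumes a hypothetical packet format in which every received packet carries a stream tag; the `Packet` type and the tag values "audio" and "video" are assumptions and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Packet:
    """Hypothetical multiplexed packet: a stream tag plus raw payload bytes."""
    kind: str       # "audio" or "video" (assumed tag values)
    payload: bytes


def demultiplex(packets: Iterable[Packet]) -> Tuple[List[bytes], List[bytes]]:
    """Split a multiplexed stream into (video_payloads, audio_payloads), preserving order."""
    video: List[bytes] = []
    audio: List[bytes] = []
    for packet in packets:
        if packet.kind == "video":
            video.append(packet.payload)
        elif packet.kind == "audio":
            audio.append(packet.payload)
        else:
            # Unknown tags are rejected rather than silently dropped.
            raise ValueError(f"unknown stream kind: {packet.kind!r}")
    return video, audio
```

In this sketch the audio list would feed the speech-to-text module and the video list would feed the compositing step.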
The present invention also provides a speech processing method for video calls, characterized in that it comprises the following steps:
S1: Video call terminals are interconnected via a basic communication network. An external call-enhancement online server is provided; it includes an online speech recognition server and an online call atmosphere server.
S2: Users talk via the video call terminals. The video call middleware module receives the audio and video data of the other party's video call and demultiplexes it to obtain video data and audio data. The terminal's local speech-to-text module, or the speech recognition unit of the online speech-to-text module, performs speech recognition on the other party's audio data and converts it into text, which is stored in the text-to-subtitle storage module; the recognized text is overlaid on the terminal's video picture for display.
S3: The terminal's local call atmosphere module, or the online call atmosphere module of the external call-enhancement online server, is invoked. According to the text recognized in S2, the overall atmosphere of the call is rendered as image and text effects, which are composited with the video image and rendered for display on the terminal.
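Steps S2 and S3 can be sketched as a single per-utterance pipeline. The sketch below is only an illustration under stated assumptions: the `Frame` type, the `recognize` and `pick_effect` callables, and the overlay strings are hypothetical placeholders, and no real speech recognition or rendering is performed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Frame:
    """Hypothetical video frame; only the overlays drawn on it are modeled."""
    overlays: List[str] = field(default_factory=list)


def process_call_audio(
    audio: bytes,
    frame: Frame,
    recognize: Callable[[bytes], str],
    pick_effect: Callable[[str], str],
    subtitle_store: List[str],
) -> Frame:
    """Run S2 (recognize, store, overlay subtitle) and S3 (overlay atmosphere effect)."""
    text = recognize(audio)                      # S2: speech recognition, local or online
    subtitle_store.append(text)                  # S2: keep the text in subtitle storage
    frame.overlays.append(f"subtitle:{text}")    # S2: overlay the text on the picture
    effect = pick_effect(text)                   # S3: derive an effect from the text
    frame.overlays.append(f"effect:{effect}")    # S3: composite the effect with the picture
    return frame
```

Passing the recognizer and the effect picker as callables keeps the pipeline identical whether the terminal or the online server does the work.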
Further, the user chooses, as required, whether to invoke the local or the online call atmosphere module.
Further, a variety of templates for overlaying text on the video picture are stored in advance for the user to select.
Further, the data communication process between video call terminals includes a user authentication process.
Further, the method also includes S4: when the online call atmosphere module of the external call-enhancement online server is invoked in S3, the terminal transmits the audio and video data to the external call-enhancement online server; after processing by the online server, text data and atmosphere data are obtained and transmitted to the other party together with the terminal's audio and video data.
Compared with the prior art, the present invention has the following advantages: it extends the functionality of video calling (speech-to-text), which increases user stickiness of the function; and it enhances call atmosphere rendering (additional effects for text display), which likewise increases user stickiness of the function.
Brief description of the drawings
Fig. 1 is an overall structure diagram of the speech processing system for video calls.
Fig. 2 is a block diagram of the core modules of the speech processing system for video calls.
Fig. 3 is an operation sequence diagram of speech processing in a video call.
Detailed description of the embodiments
The present invention is further explained below with reference to the accompanying drawings and specific embodiments.
Fig. 1 shows the overall structure of the speech processing system for video calls. Video call terminals are interconnected via a basic communication network (the Internet, etc.). The video call involves online servers that provide external call-enhancement functions, such as an online speech recognition server and an online call atmosphere server. The servers are divided by logical function, not physically; that is, the online speech recognition server and the online call atmosphere server may reside on the same server host.
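The distinction between logical roles and physical hosts can be illustrated with a small lookup table. The role names and the host name below are hypothetical and are chosen only to show two logical servers resolving to one machine.

```python
from typing import Dict

# Hypothetical deployment table: logical server role -> physical host.
# Both roles point at the same host, which the description explicitly allows.
DEPLOYMENT: Dict[str, str] = {
    "speech_recognition": "enhance-1.example.internal",
    "call_atmosphere": "enhance-1.example.internal",
}


def host_for(role: str, deployment: Dict[str, str] = DEPLOYMENT) -> str:
    """Return the physical host that currently serves a logical role."""
    try:
        return deployment[role]
    except KeyError:
        raise ValueError(f"unknown server role: {role!r}") from None


def share_host(role_a: str, role_b: str, deployment: Dict[str, str] = DEPLOYMENT) -> bool:
    """True when two logical roles are served by the same physical host."""
    return host_for(role_a, deployment) == host_for(role_b, deployment)
```

Moving one role to a separate machine would then only require changing its entry in the table, not the terminal logic.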
The video call terminals, the online speech recognition server, and the online call atmosphere server are connected via the basic communication network, and data communication between them is bidirectional. The data communication process may include any necessary user authentication.
Fig. 2 shows the core modules of the speech processing system for video calls. The system comprises a hardware driver and operating system module, a video call middleware module, a local speech-to-text module, an online speech-to-text module, a text-to-subtitle storage module, a text effect user setting module, a local call atmosphere module, an online call atmosphere module, and a call atmosphere user setting module. The video call middleware module receives the audio and video data of the other party's video call and demultiplexes it to obtain video data and audio data. The local speech-to-text module or the online speech-to-text module passes the audio data to the speech-to-text interface to obtain the text of the user's speech. The local call atmosphere module or the online call atmosphere module renders the overall atmosphere of the call as an image, which is composited with the video image and rendered for display on the terminal.
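One possible shape for the text-to-subtitle storage module is sketched below. The patent does not describe how subtitles are timed, so the start time, the fixed display duration, and the class and method names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubtitleEntry:
    start_s: float   # assumed: time at which the text was recognized
    end_s: float     # assumed: time after which the subtitle is hidden
    text: str


class SubtitleStore:
    """Hypothetical in-memory store for recognized text shown as subtitles."""

    def __init__(self) -> None:
        self._entries: List[SubtitleEntry] = []

    def add(self, start_s: float, text: str, duration_s: float = 3.0) -> None:
        """Record recognized text with an assumed fixed display duration."""
        self._entries.append(SubtitleEntry(start_s, start_s + duration_s, text))

    def active_at(self, t_s: float) -> Optional[str]:
        """Return the most recently added subtitle visible at time t_s, if any."""
        for entry in reversed(self._entries):
            if entry.start_s <= t_s < entry.end_s:
                return entry.text
        return None
```

The compositing step could call `active_at` once per video frame to decide which text, if any, to overlay.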
The present invention also provides a speech processing method for video calls, comprising the following steps:
S1: Video call terminals are interconnected via a basic communication network. An external call-enhancement online server is provided; it includes an online speech recognition server and an online call atmosphere server.
S2: Users talk via the video call terminals. The video call middleware module receives the audio and video data of the other party's video call and demultiplexes it to obtain video data and audio data. The terminal's local speech-to-text module, or the speech recognition unit of the online speech-to-text module, performs speech recognition on the other party's audio data and converts it into text, which is stored in the text-to-subtitle storage module; the recognized text is overlaid on the terminal's video picture for display.
S3: The terminal's local call atmosphere module, or the online call atmosphere module of the external call-enhancement online server, is invoked. According to the text recognized in S2, the overall atmosphere of the call is rendered as image and text effects, which are composited with the video image and rendered for display on the terminal.
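The patent does not say how recognized text is turned into an atmosphere effect in S3. One simple possibility, shown here purely as an assumption, is a keyword table; the keywords and effect names are invented for the example.

```python
from typing import Dict, List

# Hypothetical keyword -> effect table; none of these entries come from the patent.
EFFECT_KEYWORDS: Dict[str, str] = {
    "birthday": "balloons",
    "congratulations": "confetti",
    "love": "hearts",
}


def atmosphere_effects(text: str, keywords: Dict[str, str] = EFFECT_KEYWORDS) -> List[str]:
    """Return the effects whose keyword occurs in the recognized text (case-insensitive)."""
    lowered = text.lower()
    return [effect for keyword, effect in keywords.items() if keyword in lowered]
```

A production system might use sentiment or emotion analysis instead; substring matching is used here only because it is easy to follow.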
Fig. 3 shows the operation sequence of speech processing in a video call. After the audio and video data of the other party's video call are received, the speech-to-text and call atmosphere functions can be performed either on the server or on the video call terminal. The specific interaction process is shown in the operation sequence diagram.
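Because the same function may run on the server or on the terminal, a terminal could try one and fall back to the other. The fallback policy below is an assumption, not something the patent specifies, and both recognizers are hypothetical callables.

```python
from typing import Callable


def recognize_with_fallback(
    audio: bytes,
    online: Callable[[bytes], str],
    local: Callable[[bytes], str],
) -> str:
    """Try the online recognizer first; use the local one if the server is unreachable."""
    try:
        return online(audio)
    except ConnectionError:
        # Assumed policy: a network failure should not interrupt subtitles.
        return local(audio)
```

Only `ConnectionError` is caught in this sketch, so programming errors inside the online recognizer still surface instead of being masked by the fallback.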
Further, the user chooses, as required, whether to invoke the local or the online call atmosphere module.
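The user's choice could be represented as a small setting that selects which module, if any, is invoked. The `AtmosphereMode` enum and the function name below are hypothetical; the modules are modeled as plain callables from recognized text to an effect name.

```python
from enum import Enum
from typing import Callable, Optional

AtmosphereModule = Callable[[str], str]


class AtmosphereMode(Enum):
    """Hypothetical user setting for the call atmosphere function."""
    OFF = "off"
    LOCAL = "local"
    ONLINE = "online"


def select_atmosphere_module(
    mode: AtmosphereMode,
    local_module: AtmosphereModule,
    online_module: AtmosphereModule,
) -> Optional[AtmosphereModule]:
    """Return the module chosen by the user, or None when the function is switched off."""
    if mode is AtmosphereMode.LOCAL:
        return local_module
    if mode is AtmosphereMode.ONLINE:
        return online_module
    return None
```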
Further, a variety of templates for overlaying text on the video picture are stored in advance for the user to select.
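Pre-stored overlay templates might look like the following. The template names and the attributes (position, font size, color) are assumptions; the patent only states that several templates exist and that the user picks one.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SubtitleTemplate:
    """Hypothetical description of how text is overlaid on the video picture."""
    position: str
    font_size: int
    color: str


# Assumed pre-stored templates; the names and values are illustrative only.
TEMPLATES: Dict[str, SubtitleTemplate] = {
    "classic": SubtitleTemplate(position="bottom", font_size=24, color="white"),
    "bubble": SubtitleTemplate(position="top", font_size=20, color="yellow"),
}


def styled_subtitle(
    text: str,
    template_name: str,
    templates: Dict[str, SubtitleTemplate] = TEMPLATES,
) -> Dict[str, object]:
    """Combine recognized text with the template selected by the user."""
    template = templates[template_name]
    return {
        "text": text,
        "position": template.position,
        "font_size": template.font_size,
        "color": template.color,
    }
```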
Further, the data communication process between video call terminals includes a user authentication process.
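The patent does not name an authentication mechanism. As one possible illustration, a terminal and a server that share a secret could exchange an HMAC-based token; the shared-secret scheme and the function names are assumptions, while `hmac` and `hashlib` are Python standard library modules.

```python
import hashlib
import hmac


def sign_user(user_id: str, secret: bytes) -> str:
    """Create a token that binds a user ID to a shared secret (assumed scheme)."""
    return hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_user(user_id: str, token: str, secret: bytes) -> bool:
    """Check a token in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(sign_user(user_id, secret), token)
```

A real deployment would also need replay protection, for example a timestamp or nonce inside the signed message.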
Further, the method also includes S4: when the online call atmosphere module of the external call-enhancement online server is invoked in S3, the terminal transmits the audio and video data to the external call-enhancement online server; after processing by the online server, text data and atmosphere data are obtained and transmitted to the other party together with the terminal's audio and video data.
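Step S4 can be sketched as bundling the server's results with the original media. The `EnhancedPayload` type and the `server_process` callable are hypothetical stand-ins for the online server round trip; no network transport is modeled.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class EnhancedPayload:
    """Hypothetical bundle sent to the other party in S4."""
    av_data: bytes      # the terminal's original audio and video data
    text: str           # text data produced by the online server
    atmosphere: str     # atmosphere data produced by the online server


def build_enhanced_payload(
    av_data: bytes,
    server_process: Callable[[bytes], Tuple[str, str]],
) -> EnhancedPayload:
    """Send the media to the online server and bundle its results with the media."""
    text, atmosphere = server_process(av_data)
    return EnhancedPayload(av_data=av_data, text=text, atmosphere=atmosphere)
```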
The above are preferred embodiments of the present invention. Any change made according to the technical solution of the present invention falls within the protection scope of the present invention, provided that the resulting function does not go beyond the scope of that technical solution.

Claims (6)

1. A speech processing system for video calls, characterized in that it comprises a hardware driver and operating system module, a video call middleware module, a local speech-to-text module, a local call atmosphere module, a text-to-subtitle storage module, a text effect user setting module, a call atmosphere user setting module, and an external call-enhancement online server; the external call-enhancement online server comprises an online speech-to-text module and an online call atmosphere module; the online speech-to-text module comprises a speech recognition unit; the video call middleware module receives the audio and video data of the other party's video call and demultiplexes it to obtain video data and audio data; the local speech-to-text module or the online speech-to-text module passes the audio data to the speech-to-text interface to obtain the text of the user's speech; the local call atmosphere module or the online call atmosphere module renders the overall atmosphere of the call as an image, which is composited with the video image and rendered for display on the terminal.
2. A speech processing method for video calls, characterized in that it comprises the following steps:
S1: video call terminals are interconnected via a basic communication network; an external call-enhancement online server is provided; the external call-enhancement online server includes an online speech recognition server and an online call atmosphere server;
S2: users talk via the video call terminals; the video call middleware module receives the audio and video data of the other party's video call and demultiplexes it to obtain video data and audio data; the terminal's local speech-to-text module, or the speech recognition unit of the online speech-to-text module, performs speech recognition on the other party's audio data and converts it into text, which is stored in the text-to-subtitle storage module; the recognized text is overlaid on the terminal's video picture for display;
S3: the terminal's local call atmosphere module, or the online call atmosphere module of the external call-enhancement online server, is invoked; according to the text recognized in S2, the overall atmosphere of the call is rendered as image and text effects, which are composited with the video image and rendered for display on the terminal.
3. The speech processing method for video calls according to claim 2, characterized in that the user chooses, as required, whether to invoke the terminal's local call atmosphere module or the online call atmosphere module of the external call-enhancement online server.
4. The speech processing method for video calls according to claim 2, characterized in that a variety of templates for overlaying text on the video picture are stored in advance for the user to select.
5. The speech processing method for video calls according to claim 2, characterized in that the data communication process between video call terminals includes a user authentication process, and the data communication process between a video call terminal and the external call-enhancement online server includes a user authentication process.
6. The speech processing method for video calls according to claim 2, characterized in that it further comprises S4: when the online call atmosphere module of the external call-enhancement online server is invoked in S3, the terminal transmits the audio and video data to the external call-enhancement online server; after processing by the online server, text data and atmosphere data are obtained and transmitted to the other party together with the terminal's audio and video data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710093114.9A CN106713818A (en) 2017-02-21 2017-02-21 Speech processing system and method during video call

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710093114.9A CN106713818A (en) 2017-02-21 2017-02-21 Speech processing system and method during video call

Publications (1)

Publication Number Publication Date
CN106713818A true CN106713818A (en) 2017-05-24

Family

ID=58917095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710093114.9A Pending CN106713818A (en) 2017-02-21 2017-02-21 Speech processing system and method during video call

Country Status (1)

Country Link
CN (1) CN106713818A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110415706A (en) * 2019-08-08 2019-11-05 常州市小先信息技术有限公司 A kind of technology and its application of superimposed subtitle real-time in video calling
CN112804440A (en) * 2019-11-13 2021-05-14 北京小米移动软件有限公司 Method, device and medium for processing image
US11044287B1 (en) 2020-11-13 2021-06-22 Microsoft Technology Licensing, Llc Caption assisted calling to maintain connection in challenging network conditions

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002165193A (en) * 2000-11-24 2002-06-07 Sharp Corp Visual telephone system
CN1741656A (en) * 2004-08-27 2006-03-01 乐金电子(中国)研究开发中心有限公司 Information service method for camera mobile telephone and apparatus thereof
CN1747546A (en) * 2004-09-07 2006-03-15 乐金电子(中国)研究开发中心有限公司 Device and method for providing video effect as communication between mobile communication terminals
CN104902212A (en) * 2015-04-30 2015-09-09 努比亚技术有限公司 Video communication method and apparatus
CN105244023A (en) * 2015-11-09 2016-01-13 上海语知义信息技术有限公司 System and method for reminding teacher emotion in classroom teaching
CN105260416A (en) * 2015-09-25 2016-01-20 百度在线网络技术(北京)有限公司 Voice recognition based searching method and apparatus
CN105530521A (en) * 2015-12-16 2016-04-27 广东欧珀移动通信有限公司 Streaming media searching method, device and system


Similar Documents

Publication Publication Date Title
US10810453B2 (en) Apparatus and method for reproducing handwritten message by using handwriting data
CN1333385C (en) Voice browser dialog enabler for a communication system
CN106713818A (en) Speech processing system and method during video call
KR20060121679A (en) Picture composing apparatus, commnunication terminal and picture communication system using the apparatus, and chatting server in the system
CN108141613A (en) Utilize the method and system of the video coding of post processing instruction
CN112543347A (en) Video super-resolution method and system based on machine vision coding and decoding
WO2014173286A1 (en) Method and apparatus for implementing a network transaction
CN106375942A (en) Method and device for transmission of data information
CN105718405B (en) The method that the USB interface of mobile terminal and its processor is multiplexed
CN101551998A (en) A group of voice interaction devices and method of voice interaction with human
CN107645598A (en) A kind of message display method and electronic equipment
CN108763350A (en) Text data processing method, device, storage medium and terminal
CN103345352A (en) Mobile terminal image rotating system and mobile terminal image rotating method
KR101510144B1 (en) System and method for advertisiing using background image
CN102364965A (en) Refined display method of mobile phone communication information
CN106682899A (en) Method for confirming online transaction safety through mobile phone and system thereof
CN101605166A (en) A kind of image processing method and portable terminal based on portable terminal
US20230342579A1 (en) Two-dimensional code generation method and related device
CN101119545B (en) Encoding label based information processing system and information processing method
CN206649899U (en) A kind of communicator for realizing real-time voice intertranslation
CN206249309U (en) A kind of talkback unit for realizing real-time voice intertranslation
CN1964469A (en) Mobile terminal
CN114945108A (en) Method and device for assisting vision-impaired person in understanding picture
CN107231629A (en) The account of mobile Internet application unbinds system for prompting and method with phone number
CN107870752A (en) Wall method, terminal, video wall and system on terminal window

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Wu Wenhuan

Inventor before: Chen Tianwu

RJ01 Rejection of invention patent application after publication

Application publication date: 20170524
