CN106713818A - Speech processing system and method during video call

Info

Publication number: CN106713818A
Application number: CN201710093114.9A
Authority: CN
Inventors: 陈天武
Original assignee: Fujian Jiangxia University
Current assignee: Fujian Jiangxia University
Priority date: 2017-02-21
Filing date: 2017-02-21
Publication date: 2017-05-24

Abstract

The invention provides a speech processing system and method for video calls. Video call terminals are interconnected via a basic communication network. The video call includes an external enhanced call function online server, which comprises an online speech-to-text module and an online call atmosphere module; the online speech-to-text module comprises a speech recognition unit. Users communicate via the video call terminals. The local speech-to-text module of a terminal, or the speech recognition unit of the online speech-to-text module, performs speech recognition on the other party's audio data, converts the speech into text, stores the text in a text-to-subtitle storage module, and overlays the recognized text on the terminal's video picture for display. The local call atmosphere module of the terminal, or the online call atmosphere module of the external enhanced call function online server, is then invoked; according to the recognized text, the overall atmosphere of the call is rendered into image and text effects, which are composited with the video image and rendered for display on the terminal.

Description

Speech processing system and method for video calls

Technical field

The present invention relates to a speech processing system and method for video calls.

Background

With advances in technology, remote communication between people has developed from letters and telegrams to voice calls and then to video calls. A video call must transmit video data and audio data simultaneously; although the audio and video data are compressed, the data volume is still 1 to 2 orders of magnitude larger than that of voice-only communication. Video calls therefore place considerably higher demands on the underlying network and on the terminal's hardware configuration.

A video call simply transmits voice and video at the same time. With technological progress, however, video calls can carry more content, which improves the user experience and increases user stickiness.

Summary of the invention

The object of the invention is to provide a speech processing method and system for video calls that adds features to video calls, makes them more engaging, and increases user stickiness of the video call function.

The present invention is realized by the following technical solution:

A speech processing system for video calls, characterized in that it comprises a hardware driver and operating system module, a video call middleware module, a local speech-to-text module, a local call atmosphere module, a text-to-subtitle storage module, a text effect user settings module, a call atmosphere user settings module, and an external enhanced call function online server. The external enhanced call function online server comprises an online speech-to-text module and an online call atmosphere module; the online speech-to-text module comprises a speech recognition unit. The video call middleware module receives the audio/video data of the other party's video call and demultiplexes it to obtain video data and audio data. The local speech-to-text module or the online speech-to-text module passes the audio data to the speech-to-text interface and obtains the text content of the user's speech. The local call atmosphere module or the online call atmosphere module renders the overall atmosphere of the call into an image, which is composited with the video image and rendered for display on the terminal.
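The demultiplexing step performed by the video call middleware module can be illustrated with a minimal sketch. The patent does not specify a container or packet format, so the `Packet` layout, the `"video"`/`"audio"` tags, and the function name `demultiplex` are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Packet:
    """One unit of the multiplexed stream (hypothetical layout, not from the patent)."""
    kind: str          # "video" or "audio"; the tag values are assumptions
    timestamp_ms: int
    payload: bytes


def demultiplex(packets: Iterable[Packet]) -> Tuple[List[Packet], List[Packet]]:
    """Split a multiplexed stream into (video, audio), preserving arrival order."""
    video: List[Packet] = []
    audio: List[Packet] = []
    for packet in packets:
        if packet.kind == "video":
            video.append(packet)
        elif packet.kind == "audio":
            audio.append(packet)
        else:
            # Unknown stream types are rejected rather than silently dropped.
            raise ValueError(f"unknown packet kind: {packet.kind!r}")
    return video, audio
```

In this sketch the audio list would be handed to the speech-to-text module, while the video list goes on to compositing.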

The present invention also provides a speech processing method for video calls, characterized in that it comprises the following steps:

S1: Video call terminals are interconnected via a basic communication network. An external enhanced call function online server is provided, which comprises an online speech recognition server and an online call atmosphere server.

S2: Users communicate via the video call terminals. The video call middleware module receives the audio/video data of the other party's video call and demultiplexes it to obtain video data and audio data. The local speech-to-text module of the terminal, or the speech recognition unit of the online speech-to-text module, performs speech recognition on the other party's audio data, converts it into text, stores the text in the text-to-subtitle storage module, and overlays the recognized text on the terminal's video picture for display.

S3: The local call atmosphere module of the terminal, or the online call atmosphere module of the external enhanced call function server, is invoked. According to the text recognized in S2, the overall atmosphere of the call is rendered into image and text effects, which are composited with the video image and rendered for display on the terminal.
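Steps S2 and S3 can be sketched as a small per-frame pipeline. This is only an illustration under stated assumptions: the recognizer is passed in as a plain callable (standing in for either the local or the online speech recognition unit), and the keyword-to-effect table `ATMOSPHERE_EFFECTS` is hypothetical, since the patent does not say how recognized text is mapped to atmosphere effects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Stand-in for the local or online speech recognition unit (assumption).
Recognizer = Callable[[bytes], str]

# Hypothetical mapping from keywords to effects; not specified by the patent.
ATMOSPHERE_EFFECTS: Dict[str, str] = {
    "birthday": "balloons",
    "congratulations": "fireworks",
}


@dataclass
class SubtitleStore:
    """Plays the role of the text-to-subtitle storage module."""
    entries: List[str] = field(default_factory=list)

    def save(self, text: str) -> None:
        self.entries.append(text)


@dataclass(frozen=True)
class RenderedFrame:
    video: bytes
    subtitle: str
    effect: Optional[str]


def pick_effect(text: str) -> Optional[str]:
    """Return the first effect whose keyword occurs in the text, if any."""
    lowered = text.lower()
    for keyword, effect in ATMOSPHERE_EFFECTS.items():
        if keyword in lowered:
            return effect
    return None


def process_frame(video: bytes, audio: bytes,
                  recognize: Recognizer, store: SubtitleStore) -> RenderedFrame:
    text = recognize(audio)      # S2: speech recognition on the other party's audio
    store.save(text)             # S2: store the text as a subtitle
    effect = pick_effect(text)   # S3: choose an atmosphere effect from the text
    # S2/S3: the subtitle and effect travel with the frame for compositing.
    return RenderedFrame(video=video, subtitle=text, effect=effect)
```

Passing the recognizer as a parameter keeps the pipeline identical whether recognition runs on the terminal or on the online server.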

Further, the user chooses, as needed, whether to invoke the local call atmosphere module or the online call atmosphere module.

Further, multiple templates for overlaying text on the video picture are stored in advance and selected by the user.
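A minimal sketch of pre-stored subtitle templates follows. The template fields (`font_size`, `color`, `position`) and the template names are assumptions chosen for illustration; the patent only states that several templates exist and that the user selects one.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SubtitleTemplate:
    """Hypothetical template fields; the patent does not enumerate them."""
    font_size: int
    color: str
    position: str  # "top" or "bottom"


# Pre-stored templates (names and values are illustrative).
TEMPLATES: Dict[str, SubtitleTemplate] = {
    "classic": SubtitleTemplate(font_size=24, color="white", position="bottom"),
    "banner": SubtitleTemplate(font_size=32, color="yellow", position="top"),
}


def style_subtitle(text: str, template_name: str) -> dict:
    """Combine recognized text with the template chosen by the user."""
    try:
        template = TEMPLATES[template_name]
    except KeyError:
        raise ValueError(f"unknown subtitle template: {template_name!r}") from None
    return {
        "text": text,
        "font_size": template.font_size,
        "color": template.color,
        "position": template.position,
    }
```

The returned dictionary is what a compositing step would consume when drawing the subtitle onto the video picture.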

Further, the data communication process between video call terminals includes a user authentication process.

Further, the method also includes S4: when the online call atmosphere module of the external enhanced call function server is invoked in S3, the terminal transmits the audio/video data to the external enhanced call function online server; after processing by the online server, text data and atmosphere data are obtained and are transmitted to the other party together with the terminal's audio/video data.
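The bundling in S4 can be sketched as follows. The wire format is not given in the patent; this sketch assumes the text data and atmosphere data are carried as a JSON side channel next to the unchanged audio/video bytes, and the names `OutgoingBundle`, `bundle_for_peer`, and `unpack_metadata` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OutgoingBundle:
    av_data: bytes    # the terminal's audio/video data, passed through untouched
    metadata: bytes   # JSON-encoded text data and atmosphere data (assumed format)


def bundle_for_peer(av_data: bytes, text: str, atmosphere: str) -> OutgoingBundle:
    """Attach the server's text and atmosphere results to the outgoing A/V data."""
    metadata = json.dumps(
        {"text": text, "atmosphere": atmosphere}, ensure_ascii=False
    ).encode("utf-8")
    return OutgoingBundle(av_data=av_data, metadata=metadata)


def unpack_metadata(bundle: OutgoingBundle) -> dict:
    """Recover the text and atmosphere data on the receiving side."""
    return json.loads(bundle.metadata.decode("utf-8"))
```

Keeping the metadata separate from the A/V payload means a receiver that ignores it can still play the call normally.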

Compared with the prior art, the present invention has the following advantages: it extends the functionality of video calls (speech-to-text), which increases user stickiness; and it adds call atmosphere rendering (an additional effect on top of the text display), which likewise increases user stickiness.

Brief description of the drawings

Fig. 1 is an overall structure diagram of the speech processing system for video calls.

Fig. 2 is a block diagram of the core modules of the speech processing system for video calls.

Fig. 3 is an operation sequence diagram of speech processing in a video call.

Detailed description of the embodiments

The present invention is further explained below with reference to the accompanying drawings and specific embodiments.

Fig. 1 shows the overall structure of the speech processing system for video calls. Video call terminals are interconnected via a basic communication network (the Internet, etc.). The video call includes online servers that provide external enhanced call functions, such as an online speech recognition server and an online call atmosphere server. The servers are divided by logical function, not physically; that is, the online voice and video server and the online call atmosphere server may reside on the same server host.

The video call terminals, the online voice and video server, and the online call atmosphere server are connected via the basic communication network, and data communication between them is bidirectional. The data communication process may include any necessary user authentication process.
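The patent does not describe how user authentication is carried out. Purely as an illustration, one common approach is a challenge-response exchange based on a shared secret; the sketch below uses HMAC-SHA256 from the Python standard library. The choice of scheme, the function names, and the 16-byte challenge length are all assumptions, not part of the invention.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> bytes:
    """Server side: generate a random, single-use challenge."""
    return secrets.token_bytes(16)


def sign_challenge(shared_secret: bytes, challenge: bytes) -> str:
    """Terminal side: prove knowledge of the secret without sending it."""
    return hmac.new(shared_secret, challenge, hashlib.sha256).hexdigest()


def verify_response(shared_secret: bytes, challenge: bytes, response: str) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    expected = sign_challenge(shared_secret, challenge)
    return hmac.compare_digest(expected, response)
```

The same exchange could run in either direction, which fits the bidirectional communication described above.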

Fig. 2 shows the core modules of the speech processing system for video calls. The system comprises a hardware driver and operating system module, a video call middleware module, a local speech-to-text module, an online speech-to-text module, a text-to-subtitle storage module, a text effect user settings module, a local call atmosphere module, an online call atmosphere module, and a call atmosphere user settings module. The video call middleware module receives the audio/video data of the other party's video call and demultiplexes it to obtain video data and audio data. The local speech-to-text module or the online speech-to-text module passes the audio data to the speech-to-text interface and obtains the text content of the user's speech. The local call atmosphere module or the online call atmosphere module renders the overall atmosphere of the call into an image, which is composited with the video image and rendered for display on the terminal.

The present invention also provides a speech processing method for video calls, comprising the following steps:

S1: Video call terminals are interconnected via a basic communication network. An external enhanced call function online server is provided, which comprises an online speech recognition server and an online call atmosphere server.

S2: Users communicate via the video call terminals. The video call middleware module receives the audio/video data of the other party's video call and demultiplexes it to obtain video data and audio data. The local speech-to-text module of the terminal, or the speech recognition unit of the online speech-to-text module, performs speech recognition on the other party's audio data, converts it into text, stores the text in the text-to-subtitle storage module, and overlays the recognized text on the terminal's video picture for display.

S3: The local call atmosphere module of the terminal, or the online call atmosphere module of the external enhanced call function server, is invoked. According to the text recognized in S2, the overall atmosphere of the call is rendered into image and text effects, which are composited with the video image and rendered for display on the terminal.

Fig. 3 shows the operation sequence of speech processing in a video call. After the audio/video data of the other party's video call is received, the speech-to-text and call atmosphere functions can be completed either on the server or on the video call terminal. The specific interaction process is shown in the operation sequence diagram.
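Because each function may run on the terminal or on the server, the choice can be modeled as a per-function setting. The following is a minimal sketch under assumptions: the `Site` enum, the `CallSettings` defaults, and `pick_handler` are hypothetical names, and the handlers are plain callables standing in for the local and online modules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Site(Enum):
    TERMINAL = "terminal"
    SERVER = "server"


@dataclass(frozen=True)
class CallSettings:
    """Where each function runs; the defaults here are arbitrary."""
    speech_to_text_site: Site = Site.TERMINAL
    atmosphere_site: Site = Site.SERVER


def pick_handler(site: Site,
                 on_terminal: Callable[[bytes], str],
                 on_server: Callable[[bytes], str]) -> Callable[[bytes], str]:
    """Return the local module or the online module according to the setting."""
    return on_terminal if site is Site.TERMINAL else on_server
```

A terminal with limited processing power could, for example, set both sites to `Site.SERVER` and leave the rest of the pipeline unchanged.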

The above are preferred embodiments of the present invention. Any change made according to the technical solution of the present invention falls within the protection scope of the present invention, provided that the resulting function does not go beyond the scope of the technical solution.

Claims

1. A speech processing system for video calls, characterized in that it comprises a hardware driver and operating system module, a video call middleware module, a local speech-to-text module, a local call atmosphere module, a text-to-subtitle storage module, a text effect user settings module, a call atmosphere user settings module, and an external enhanced call function online server; the external enhanced call function online server comprises an online speech-to-text module and an online call atmosphere module; the online speech-to-text module comprises a speech recognition unit; the video call middleware module receives the audio/video data of the other party's video call and demultiplexes it to obtain video data and audio data; the local speech-to-text module or the online speech-to-text module passes the audio data to the speech-to-text interface and obtains the text content of the user's speech; the local call atmosphere module or the online call atmosphere module renders the overall atmosphere of the call into an image, which is composited with the video image and rendered for display on the terminal.

2. A speech processing method for video calls, characterized in that it comprises the following steps:

S1: Video call terminals are interconnected via a basic communication network; an external enhanced call function online server is provided; the external enhanced call function online server comprises an online speech recognition server and an online call atmosphere server;

S2: Users communicate via the video call terminals; the video call middleware module receives the audio/video data of the other party's video call and demultiplexes it to obtain video data and audio data; the local speech-to-text module of the terminal, or the speech recognition unit of the online speech-to-text module, performs speech recognition on the other party's audio data, converts it into text, stores the text in the text-to-subtitle storage module, and overlays the recognized text on the terminal's video picture for display;

S3: The local call atmosphere module of the terminal, or the online call atmosphere module of the external enhanced call function server, is invoked; according to the text recognized in S2, the overall atmosphere of the call is rendered into image and text effects, which are composited with the video image and rendered for display on the terminal.

3. The speech processing method for video calls according to claim 2, characterized in that the user chooses, as needed, whether to invoke the local call atmosphere module of the terminal or the online call atmosphere module of the external enhanced call function server.

4. The speech processing method for video calls according to claim 2, characterized in that multiple templates for overlaying text on the video picture are stored in advance and selected by the user.

5. The speech processing method for video calls according to claim 2, characterized in that the data communication process between video call terminals includes a user authentication process, and the data communication process between a video call terminal and the external enhanced call function online server includes a user authentication process.

6. The speech processing method for video calls according to claim 2, characterized in that it further comprises S4: when the online call atmosphere module of the external enhanced call function server is invoked in S3, the terminal transmits the audio/video data to the external enhanced call function online server; after processing by the online server, text data and atmosphere data are obtained and are transmitted to the other party together with the terminal's audio/video data.