CN101500127A

CN101500127A - Method for synchronously displaying subtitle in video telephone call

Info

Publication number: CN101500127A
Application number: CNA2008100569943A
Authority: CN
Inventors: 郭晓丹
Original assignee: TECHFAITH INTELLIGENT HANDSET TECHNOLOGY (BEIJING) Co Ltd
Current assignee: TECHFAITH INTELLIGENT HANDSET TECHNOLOGY (BEIJING) Co Ltd
Priority date: 2008-01-28
Filing date: 2008-01-28
Publication date: 2009-08-05

Abstract

The invention relates to a method for synchronously displaying subtitles during a video call. The method is implemented mainly by a speech recognition module, a subtitle processing module and an image synthesis module. During the video call, subtitles are generated using speech recognition, superimposed on the local terminal's image according to display rules preset by the user, and, after processing, transmitted to the remote user. Adding subtitles is an important supplement to the video call: it can improve call quality and make communication more effective. In addition, subtitles generated by this method can be displayed on any mobile phone that supports video telephony, without additional software or hardware, which makes the method convenient and practical.

Description

Method for synchronously displaying subtitles in a video telephone call

Technical field

The present invention relates to the field of mobile communications, and in particular to a method for synchronously displaying subtitles in a video telephone call. Without affecting the normal call, a user of the invention can have speech converted into subtitle data, which is superimposed on the local terminal's image and sent to the remote user.

Background art

With the development of mobile communication technology, video telephony has spread quickly thanks to its vivid and intuitive user experience. At present, a video telephone works mainly by capturing video and audio data from both parties and transmitting them according to an agreed protocol, so that the parties can exchange information. However, communication during a video call still relies mainly on speech. When the sound is transmitted unclearly, call quality and communication suffer, and even the intuitive video picture cannot meet the need to communicate.

Summary of the invention

The present invention aims to provide a method that improves call quality and effectively assists communication during a video call. The method converts the user's speech into subtitles, superimposes them on the local terminal's image, and sends the result to the remote user.

To achieve this goal, the basic idea of the invention is as follows. During a video call, speech recognition is used to generate subtitle text. The subtitles are then superimposed on the local terminal's image according to subtitle configuration information selected by the user, such as display area, font, color and size, and the merged image data is sent to the remote user. The local user and the remote user both see the video with subtitles on their display screens at the same time. This process is carried out mainly by a speech recognition module, a subtitle processing module and an image synthesis module.

The video telephone call may be a two-party call, a multi-party call or a video conference. The invention is described here using a two-party call as the only example.

The speech recognition module converts spoken language word for word into the corresponding text, producing subtitles, and stores them in the subtitle processing module. Speech recognition may be implemented in software or in hardware, depending on the specific requirements of the mobile phone.

The subtitle processing module delivers a certain number of recognized words to the image synthesis module according to the preset display mode.
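As a rough illustration of what "a certain number of words" could mean, the sketch below splits recognized text into groups sized by the configured number of lines and characters per line. The function name `split_into_groups` and the character-based wrapping are assumptions made for this example; the patent does not specify how a group is formed.

```python
def split_into_groups(text, lines, chars_per_line):
    """Split recognized text into subtitle groups.

    Each group holds at most `lines` rows of at most `chars_per_line`
    characters. Hypothetical helper, not taken from the patent.
    """
    if lines <= 0 or chars_per_line <= 0:
        raise ValueError("lines and chars_per_line must be positive")
    # First cut the text into rows of fixed width.
    rows = [text[i:i + chars_per_line]
            for i in range(0, len(text), chars_per_line)]
    # Then bundle consecutive rows into groups of `lines` rows.
    return [rows[i:i + lines] for i in range(0, len(rows), lines)]


groups = split_into_groups("HELLOWORLDABC", lines=2, chars_per_line=4)
```

With these inputs, the 13 characters yield two groups: one full group of two rows and one partial group.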

The image synthesis module superimposes the subtitles on the background video it receives, according to the subtitle configuration information, and generates a video data stream with subtitles.
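The superimposing step can be pictured as painting subtitle pixels onto a copy of the video frame. The following is a minimal sketch, assuming the subtitle text has already been rendered into a binary mask (`glyph_mask`) and that a frame is a list of pixel rows; the function `overlay` and both data layouts are illustrative assumptions, not the patent's implementation.

```python
def overlay(frame, glyph_mask, x, y, color):
    """Return a copy of `frame` with `color` painted wherever the mask is set.

    `frame` is a list of rows of pixel values; `glyph_mask` is a list of
    rows of 0/1 flags placed with its top-left corner at (x, y).
    Mask pixels that fall outside the frame are ignored.
    """
    out = [row[:] for row in frame]  # leave the input frame untouched
    for dy, mask_row in enumerate(glyph_mask):
        for dx, bit in enumerate(mask_row):
            fy, fx = y + dy, x + dx
            if bit and 0 <= fy < len(out) and 0 <= fx < len(out[fy]):
                out[fy][fx] = color
    return out


frame = [[0] * 4 for _ in range(3)]
mask = [[1, 0],
        [1, 1]]
composited = overlay(frame, mask, x=1, y=1, color=9)
```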

The subtitle configuration information includes the font, number of lines, number of characters per line, text color, dwell time, update time, subtitle display area, size, and so on.
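One way to hold these settings is a small immutable record. The sketch below is an assumption-laden example: the class name `SubtitleConfig`, the field names, the units and all default values are invented for illustration, and "size" is interpreted here as font size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubtitleConfig:
    """Hypothetical container for the user's subtitle settings."""
    font: str = "sans"
    font_size: int = 12                 # "size", read here as font size
    lines: int = 2                      # number of subtitle lines
    chars_per_line: int = 16            # characters per line
    color: tuple = (255, 255, 255)      # text color as RGB
    dwell_time_s: float = 3.0           # how long a group stays on screen
    update_time_s: float = 1.0          # wait limit for new recognized text
    region: tuple = (0, 120, 176, 24)   # display area as x, y, width, height

    def __post_init__(self):
        if self.lines <= 0 or self.chars_per_line <= 0:
            raise ValueError("lines and chars_per_line must be positive")
        if self.dwell_time_s <= 0 or self.update_time_s <= 0:
            raise ValueError("dwell and update times must be positive")

    @property
    def group_size(self):
        """Number of characters that make up one full subtitle group."""
        return self.lines * self.chars_per_line


config = SubtitleConfig(lines=2, chars_per_line=10)
```

Deriving `group_size` from the line settings keeps the "one group of subtitles" threshold used by the display modes consistent with what the user configured.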

Synchronously displaying subtitles that render the user's speech as text is an important supplement to a video call: it can improve call quality and assists communication, making it more effective. The variety of subtitle display formats also makes video telephony more flexible and more engaging. In addition, the invention places no special requirements on the device that receives the subtitles, so a mobile phone that supports video telephony can show them without any additional software or hardware, which is convenient and practical.

Description of drawings

Fig. 1 is a structural diagram of the present invention.

Fig. 2 is a functional flow diagram of method two of the subtitle display modes of the present invention.

Detailed description

A specific embodiment of the present invention is described below with reference to the accompanying drawings.

Fig. 1 is a structural diagram of the present invention. As shown in the figure, the method is carried out mainly by the speech recognition module 1, the subtitle processing module 2 and the image synthesis module 3.

Preferably, the subtitle display mode may use either of the following two methods:

Method one: whenever the storage area of the subtitle processing module is updated, the image synthesis module is notified to superimpose the subtitles on the image, and the superimposed image data then undergoes codec processing, transmission and display. Once the number of characters in the subtitle display area reaches the limit set in the subtitle configuration, the display area is cleared and the method waits for the storage area to be updated. If the update time elapses without any update to the storage area, the subtitle display area is likewise cleared completely, and the method waits for the next round of processing.

Method two: once the number of characters in the subtitle storage area reaches one full group of subtitles as defined by the subtitle configuration, the image synthesis module is notified to superimpose that group, and the superimposed image data then undergoes codec processing, transmission and display. If the update time elapses without any update to the storage area, the data in the storage area that amounts to less than one group is superimposed by the image synthesis module and processed, transmitted and displayed in the same way; otherwise the method continues to wait for the storage area to be updated. When a group of subtitles has been displayed for the dwell time, the subtitle display area is cleared and monitoring of the storage area continues.
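The two display modes can be sketched as small state machines driven by "storage updated" events and periodic timer ticks. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, timestamps are plain seconds supplied by the caller, in method one a full display area is cleared when the next characters arrive, and in method two a newly completed group replaces the one currently shown. The patent leaves these details open.

```python
class MethodOneDisplay:
    """Method one: show text as it arrives, clear when full or idle."""

    def __init__(self, capacity, update_time):
        self.capacity = capacity        # characters the display area holds
        self.update_time = update_time  # idle limit in seconds
        self.text = ""                  # text currently overlaid
        self.last_update = 0.0

    def on_update(self, new_chars, now):
        for ch in new_chars:
            if len(self.text) >= self.capacity:
                self.text = ""          # display area full: start afresh
            self.text += ch
        self.last_update = now
        return self.text                # text to hand to the compositor

    def tick(self, now):
        # No storage update within the update time: clear the display.
        if self.text and now - self.last_update >= self.update_time:
            self.text = ""
        return self.text


class MethodTwoDisplay:
    """Method two: show whole groups, flush partial groups on timeout."""

    def __init__(self, group_size, update_time, dwell_time):
        self.group_size = group_size
        self.update_time = update_time
        self.dwell_time = dwell_time
        self.buffer = ""                # storage area, not yet shown
        self.shown = ""                 # group currently overlaid
        self.last_update = 0.0
        self.shown_at = 0.0

    def _show(self, now):
        self.shown = self.buffer[:self.group_size]
        self.buffer = self.buffer[self.group_size:]
        self.shown_at = now

    def on_update(self, new_chars, now):
        self.buffer += new_chars
        self.last_update = now
        if len(self.buffer) >= self.group_size:
            self._show(now)             # a full group is ready
        return self.shown

    def tick(self, now):
        if self.shown and now - self.shown_at >= self.dwell_time:
            self.shown = ""             # dwell time reached: clear display
        if self.buffer and now - self.last_update >= self.update_time:
            self._show(now)             # flush a partial group
        return self.shown


one = MethodOneDisplay(capacity=4, update_time=2.0)
two = MethodTwoDisplay(group_size=4, update_time=1.0, dwell_time=3.0)
```

In both classes the returned string is what would be passed to the image synthesis module for the next frame; an empty string means no subtitle is overlaid.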

With reference to Fig. 2, the functional flow of synchronous subtitle display according to the invention is described in detail below, taking method two of the subtitle display modes as the example.

First, the user sets the subtitle information, such as the font, number of displayed lines, number of characters per line, text color, dwell time, update time, subtitle display area and size.

After the video call is connected, the speech recognition module 1 monitors the speech in the call, converts it word for word into the corresponding text, and stores the text in the subtitle storage area of the subtitle processing module 2. When the storage area has been updated and the number of stored characters reaches one full group of subtitles as defined by the preset subtitle configuration, the subtitle processing module 2 notifies the image synthesis module 3 to begin superimposing. The image synthesis module 3 superimposes the subtitles on the local video image it receives, according to the subtitle configuration information, and generates a video data stream with subtitles. The superimposed image data is then encoded and otherwise processed, after which the encoded audio and video data are multiplexed and otherwise handled according to the video telephony transmission protocol and delivered to the remote end for display. If the update time elapses without any update to the storage area, the subtitles in the storage area that amount to less than one group are superimposed by the image synthesis module 3 and processed, transmitted and displayed in the same way; otherwise the method continues to wait for the storage area to be updated. When a group of subtitles has been displayed for the dwell time, that group is removed from the display, and monitoring of the storage area continues in preparation for the next display.
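The ordering of the sending-side steps is what lets the remote phone show subtitles without extra software: the subtitle is merged into the picture before encoding, so the receiver only decodes ordinary video. The sketch below illustrates that ordering only. The names `MuxedPacket` and `build_packet` are hypothetical, and the overlay, encoder and multiplexer are trivial stand-ins, not a real codec or video telephony protocol stack.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MuxedPacket:
    """Stand-in for one multiplexed audio/video unit sent to the far end."""
    video: bytes
    audio: bytes


def build_packet(frame: str,
                 audio: bytes,
                 subtitle: Optional[str],
                 overlay: Callable[[str, str], str],
                 encode_video: Callable[[str], bytes],
                 encode_audio: Callable[[bytes], bytes]) -> MuxedPacket:
    # Step 1: superimpose the subtitle on the local frame, if there is one.
    composed = overlay(frame, subtitle) if subtitle else frame
    # Step 2: encode the already-subtitled picture and the audio.
    video_bytes = encode_video(composed)
    audio_bytes = encode_audio(audio)
    # Step 3: combine both streams for transmission.
    return MuxedPacket(video=video_bytes, audio=audio_bytes)


# Toy stand-ins: a "frame" is just a string and encoding is UTF-8.
packet = build_packet(
    "frame0", b"pcm", "hello",
    overlay=lambda f, s: f"{f}[{s}]",
    encode_video=lambda f: f.encode("utf-8"),
    encode_audio=lambda a: a,
)
```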

The above is only one embodiment of the present invention and is not intended to limit it. Any modification made within the spirit and principles of the invention shall fall within the scope of the claims of the present invention.

Claims

1. the method for synchronously displaying subtitle in the visual telephone, it is characterized in that: use sound identification module (1), captions processing module (2) and image synthesis unit (3), in the video telephony call process, at first use speech recognition technology and generate captions, captions according to user preset show rule again, in the local terminal image, send subtitle superposition to remote subscriber after treatment.

2. The speech recognition module (1) as claimed in claim 1, characterized in that it converts spoken language word for word into the corresponding text, producing subtitles, and stores them in the subtitle processing module (2).

3. The speech recognition module (1) as claimed in claim 1, characterized in that software or hardware recognition technology may be selected according to the circumstances.

4. The subtitle processing module (2) as claimed in claim 1, characterized in that it delivers a certain number of subtitles to the image synthesis module (3) according to the preset display mode.

5. The image synthesis module (3) as claimed in claim 1, characterized in that it superimposes the subtitles on the background video it receives, according to the subtitle configuration information, and generates a video data stream with subtitles.

6. The subtitle configuration information as claimed in claim 5, characterized in that it includes the font, number of lines, number of characters per line, text color, dwell time, update time, subtitle display area, size, and so on.

7. The subtitle display mode as claimed in claim 4, which may use the following method:

Whenever the storage area of the subtitle processing module is updated, the image synthesis module (3) is notified to superimpose the subtitles on the image, and the superimposed image data then undergoes codec processing, transmission and display. Once the number of characters in the subtitle display area reaches the limit set in the subtitle configuration, the display area is cleared and the method waits for the storage area to be updated. If the update time elapses without any update to the storage area, the subtitle display area is likewise cleared completely, and the method waits for the next round of processing.

8. The subtitle display mode as claimed in claim 4, which may use the following method:

Once the number of characters in the subtitle storage area reaches one full group of subtitles as defined by the subtitle configuration, the image synthesis module (3) is notified to superimpose that group, and the superimposed image data then undergoes codec processing, transmission and display. If the update time elapses without any update to the storage area, the data in the storage area that amounts to less than one group is superimposed by the image synthesis module (3) and processed, transmitted and displayed in the same way; otherwise the method continues to wait for the storage area to be updated. When a group of subtitles has been displayed for the dwell time, the subtitle display area is cleared and monitoring of the storage area continues.