CN102036051A

CN102036051A - Method and device for prompting in video meeting

Info

Publication number: CN102036051A
Application number: CN2010105962090A
Authority: CN
Inventors: 戴华波; 王海涛
Original assignee: Huawei Device Co Ltd
Current assignee: Huawei Device Co Ltd
Priority date: 2010-12-20
Filing date: 2010-12-20
Publication date: 2011-04-27

Abstract

An embodiment of the invention discloses a method and a device for prompting in a video conference. The method comprises the following steps: a video conference terminal acquires at least one prompter caption picture; the terminal edits the at least one prompter caption picture and caches the edited prompter caption pictures in a memory of the local conference-site terminal according to a specified picture display order; the terminal receives a prompting instruction; and, according to the prompting instruction, the terminal scroll-displays the edited prompter caption pictures, in the specified picture display order in the memory, in a specified area of the far-end conference video displayed by the local conference-site terminal. The technical scheme provided by the embodiment helps reduce the implementation cost and complexity of the prompting function in a video conference and improves the flexibility of conference-site deployment.

Description

Method and device for implementing a prompter in a video conference

Technical field

The present invention relates to the field of communication technology, and in particular to a method and a device for implementing a prompter in a video conference.

Background technology

With evolving user requirements and the development of video conferencing technology, video conferencing is increasingly used in scenarios such as academic exchange, distance education, business consultation, and summit forums.

In many scenarios, a speaker in a conference needs to speak from a prepared statement or speech text. So that the speaker does not have to look down at the text and can instead speak facing the camera directly, which strengthens the exchange with far-end participants, many existing conference systems provide a prompter function for the speaker.

In the prior art, the prompter function is commonly implemented by deploying dedicated prompter equipment (including a dedicated display that shows the prompter information) for the participants, with the conference terminal controlling the prompter equipment in real time through software.

In practice it has been found that existing techniques based on dedicated prompter equipment require additional hardware and supporting software, so the implementation cost is relatively high, management is relatively complex, and functional extensibility is relatively poor. In addition, the installation position of the prompter display directly affects the prompting effect, which places many restrictions on conference-site deployment.

Summary of the invention

Embodiments of the invention provide a method and a device for implementing a prompter in a video conference, so as to reduce the implementation cost and complexity of the prompter function in a video conference and improve the flexibility of conference-site deployment.

To solve the above technical problems, the embodiments of the invention provide the following technical schemes:

A method for implementing a prompter in a video conference comprises:

a video conference terminal obtaining at least one prompter caption picture;

editing the at least one prompter caption picture, and caching the edited prompter caption pictures in the video memory of the video conference terminal according to a specified picture display order;

receiving a prompter instruction;

according to the prompter instruction, scroll-displaying the edited prompter caption pictures, in the picture display order specified in the video memory, in a specified area of the far-end conference video displayed by the video conference terminal.

A method for implementing a prompter in a video conference comprises:

a video conference terminal obtaining a prompter subtitle file;

receiving a prompter instruction;

sampling speaker audio;

performing speech recognition on the sampled speaker audio to obtain text information corresponding to the sampled speaker audio;

matching the obtained text information corresponding to the sampled speaker audio against the prompter caption information contained in the prompter subtitle file;

according to the matching result, displaying, in a specified area of the far-end conference video displayed by the video conference terminal, the portion of the prompter caption information contained in the prompter subtitle file that follows the portion matching the text information corresponding to the currently sampled speaker audio.

A video conference terminal comprises:

an acquisition module, configured to obtain at least one prompter caption picture;

a processing and caching module, configured to edit the at least one prompter caption picture obtained by the acquisition module, and to cache the edited prompter caption pictures in the video memory of the video conference terminal according to a specified picture display order;

a receiving module, configured to receive a prompter instruction;

a scroll display module, configured to scroll-display, according to the prompter instruction received by the receiving module, the edited prompter caption pictures in a specified area of the far-end conference video displayed by the video conference terminal, in the picture display order specified in the video memory.

A video conference terminal comprises:

a second acquisition module, configured to obtain a prompter subtitle file;

a receiving module, configured to receive a prompter instruction;

a sampling module, configured to sample speaker audio;

a speech recognition module, configured to perform speech recognition on the speaker audio sampled by the sampling module to obtain text information corresponding to the sampled speaker audio;

a matching module, configured to match the text information obtained by the speech recognition module against the prompter caption information contained in the prompter subtitle file;

a display control module, configured to display, according to the matching result of the matching module and in a specified area of the far-end conference video displayed by the video conference terminal, the portion of the prompter caption information contained in the prompter subtitle file that follows the portion matching the text information corresponding to the currently sampled speaker audio.

Thus, in one scheme provided by the embodiments of the invention, the video conference terminal directly obtains prompter caption pictures containing the prompter information needed for the speech, and caches the edited prompter caption pictures in the video memory of the video conference terminal in the specified order. After receiving the prompter instruction, it scroll-displays the edited prompter caption pictures in a specified area of the far-end conference video, following the picture display order specified in the video memory. Because processing operates directly on prompter caption pictures that already contain the required prompter information, processing complexity can be reduced. Because a scroll-display mechanism is introduced and the prompter caption pictures are scrolled in a specified area of the far-end conference video, the prompter captions can be shown clearly without interfering with normal viewing of the far-end conference video, which improves the conference experience. Moreover, because the prompter function can be implemented with the hardware resources already present in the conference terminal, the hardware implementation cost and system complexity of the prompter function in a video conference can be reduced, and the flexibility of conference-site deployment improved.

In another scheme provided by the embodiments of the invention, the video conference terminal directly obtains a prompter subtitle file containing the prompter information needed for the speech. After receiving the prompter instruction, it samples the speaker audio, performs speech recognition on the sampled audio to obtain the corresponding text information, and matches that text information against the prompter caption information contained in the prompter subtitle file. According to the matching result, it displays, in a specified area of the far-end conference video displayed by the video conference terminal, the portion of the prompter caption information that follows the portion matching the text information corresponding to the currently sampled speaker audio. Because speech recognition and a mechanism that scrolls in real time according to the speaker's voice are introduced, and the prompter captions are scrolled in a specified area of the far-end conference video, prompting can be performed automatically in real time, and the prompter captions can be shown clearly without interfering with normal viewing of the far-end conference video, which considerably improves the conference experience. Moreover, because the prompter function can be implemented with the hardware resources already present in the conference terminal, the hardware implementation cost and system complexity of the prompter function in a video conference can be reduced, and the flexibility of conference-site deployment improved.
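The matching step in this second scheme can be sketched in a few lines. The following is a minimal illustration only, not the patented implementation: it assumes the prompter subtitle file has already been split into an ordered list of caption segments, it uses Python's standard `difflib` similarity ratio as a stand-in for whatever matching the terminal performs, and the function name `next_caption` and the 0.85 threshold are hypothetical choices.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.85  # assumed value, for illustration only


def next_caption(segments, recognized_text, threshold=MATCH_THRESHOLD):
    """Return the caption segment that follows the best match, or None.

    segments: prompter caption segments in display order (hypothetical layout).
    recognized_text: text produced by speech recognition for the current sample.
    """
    best_index, best_score = None, 0.0
    for index, segment in enumerate(segments):
        score = SequenceMatcher(None, segment.lower(), recognized_text.lower()).ratio()
        if score > best_score:
            best_index, best_score = index, score
    if best_index is None or best_score < threshold:
        return None  # nothing matched well enough; keep the current display
    if best_index + 1 >= len(segments):
        return None  # the speaker has reached the last segment
    return segments[best_index + 1]
```

Returning `None` when no segment clears the threshold leaves the display unchanged, which seems a reasonable default when recognition is unreliable.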

Description of drawings

To describe the technical schemes in the embodiments of the invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a method for implementing a prompter in a video conference according to embodiment one of the invention;

Fig. 2 is a schematic flowchart of a method for implementing a prompter in a video conference according to embodiment two of the invention;

Fig. 3-a is a schematic diagram of storing prompter caption pictures by page according to embodiment two of the invention;

Fig. 3-b is a schematic diagram of scrolling the prompter based on the speaker's speaking rate according to embodiment two of the invention;

Fig. 3-c is a schematic diagram of address jumps for the overlay picture according to embodiment two of the invention;

Fig. 3-d is a schematic diagram of cutting a prompter caption picture and overlaying it on the local video according to embodiment two of the invention;

Fig. 4 is a schematic diagram of a video conference terminal according to embodiment three of the invention;

Fig. 5 is a schematic flowchart of a method for implementing a prompter in a video conference according to embodiment four of the invention;

Fig. 6 is a schematic diagram of a video conference terminal according to embodiment five of the invention.

Embodiment

Embodiments of the invention provide a method and a device for implementing a prompter in a video conference, which can reduce the implementation cost and complexity of the prompter function in a video conference and improve the flexibility of conference-site deployment.

To make the objects, features, and advantages of the invention clearer and easier to understand, the technical schemes in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the embodiments described below are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of the invention.

Embodiment one

One embodiment of the method for implementing a prompter in a video conference according to the invention can comprise: a video conference terminal obtains at least one prompter caption picture; the terminal edits the at least one prompter caption picture and caches the edited prompter caption pictures in the video memory of the video conference terminal according to a specified picture display order; the terminal receives a prompter instruction; and, according to the prompter instruction, the terminal scroll-displays the edited prompter caption pictures in a specified area of the far-end conference video that it displays, in the picture display order specified in the video memory.

Referring to Fig. 1, concrete steps can comprise:

110. The video conference terminal obtains at least one prompter caption picture.

In practice, the video conference terminal can obtain prompter caption pictures (which contain the prompter information the speaker needs for the speech) in various ways: it can obtain them from an external source or generate them itself. For example, the video conference terminal can receive, through a video input interface, prompter caption pictures input by another device. The video conference terminal may be connected to a document camera, which captures the prompter information needed for the speech as prompter caption pictures and passes them to the video conference terminal. Alternatively, the video conference terminal can receive prompter caption pictures input from a PC or the Internet, or, after obtaining the prompter information needed for the speech, it can generate prompter caption pictures containing that information. The video conference terminal can save the obtained prompter caption pictures in its storage medium (for example, memory).

It can be understood that the prompter information needed for a speech has a logical relationship and an order (for example, between prompter lines, prompter paragraphs, or prompter pages). Therefore, if multiple prompter caption pictures are obtained, an order can also be specified among them: the video conference terminal can number them sequentially and store them by page, so that they are shown in order at display time.

120. The video conference terminal edits the at least one prompter caption picture, and caches the edited prompter caption pictures in the video memory of the video conference terminal according to a specified picture display order.

Under one application scenario, editing a prompter caption picture can include converting it into a prompter caption picture that carries α information (the α information indicates the transparency of the picture, where α = 0 means fully transparent and α = 1 means fully covering), so that the color, background, background color, and similar properties of the output prompter captions can be modified according to the speaker's indication. Specifically, transformations such as transparency and background-color changes can be applied to the prompter caption picture before it is sent for overlay display. Editing can also include matching the size and format of the prompter caption picture to the configured display format. For example, if the picture does not match the configured display format, it can be scaled: the horizontal dimension is scaled to be consistent with the display format, and the vertical dimension is scaled by the same ratio, so that the caption image is not distorted.
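The two editing operations described here, applying α transparency and scaling without distortion, can be illustrated with a small sketch. It assumes conventional linear alpha blending of RGB pixels, which the text does not spell out, and the function names `blend_pixel` and `fit_to_width` are hypothetical.

```python
def blend_pixel(caption, video, alpha):
    """Blend one RGB caption pixel over one RGB video pixel.

    alpha = 0 leaves the video pixel untouched (caption fully transparent);
    alpha = 1 replaces it with the caption pixel (caption fully covering).
    Linear blending is an assumption made for this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return tuple(round(alpha * c + (1.0 - alpha) * v) for c, v in zip(caption, video))


def fit_to_width(src_w, src_h, display_w):
    """Scale a picture so its width equals the display width.

    The same ratio is applied vertically, so the caption image keeps its
    aspect ratio and is not distorted.
    """
    if src_w <= 0 or src_h <= 0 or display_w <= 0:
        raise ValueError("dimensions must be positive")
    scale = display_w / src_w
    return display_w, round(src_h * scale)
```

For instance, `fit_to_width(640, 480, 1280)` doubles both dimensions, so a 640x480 caption picture becomes 1280x960.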

The video conference terminal edits the prompter caption pictures and can cache the edited pictures in its video memory according to the specified picture display order, so that the pictures can later be read directly from the video memory for display, which provides an automatic page-turning function. The picture display order here means the order of the prompter information: if the prompter information is organized by line, the picture display order is the order of the prompter lines; if it is organized by paragraph, it is the order of the prompter paragraphs; if it is organized by page, it is the order of the prompter pages; and so on.
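Caching in display order can be pictured as laying the numbered pages end to end in one buffer and remembering where each page starts. The sketch below is illustrative only: it models video memory as a plain Python list of rows, and `cache_pages` and its return layout are hypothetical.

```python
def cache_pages(pages):
    """Lay out numbered caption pages contiguously, in display order.

    pages: dict mapping a page number to that page's list of pixel rows.
    Returns (buffer, offsets): buffer is the concatenated rows, and offsets
    maps each page number to the index of its first row in the buffer.
    """
    buffer = []
    offsets = {}
    for number in sorted(pages):  # the specified picture display order
        offsets[number] = len(buffer)
        buffer.extend(pages[number])
    return buffer, offsets
```

With the start offset of every page known, showing the next page only requires moving the read position, with no copying of picture data.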

130. The video conference terminal receives a prompter instruction.

During a video conference, if the speaker needs the prompter in order to speak, the speaker can send a prompter instruction to the video conference terminal, and the video conference terminal starts the prompter function after receiving it.

It can be understood that the description above takes the case where steps 110 and 120 are performed before step 130 as an example. They can of course also be performed after step 130; that is, after receiving the prompter instruction, the video conference terminal obtains the prompter caption pictures, edits them, and caches the edited pictures in the video memory according to the specified picture display order.

140. According to the prompter instruction, the video conference terminal scroll-displays the edited prompter caption pictures in a specified area of the far-end conference video that it displays, in the picture display order specified in the video memory.

In practice, the video conference terminal can set a prompter overlay window in a specified area of the locally displayed far-end conference video, and can generate overlay-enable timing from that window, which specifies which rows and which columns of each frame of the far-end conference video allow the prompter caption picture to be overlaid. One or more lines of prompter captions can be shown directly in the prompter overlay window, and scrolling makes it possible to show all of the prompter captions continuously. Because the prompter overlay window can be made small enough, it does not interfere with normal viewing of the far-end conference video. Since the video conference terminal stores the edited prompter caption pictures in its video memory in the specified order, it can display the prompter caption picture in the specified area by operating on the start address of the overlay video memory, and can scroll the prompter caption picture simply by incrementing or decrementing that address.
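Scrolling by adjusting the overlay start address can be sketched as a window that slides over the cached rows. This is a minimal model under stated assumptions: video memory is represented as a list of rows, the "address" is a row index, and the class name `ScrollWindow` and its methods are hypothetical.

```python
class ScrollWindow:
    """A fixed-height view over cached caption rows, moved by a start index."""

    def __init__(self, total_rows, window_rows):
        if total_rows < 0 or window_rows <= 0:
            raise ValueError("invalid sizes")
        self.total_rows = total_rows
        self.window_rows = window_rows
        self.start = 0  # stands in for the overlay video memory start address

    def scroll(self, delta):
        """Add delta to the start index (negative scrolls back), clamped."""
        max_start = max(0, self.total_rows - self.window_rows)
        self.start = min(max(self.start + delta, 0), max_start)
        return self.start

    def visible(self, buffer):
        """Rows currently shown in the prompter overlay window."""
        return buffer[self.start:self.start + self.window_rows]
```

Clamping keeps the window from running past either end of the cached captions; a real terminal might instead wrap around or stop at the last page.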

Under one application scenario, the video conference terminal can control the scrolling rate of the prompter caption pictures, in either a manual mode or an automatic mode. For example, it can scroll the edited prompter caption pictures at a predetermined scrolling rate, or at a scrolling rate matched to the speaker's speaking rate, or according to scroll-display control commands from the speaker. The video conference terminal can of course also scroll the prompter caption pictures based on other mechanisms.
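The three rate-control options can be expressed as a single selection function. The sketch below is an assumption-laden illustration: the mode names, the unit (caption rows per second), and the conversion from speaking rate by dividing characters per second by characters per row are all hypothetical and are not taken from the text.

```python
def rows_per_second(mode, preset=1.0, chars_per_second=0.0, chars_per_row=20):
    """Pick a scrolling rate, in caption rows per second.

    mode "preset": use the predetermined rate.
    mode "speech": follow the speaker, assuming each row holds chars_per_row
                   characters and the speaker utters chars_per_second.
    mode "manual": do not scroll automatically; the speaker's commands do it.
    """
    if mode == "preset":
        return preset
    if mode == "speech":
        if chars_per_row <= 0:
            raise ValueError("chars_per_row must be positive")
        return chars_per_second / chars_per_row
    if mode == "manual":
        return 0.0
    raise ValueError(f"unknown mode: {mode}")
```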

Further, the video conference terminal can cut the prompter caption picture currently being scrolled into multiple blocks (it can cut to a specific size, or, based on the text the speaker is currently saying, cut out the part of the prompter caption picture at the position corresponding to that text) and overlay a block on a specified area of the local conference video to obtain a local overlay video. The local overlay video is encoded and sent to the far-end video conference terminals (that is, the one or more other video conference terminals in the current conference). It can be sent to a far-end video conference terminal directly, or be forwarded to it after corresponding processing by an intermediate device. The far-end video conference terminal can then display the content of the speech while displaying the video of the speaker. In this way, the content of the speech can be presented to the other participants in real time, and the work of editing captions in the background is saved.
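Cutting a block out of the caption picture and overlaying it on the local video can be sketched with frames modelled as lists of pixel rows. This is a simplified illustration that overwrites pixels rather than blending them; `cut_rows` and `overlay_block` are hypothetical names.

```python
def cut_rows(picture, top, height):
    """Cut a horizontal block of the caption picture (a list of pixel rows)."""
    if top < 0 or height <= 0 or top + height > len(picture):
        raise ValueError("block lies outside the picture")
    return [row[:] for row in picture[top:top + height]]


def overlay_block(frame, block, top, left):
    """Return a copy of frame with block written at (top, left).

    The input frame is left unmodified so the plain local video stays available.
    """
    if top < 0 or left < 0 or top + len(block) > len(frame):
        raise ValueError("block does not fit vertically")
    out = [row[:] for row in frame]
    for r, block_row in enumerate(block):
        if left + len(block_row) > len(out[top + r]):
            raise ValueError("block does not fit horizontally")
        out[top + r][left:left + len(block_row)] = block_row
    return out
```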

Further, the video conference terminal can also display an indicator of the current scrolling progress of the prompter caption pictures (the indicator can be an icon, text, or another form), so that the speaker knows in real time how far the speech has progressed and how much of it remains.

Thus, in this embodiment the video conference terminal directly obtains prompter caption pictures containing the prompter information needed for the speech, and caches the edited prompter caption pictures in its video memory in the specified order. After receiving the prompter instruction, it scroll-displays the prompter caption pictures in a specified area of the far-end conference video, following the picture display order specified in the video memory. Because processing operates directly on prompter caption pictures that already contain the required prompter information, processing complexity can be reduced. Because a scroll-display mechanism is introduced and the prompter caption pictures are scrolled in a specified area of the far-end conference video, the prompter captions can be shown clearly without interfering with normal viewing of the far-end conference video, which improves the conference experience. Moreover, because the prompter function can be implemented with the hardware resources already present in the conference terminal, the hardware implementation cost and system complexity of the prompter function in a video conference can be reduced, and the flexibility of conference-site deployment improved.

Embodiment two

To help in understanding the technical schemes of the embodiments of the invention, they are described in more detail below using a more specific example.

Referring to Fig. 2, concrete steps can comprise:

201. The video conference terminal obtains at least one prompter caption picture containing prompter information.

In practice, the video conference terminal can obtain prompter caption pictures (which contain the prompter information the speaker needs for the speech) in various ways: it can obtain them from an external source or generate them itself. For example, the conference terminal can receive, through a video input interface, prompter caption pictures input by another device. The conference terminal may be connected to a document camera, which captures the prompter information needed for the speech as prompter caption pictures and passes them to the conference terminal. Alternatively, the conference terminal can receive prompter caption pictures input from a PC or the Internet, or, after obtaining the prompter information needed for the speech, it can generate prompter caption pictures containing that information. The conference terminal can save the obtained prompter caption pictures in its storage medium (for example, memory).

202. The video conference terminal edits the at least one obtained prompter caption picture.

Under one application scenario, the conference terminal's editing of a prompter caption picture can include multiple kinds of processing, such as format adjustment and data conversion, to obtain prompter caption picture data ready to be sent for display.

For example, the conference terminal converts the prompter caption picture into a prompter caption picture that carries α information (where α = 0 means fully transparent and α = 1 means fully covering), so that the color, background, background color, and similar properties of the output prompter captions can be modified according to the speaker's settings. Specifically, transformations such as transparency and background-color changes can be applied to the prompter caption picture before it is sent for overlay display. If semi-transparency processing is applied, displaying the prompter captions does not interfere with the display of the far-end conference video.

Further, editing a prompter caption picture can also include matching the size and format of the picture to the configured display format. For example, if the picture does not match the configured display format, the video conference terminal can scale it: the horizontal dimension is scaled to be consistent with the display format and the vertical dimension is scaled by the same ratio, or the vertical dimension is scaled to be consistent with the display format and the horizontal dimension is scaled by the same ratio, so that the caption image is not distorted.

203. The video conference terminal caches the edited prompter caption pictures in its video memory according to the specified picture display order.

Under one application scenario, the video conference terminal edits the prompter caption pictures and can cache the edited pictures in its video memory according to the specified picture display order, so that the prompter caption picture data can later be read directly from the video memory for display, which provides an automatic page-turning function. The picture display order here means the order of the prompter information: if the prompter information is organized by line, the picture display order is the order of the prompter lines; if it is organized by paragraph, it is the order of the prompter paragraphs; if it is organized by page, it is the order of the prompter pages (for storage by page, see for example Fig. 3-a); and so on.

204. The video conference terminal receives a prompter instruction.

During a video conference, if the speaker needs the prompter in order to speak, the speaker can send a prompter instruction to the conference terminal, and the conference terminal starts the prompter function after receiving it.

It can be understood that the description above takes the case where steps 201 to 203 are performed before step 204 as an example. They can of course also be performed after step 204; that is, after receiving the prompter instruction, the video conference terminal obtains the prompter caption pictures, edits them, and caches the edited pictures in the video memory according to the specified picture display order.

205. The video conference terminal sets an overlay window in a specified area of the displayed far-end conference video.

In practice, after receiving the prompter instruction, the video conference terminal can set a prompter overlay window in a specified area (for example upper left, upper right, lower left, or lower right) of the locally displayed far-end conference video. The prompter caption content is shown only inside this prompter overlay window.

The video conference terminal can generate overlay-enable timing from the prompter overlay window, which specifies which rows and which columns of each frame of the far-end conference video allow the prompter caption picture to be overlaid. One or more lines of prompter captions can be shown directly in the prompter overlay window, and scrolling makes it possible to show all of the prompter captions continuously. Because the prompter overlay window can be made small enough, it does not interfere with normal viewing of the far-end conference video. Since the video conference terminal stores the edited prompter caption picture data in the video memory in the specified order, the conference terminal can display the prompter caption picture in the specified area by operating on the start address of the overlay video memory, and can scroll the prompter caption picture simply by incrementing or decrementing that address.
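Placing the overlay window and deciding where overlay is enabled can be sketched as a rectangle test. This is a software illustration of the idea, not the terminal's timing logic: the corner names follow the examples in the text, while `window_rect` and `overlay_enabled` are hypothetical helpers.

```python
CORNERS = {"upper left", "upper right", "lower left", "lower right"}


def window_rect(corner, frame_w, frame_h, win_w, win_h):
    """Return (left, top, width, height) of the overlay window in the frame."""
    if corner not in CORNERS:
        raise ValueError(f"unknown corner: {corner}")
    if win_w > frame_w or win_h > frame_h:
        raise ValueError("window is larger than the frame")
    left = 0 if corner.endswith("left") else frame_w - win_w
    top = 0 if corner.startswith("upper") else frame_h - win_h
    return left, top, win_w, win_h


def overlay_enabled(row, col, rect):
    """True if the pixel at (row, col) lies inside the overlay window."""
    left, top, width, height = rect
    return top <= row < top + height and left <= col < left + width
```

For a 1920x1080 frame, `window_rect("lower right", 1920, 1080, 640, 120)` gives `(1280, 960, 640, 120)`, so only rows 960 to 1079 and columns 1280 to 1919 would carry captions.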

206, in the specified area of the displayed remote conference video, the video conference terminal scrolls the edited prompter caption pictures according to the picture display order specified in video memory.

In practice, in manual mode the video conference terminal may also support scrolling controlled by an external device; for example, according to a scroll control command from a remote control or the Internet, it scrolls the prompter caption pictures (forward or backward, left or right). In automatic mode, the video conference terminal may scroll the prompter caption pictures at a preset scrolling speed. Alternatively, a speech rate sensor may be deployed to sense the speed and progress of the speaker's speech (for example, it may sense the audio corresponding to a keyword in each section of the prompter captions and determine the speed and progress of the speech from the sensed audio), and the video conference terminal scrolls the prompter caption pictures at a rate matched to the speaker's speech rate. The terminal may of course switch between manual mode and automatic mode according to the speaker's instruction; for example, under the speaker's control through an external scroll controller, the video conference terminal may perform scroll control operations such as changing the scrolling speed, changing the scrolling direction and pausing.
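As a rough illustration of the scrolling modes described above, the following sketch selects a scrolling rate in rows per second. The mode names, the default values and the conversion from speech rate to rows (characters per second divided by characters per caption row) are all assumptions made for illustration; the original text does not specify them.

```python
def rows_per_second(mode, preset=0.5, chars_per_second=0.0,
                    chars_per_row=20, manual_command=0.0):
    """Pick a scrolling rate for one of three hypothetical modes.

    "manual": rate supplied by an external controller (negative = backward).
    "preset": fixed, preconfigured rate.
    "speech": rate matched to the measured speech rate (assumed formula).
    """
    if mode == "manual":
        return manual_command
    if mode == "preset":
        return preset
    if mode == "speech":
        return chars_per_second / chars_per_row
    raise ValueError(f"unknown mode: {mode}")


print(rows_per_second("speech", chars_per_second=4, chars_per_row=20))
```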

In the mechanism in which the video conference terminal scrolls the prompter caption pictures at a rate matched to the speaker's speech rate, the video conference terminal may sample the speaker's audio. (In practice the terminal may, for example, sample the speaker's audio through a sound pickup device; the sampling frequency can be set as the situation requires, for example to 4000 Hz.) The terminal performs speech recognition on the sampled speaker audio to obtain the text information corresponding to it. (For example, standard pronunciation audio of the text, such as Mandarin, may be stored in a database, as may pronunciation audio in various dialects; the sampled speaker audio is matched against the audio stored in the database to obtain the corresponding text information. Other speech recognition techniques may of course be used instead.) The terminal then matches the obtained text information against the prompter caption information presented by the edited prompter caption pictures (this caption information can be recognized from the pictures by optical character recognition). (For example, the terminal may set a matching degree threshold as appropriate, such as 85%, 90% or another value, and treat the two as matched when the matching degree exceeds the threshold: when the matching degree between the obtained text information and some part of the prompter caption information exceeds the set threshold, the text information is determined to match that part.) According to the matching result, the terminal displays the picture position corresponding to the next part (the next sentence, the next several sentences, the next paragraph, and so on) of the prompter caption information, following the part that matches the currently sampled speaker audio; it may of course also display the matched part at the same time. A specific display scene may be as shown in Fig. 3-b, which achieves prompting synchronized with the text the speaker is speaking.
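The threshold matching described above can be sketched as follows. The text does not define how the matching degree is computed, so this sketch assumes the similarity ratio from Python's difflib as a stand-in; next_segment_index and the sample segments are hypothetical, and speech recognition itself is assumed to have already produced the recognized text.

```python
from difflib import SequenceMatcher


def next_segment_index(recognized, segments, threshold=0.85):
    """Return the index of the caption segment to display next, or None.

    Assumption: "matching degree" is approximated by SequenceMatcher.ratio().
    The best-matching segment must exceed the threshold; the segment after it
    is the one to prompt (clamped to the last segment).
    """
    best_i, best_score = None, 0.0
    for i, segment in enumerate(segments):
        score = SequenceMatcher(None, recognized, segment).ratio()
        if score > best_score:
            best_i, best_score = i, score
    if best_i is None or best_score <= threshold:
        return None
    return min(best_i + 1, len(segments) - 1)


segments = ["good morning everyone",
            "today we review the budget",
            "thank you for listening"]
print(next_segment_index("good morning everyone", segments))
```

Returning None when nothing exceeds the threshold lets the caller keep the current caption position instead of jumping on a poor recognition result.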

After one page of prompter caption pictures has been displayed, because all of the prompter caption picture data has been stored in the video memory of the video conference terminal, the terminal can move to the next page automatically according to the page numbers, which achieves automatic page turning. For example, as shown in Fig. 3-c, when the 1st page has finished displaying, the start address of the overlay picture jumps from A to B, which turns the page. To improve reliability, once the video memory has switched to B, the content of the 3rd page is loaded at address A. When the 2nd page has finished, the start address switches back to A, and so on, so that all of the prompter captions to be displayed are paged through automatically.
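The alternation between addresses A and B can be sketched as a two-buffer scheme. This is a minimal model under the assumption that each address holds exactly one page; the class name PageFlipper and its methods are hypothetical.

```python
class PageFlipper:
    """Two-buffer page turning: show one buffer while the other is preloaded."""

    def __init__(self, pages):
        self.pages = pages
        self.shown = 0        # index of the page being displayed
        self.active = "A"     # buffer the overlay start address points at
        self.buffers = {
            "A": pages[0] if pages else None,
            "B": pages[1] if len(pages) > 1 else None,
        }

    def current(self):
        return self.buffers[self.active]

    def flip(self):
        """Jump to the other buffer, then load the page after next into the freed one."""
        if self.shown + 1 >= len(self.pages):
            return False  # no further page to show
        freed = self.active
        self.active = "B" if self.active == "A" else "A"
        self.shown += 1
        upcoming = self.shown + 1
        self.buffers[freed] = (
            self.pages[upcoming] if upcoming < len(self.pages) else None
        )
        return True


flipper = PageFlipper(["page 1", "page 2", "page 3"])
flipper.flip()
print(flipper.active, flipper.current(), flipper.buffers["A"])
```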

Further, before the overlay display, the video conference terminal may also run a preview pass over the prompter caption pictures. Because each page of pictures may contain blank areas, the video conference terminal can set jumps in the video memory address so that the blank parts are skipped and not displayed. In this way the prompter captions are displayed more smoothly at the transition between pages, and prompting efficiency is improved.
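One way to find the rows to skip is sketched below, assuming a page is a list of pixel rows and that a blank pixel has a known value; content_row_range is a hypothetical helper that trims leading and trailing blank rows only.

```python
def content_row_range(rows, blank=0):
    """Return (first, end) row indices bounding the non-blank content, or None.

    Assumption: a row is blank when every pixel equals `blank`.
    """
    non_blank = [i for i, row in enumerate(rows)
                 if any(px != blank for px in row)]
    if not non_blank:
        return None
    return non_blank[0], non_blank[-1] + 1


page = [[0, 0], [0, 1], [1, 0], [0, 0]]
print(content_row_range(page))
```

The returned range gives the addresses at which display of the page would start and stop, so the blank rows before and after it are jumped over.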

207, the video conference terminal cuts the currently scrolled prompter caption picture into multiple blocks and superimposes them on a specified area of the local conference video to obtain a local overlay video; the local overlay video is encoded and sent to the remote video conference terminal of the video conference.

In practice, the video conference terminal may cut the currently scrolled prompter caption picture into multiple blocks, for example small blocks of size N*M, and superimpose them on a specified area of the local conference video (for example the upper left, lower left, upper right or lower right) to obtain a local overlay video; the local overlay video is encoded and sent to the remote video conference terminal of the video conference. Alternatively, according to the text the speaker is currently speaking, the part of the prompter caption picture at the position corresponding to that text may be cut out and superimposed on a specified area of the local conference video (for example the upper left, lower left, upper right or lower right) to obtain a local overlay video; the local overlay video is encoded and sent to the remote video conference terminal of the video conference. A specific display scene may be as shown in Fig. 3-d, which presents the speaker's spoken text synchronously.

Taking cutting by fixed size as an example: if the prompter caption picture format is 1280*720, then N is less than 1280 and M is less than 720. For convenient access, the picture may be divided into blocks of equal size; for example, with N=640 and M=360 the prompter caption picture is cut into four image blocks. The video conference terminal may take out the image blocks that contain prompter information and superimpose them on a specified area of the local conference video to obtain a local overlay video, which is encoded and sent to the video conference terminal at the remote end of the conference. The remote end of the video conference can then show the content of the speech while displaying the video of the speaker. In this way the content of the speech is presented to the other participants in real time, and the work of editing captions in the background is saved.
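The fixed-size cutting in the example above can be sketched as follows. The picture is modeled as a list of pixel rows, and a block is assumed to contain prompter information when any of its pixels differs from a blank value; cut_blocks and blocks_with_captions are hypothetical names.

```python
def cut_blocks(picture, n, m):
    """Cut a picture into blocks n pixels wide and m pixels high.

    Returns a dict mapping (block_row, block_col) to the block's pixel rows.
    """
    height, width = len(picture), len(picture[0])
    blocks = {}
    for top in range(0, height, m):
        for left in range(0, width, n):
            blocks[(top // m, left // n)] = [
                row[left:left + n] for row in picture[top:top + m]
            ]
    return blocks


def blocks_with_captions(blocks, blank=0):
    """Keep only blocks with at least one non-blank pixel (assumed criterion)."""
    return {key: block for key, block in blocks.items()
            if any(px != blank for row in block for px in row)}


# Small stand-in picture: 8 wide, 4 high, one caption pixel in the lower right.
picture = [[0] * 8 for _ in range(4)]
picture[3][6] = 1
blocks = cut_blocks(picture, n=4, m=2)
print(sorted(blocks_with_captions(blocks)))
```

With a 1280*720 picture and N=640, M=360, the same function yields the four blocks mentioned in the text.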

Further, in real time or on the speaker's instruction, the video conference terminal may also display a progress marker (which may be an icon, text or another form) showing the current scrolling progress of the prompter caption pictures, so that the speaker knows in real time how far the speech has progressed and how much content remains. The video conference terminal may also display speech duration information, so that the speaker knows in real time how long he or she has been speaking.
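A text form of such a progress marker might look like the sketch below. The bar layout and the definition of progress (rows scrolled into view divided by total rows) are assumptions; progress_marker is a hypothetical helper.

```python
def progress_marker(start_row, window_height, total_rows, width=10):
    """Render scrolling progress as a text bar with a percentage.

    Assumption: progress is the share of caption rows that have been shown so far.
    """
    shown = min(start_row + window_height, total_rows)
    fraction = shown / total_rows if total_rows else 1.0
    filled = round(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {round(fraction * 100)}%"


print(progress_marker(start_row=0, window_height=2, total_rows=10))
```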

It can be seen that in this embodiment the video conference terminal directly obtains prompter caption pictures containing the prompter information needed for the speech, caches the edited prompter caption picture data in the video memory of the video conference terminal in the specified order, and, after receiving the prompter instruction, scrolls the prompter caption pictures in the specified area of the remote conference video according to the picture display order specified in video memory. Because processing is based directly on prompter caption pictures that contain the prompter information needed for the speech, processing complexity can be reduced to some extent. Because a scrolling display mechanism is introduced and the prompter caption pictures are scrolled in a specified area of the remote conference video, the prompter captions are shown clearly without affecting normal viewing of the remote conference video, which improves the conference experience. And because the prompter function can be implemented with the hardware resources the video conference terminal already has, the hardware implementation cost and system complexity of the prompter function in a video conference are reduced, and the flexibility of conference site deployment is improved.

Further, the video conference terminal supports scrolling rate control in manual or automatic mode when scrolling the prompter caption pictures, which makes the implementation more flexible.

To better implement the above technical schemes of the embodiments of the present invention, related apparatus for implementing these technical schemes is also provided below.

Embodiment three

Referring to Fig. 4, a video conference terminal 400 provided by an embodiment of the present invention may comprise: an acquisition module 410, a processing and caching module 420, a receiving module 430 and a scroll display module 440.

The acquisition module 410 is used to obtain at least one prompter caption picture;

In practice, the acquisition module 410 of the video conference terminal may obtain prompter caption pictures (which contain the prompter information the speaker needs for the speech) in various ways. For example, the acquisition module 410 may obtain prompter caption pictures from outside, or may obtain them by generating the pictures itself. For instance, the acquisition module 410 may receive, through a video input interface, prompter caption pictures input by another device: the acquisition module 410 may be connected to a document camera, which photographs the prompter information the speaker needs as prompter caption pictures and passes them to the video conference terminal 400. Alternatively, the acquisition module 410 may receive prompter caption pictures input from a PC or the Internet; or, after obtaining the prompter information the speaker needs, the acquisition module 410 may generate prompter caption pictures containing that information. The acquisition module 410 may store the obtained prompter caption pictures in its storage medium (such as memory).

It will be appreciated that there is a certain logical relationship and order among all the prompter information the speaker needs (for example between prompter lines, prompter paragraphs or prompter pages). Therefore, if the acquisition module 410 obtains multiple prompter caption pictures, an order may also be specified among them; the video conference terminal 400 may number them in order and store them page by page, so that they are shown one after another during display.

The processing and caching module 420 is used to edit the at least one prompter caption picture obtained by the acquisition module 410, and to cache the edited prompter caption pictures in the video memory of the video conference terminal 400 according to the specified picture display order;

In one application scenario, the editing performed by the processing and caching module 420 on a prompter caption picture may comprise: editing the prompter caption picture into a prompter caption picture with α information (α information indicates the transparency of the picture, where α=0 means fully transparent and α=1 means full coverage), so that information such as the color and background color of the output prompter captions can be modified as the speaker instructs. Specifically, conversions such as transparency and background color may be applied to the prompter caption picture before it is sent for overlay display. The editing performed by the processing and caching module 420 may also comprise: matching the size and format of the prompter caption picture to the configured display format. For example, if the prompter caption picture does not match the configured display format, the picture may be scaled; during scaling, the horizontal size may be made consistent with the display format and the vertical scaling ratio kept the same as the horizontal scaling ratio, so that the caption image is not distorted.
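The α convention and the aspect-preserving scaling described above can be illustrated with a short sketch. It assumes single-channel pixel values and a simple linear blend, which the original text does not spell out; blend and scaled_size are hypothetical names.

```python
def blend(caption_px, video_px, alpha):
    """Mix one caption pixel over one video pixel.

    alpha = 0: fully transparent (the video pixel shows through).
    alpha = 1: full coverage (only the caption pixel shows).
    Assumption: a linear blend of single-channel values.
    """
    return round(alpha * caption_px + (1 - alpha) * video_px)


def scaled_size(src_w, src_h, display_w):
    """Scale to the display width, using the same ratio vertically."""
    ratio = display_w / src_w
    return display_w, round(src_h * ratio)


print(blend(200, 100, 0.5))
print(scaled_size(640, 360, 1280))
```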

Handle 420 pairs of prompter captions of cache module picture and carry out editing and processing, and can will be in the video memory of video conference terminal through the prompter captions image cache of editing and processing according to the picture DISPLAY ORDER of appointment, so that the follow-up prompter captions picture that can directly read carries out the picture demonstration, realize skipping automatically function from the video memory of video conference terminal.Wherein, picture DISPLAY ORDER herein promptly is meant the sequencing between the prompter information, if prompter information is with behavior unit, then picture DISPLAY ORDER herein can refer to the sequencing between each prompter row, if prompter information is unit with the section, then this picture DISPLAY ORDER can refer to the sequencing between each prompter section, if prompter information is unit with the page or leaf, then this picture DISPLAY ORDER can refer to the sequencing between each prompter page or leaf, by that analogy.

Receiver module 430 is used to receive the prompter instruction;

Roll display module 440, be used for instructing according to the prompter that receiver module 430 receives, the appointed area of the far-end meeting video that shows at video conference terminal 400, according to the picture DISPLAY ORDER of appointment in the video memory, roll display is through the prompter captions picture of editing and processing.

In actual applications, roll display module 440 can be provided with prompter stack window in the appointed area on the far-end meeting video of this locality demonstration, and can enable sequential according to the prompter stack window generation stack that is provided with, concrete which row that corresponds to every frame far-end meeting video of specifying, which row allows stack prompter captions picture; Can directly show delegation or multirow prompter captions at prompter stack window, and the mode by roll display, then continuable all prompter captions that demonstrate because the window that prompter can be superposeed is set enough for a short time, just can not have influence on normally watching of far-end meeting video yet.Wherein, because handling cache module 420 is by specified order, to leave in the video conference terminal video memory through the prompter captions picture of editing and processing, roll display module 440 can be by operating the demonstration of the speech captions picture of realizing the appointed area to the initial address of stack video memory, add up or tired subtracting by stack video memory address, just can realize the roll display of prompter captions picture.

Under a kind of application scenarios, roll display module 440 can be carried out rolling rate control to roll display prompter captions picture, for example based on manual mode or automatic mode, roll display module 440 can be based on predetermined rolling rate, and roll display is through the prompter captions picture of editing and processing; Perhaps, roll display module 440 can be based on the rolling rate that is complementary with spokesman's word speed, and roll display is through the prompter captions picture of editing and processing; Perhaps, roll display module 440 can be according to spokesman's roll display control command, and roll display is through the prompter captions picture of editing and processing, and certainly, roll display module 440 can be based on other mechanism, roll display prompter captions picture.

In one application scenario, the processing and caching module 420 may comprise: an editing submodule and a caching submodule (not shown in Fig. 4).

The editing submodule is used to edit the at least one prompter caption picture obtained by the acquisition module 410, to obtain prompter caption pictures with α information;

The caching submodule is used to cache the prompter caption pictures with α information obtained by the editing submodule in video memory according to the specified picture display order.

In one application scenario, the scroll display module 440 may comprise one or more of a first scroll display submodule, a second scroll display submodule and a third scroll display submodule (not shown in Fig. 4).

The first scroll display submodule is used to scroll the edited prompter caption pictures at a predetermined scrolling rate, according to the prompter instruction received by the receiving module 430, in a specified area of the remote conference video displayed by the video conference terminal 400, following the picture display order specified in video memory;

The second scroll display submodule is used to scroll the edited prompter caption pictures at a rate matched to the speaker's speech rate, according to the prompter instruction received by the receiving module 430, in a specified area of the remote conference video displayed by the video conference terminal 400, following the picture display order specified in video memory;

The third scroll display submodule is used to scroll the edited prompter caption pictures according to the speaker's scroll display control instruction, in response to the prompter instruction received by the receiving module 430, in a specified area of the remote conference video displayed by the video conference terminal 400, following the picture display order specified in video memory.

In one application scenario, the second scroll display submodule may comprise: a sampling submodule, a speech recognition submodule, a matching submodule and a display control submodule (not shown in Fig. 4).

The sampling submodule is used to sample the speaker's audio;

In practice, the sampling submodule may, for example, sample the speaker's audio through a sound pickup device. The sampling frequency can be set as the situation requires, for example to 4000 Hz.

The speech recognition submodule is used to perform speech recognition on the speaker audio sampled by the sampling submodule, to obtain the text information corresponding to the sampled speaker audio;

The matching submodule is used to match the text information obtained by the speech recognition submodule against the prompter caption information presented by the edited prompter caption pictures;

For example, the matching submodule may set a matching degree threshold as appropriate (such as 85%, 90% or another value) and treat the two as matched when the matching degree exceeds the threshold. The matching submodule matches the obtained text information against the prompter caption information presented by the prompter caption pictures; when the matching degree between the obtained text information and some part of the prompter caption information exceeds the set threshold, the text information is determined to match that part of the prompter caption information.

The display control submodule is used to display, according to the matching result of the matching submodule, the picture position corresponding to the next part (the next sentence, the next several sentences, the next paragraph, and so on) of the prompter caption information presented by the edited prompter caption pictures, following the part that matches the currently sampled speaker audio.

The display control submodule may of course also be used to display, at the same time, the part of the prompter caption information that matches the currently sampled speaker audio.

In one application scenario, the video conference terminal 400 may further comprise: a local video overlay module and a sending module (not shown in Fig. 4).

The local video overlay module is used to cut the prompter caption picture currently scrolled by the scroll display module into multiple blocks and superimpose them on a specified area of the local conference video to obtain a local overlay video;

The sending module is used to encode the local overlay video obtained by the local video overlay module and send it to the remote end of the conference.

In one application scenario, the local video overlay module of the video conference terminal 400 may cut the currently scrolled prompter caption picture into multiple blocks (it may cut by a specific size, or, according to the text the speaker is currently speaking, cut out the part of the prompter caption picture at the position corresponding to that text) and superimpose them on a specified area of the local conference video to obtain a local overlay video. The local overlay video is encoded and sent to the remote video conference terminal (the remote video conference terminal refers to one or more video conference terminals in the current conference other than the video conference terminal 400). The local overlay video may be sent to the remote video conference terminal directly, or may be forwarded to it after corresponding processing by an intermediate device. The remote video conference terminal can then show the content of the speech while displaying the video of the speaker. In this way the content of the speech is presented to the other participants in real time, and the work of editing captions in the background is saved.

In one application scenario, the video conference terminal 400 may further comprise: a progress prompt module (not shown in Fig. 4).

The progress prompt module is used to display a progress marker showing the current scrolling progress of the prompter caption pictures.

It should be noted that the video conference terminal 400 of this embodiment may serve as the video conference terminal in the method embodiments above and may be used to implement all of their technical schemes. The functions of its functional modules may be implemented according to the methods in those embodiments; for the specific implementation process, reference may be made to the relevant descriptions above, which are not repeated here.

It can be seen that in this embodiment the video conference terminal 400 directly obtains prompter caption pictures containing the prompter information needed for the speech, caches the edited prompter caption picture data in the video memory of the video conference terminal 400 in the specified order, and, after receiving the prompter instruction, scrolls the prompter caption pictures in the specified area of the remote conference video according to the picture display order specified in video memory. Because processing is based directly on prompter caption pictures that contain the prompter information needed for the speech, processing complexity can be reduced to some extent. Because a scrolling display mechanism is introduced and the prompter caption pictures are scrolled in a specified area of the remote conference video, the prompter captions are shown clearly without affecting normal viewing of the remote conference video, which improves the conference experience. And because the prompter function can be implemented with the hardware resources the video conference terminal 400 already has, the hardware implementation cost and system complexity of the prompter function in a video conference are reduced, and the flexibility of conference site deployment is improved.

Embodiment four

Another embodiment of the method of the present invention for implementing prompting in a video conference may comprise: the video conference terminal obtains a prompter subtitle file; receives a prompter instruction; samples the speaker's audio; performs speech recognition on the sampled speaker audio to obtain the text information corresponding to it; matches the obtained text information against the prompter caption information contained in the prompter subtitle file; and, according to the matching result, displays, in a specified area of the remote conference video displayed by the video conference terminal, the next part of the prompter caption information contained in the prompter subtitle file, following the part that matches the currently sampled speaker audio.

Referring to Fig. 5, the specific steps may comprise:

510, the video conference terminal obtains a prompter subtitle file;

In practice, the video conference terminal may obtain the prompter subtitle file (which contains the prompter information the speaker needs for the speech, and may be in the form of a picture, a document or another form) in various ways. For example, the terminal may obtain the prompter subtitle file from outside, or may generate it itself. For instance, the video conference terminal may receive, through a video input interface, a prompter subtitle file input by another device: the terminal may be connected to a document camera, which photographs the prompter information the speaker needs as prompter caption pictures and passes them to the video conference terminal. Alternatively, the video conference terminal may receive prompter caption pictures input from a PC or the Internet; or, after obtaining the prompter information the speaker needs, the terminal may generate prompter caption pictures containing that information. The terminal may store the obtained prompter caption pictures in its storage medium (such as memory).

It will be appreciated that there is a certain logical relationship and order among all the prompter information the speaker needs (for example between prompter lines, prompter paragraphs or prompter pages). Therefore, if multiple prompter caption pictures are obtained, an order may also be specified among them; the video conference terminal may number them in order and store them page by page, so that they are shown one after another during display.

Further, if the obtained prompter subtitle file is a prompter caption document, the video conference terminal need not edit it; if the obtained prompter subtitle file is a prompter caption picture, the video conference terminal may also edit it. For the way in which prompter caption pictures are edited, reference may be made to the descriptions in embodiments one to three above, which are not repeated here.

520, the video conference terminal receives a prompter instruction;

It will be appreciated that the above takes performing step 510 before step 520 as an example. Step 510 may of course also be performed after step 520; that is, the video conference terminal obtains the prompter subtitle file after receiving the prompter instruction.

530, the video conference terminal samples the speaker's audio;

In practice, the video conference terminal may sample the speaker's audio through a sound pickup device. The sampling frequency can be set as the situation requires, for example to 4000 Hz.

540, the video conference terminal performs speech recognition on the sampled speaker audio to obtain the text information corresponding to it;

In practice, standard pronunciation audio of the text (for example Mandarin) may be stored in a database, as may pronunciation audio in various dialects. The video conference terminal may match the sampled speaker audio against the audio stored in the database to obtain the text information corresponding to the sampled speaker audio. Other speech recognition techniques may of course also be used to obtain this text information.

550, the video conference terminal matches the obtained text information corresponding to the sampled speaker audio against the prompter caption information contained in the prompter caption file;

In practice, a matching-degree threshold (for example 85%, 90%, or another value) can be set as appropriate; when the matching degree exceeds this threshold, the two are considered to match. The video conference terminal matches the obtained text information corresponding to the sampled speaker audio against the prompter caption information contained in the prompter caption file. When the matching degree between that text information and some part of the prompter caption information exceeds the set threshold, the terminal determines that the text information matches that part of the prompter caption information.
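The threshold test in step 550 can be sketched as below. The specification does not say how the matching degree is computed, so this sketch assumes a character-level similarity ratio from Python's standard `difflib` module; the function names and sample sentences are hypothetical.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.85  # example value from the text; 0.90 or another value also works


def match_degree(recognized, segment):
    """Similarity in [0, 1] between recognized text and one caption segment.

    Assumption: a case-insensitive character-level ratio stands in for the
    unspecified matching-degree measure.
    """
    return SequenceMatcher(None, recognized.lower(), segment.lower()).ratio()


def find_matching_segment(recognized, segments, threshold=MATCH_THRESHOLD):
    """Return the index of the best segment above the threshold, else None."""
    best_index, best_score = None, threshold
    for i, segment in enumerate(segments):
        score = match_degree(recognized, segment)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


segments = [
    "Good morning everyone.",
    "Today we review the quarterly results.",
    "Thank you for attending.",
]
# A slightly misrecognized sentence still clears the threshold.
print(find_matching_segment("today we review the quartely results", segments))
```

Picking the best-scoring segment rather than the first one above the threshold avoids jumping to an earlier, merely similar sentence.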

560, according to the matching result, the video conference terminal displays, in a designated area of the far-end conference video that it shows, the next part of the prompter caption information contained in the prompter caption file, that is, the part following the one that matches the text information corresponding to the currently sampled speaker audio (this can be the next sentence, the next several sentences, the next paragraph, and so on).

Of course, the video conference terminal may also simultaneously display the part of the prompter caption information that matches the text information corresponding to the currently sampled speaker audio. A specific display scene can be as shown in Fig. 3-b, so that prompting synchronized with the speaker's spoken text can be achieved.
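Step 560 and its optional simultaneous display can be illustrated with a short sketch. It assumes the caption text has already been split into segments and that a matched index is available; `caption_to_display` and the `show_current` flag are hypothetical names.

```python
def caption_to_display(segments, matched_index, show_current=False):
    """Pick the caption segments to show after a match.

    By default only the segment following the matched one is returned; with
    show_current=True the matched segment is returned as well.
    """
    if matched_index is None:
        return []
    start = matched_index if show_current else matched_index + 1
    return segments[start:matched_index + 2]


segments = ["Sentence one.", "Sentence two.", "Sentence three."]
print(caption_to_display(segments, 1))                     # next part only
print(caption_to_display(segments, 1, show_current=True))  # matched part and next part
```

When the last segment is matched and nothing follows it, the default mode returns an empty list, which a display routine could treat as "end of speech".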

In one application scenario, the video conference terminal may also superimpose, onto a designated area of the local conference video, the prompter caption information corresponding to the part that matches the text information of the currently sampled speaker audio, to obtain a local overlay video; it then encodes this local overlay video and sends it to the far-end video conference terminals of the video conference.

In practice, if the prompter caption file is a prompter caption picture, the video conference terminal can cut the currently scrolling prompter caption picture into multiple blocks (it can cut by a specific size, or, according to the text the speaker is currently speaking, cut out the part of the prompter caption picture at the position corresponding to that text) and superimpose them onto the designated area of the local conference video to obtain a local overlay video. The local overlay video is encoded and sent to the far-end video conference terminals (that is, the one or more video conference terminals in the current conference other than this terminal). It can be sent to the far-end video conference terminals directly, or forwarded to them after corresponding processing by an intermediate device. A far-end video conference terminal can then show the content of the speech while displaying the speaker's video. In this way, the speech content can be presented to the other participants in real time, and the work of editing captions in the background can be saved.
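The cut-and-superimpose step can be sketched with plain nested lists standing in for images. This is only an illustration under simplifying assumptions: pixels are single integers, the strip replaces the underlying pixels outright, and encoding and transmission are omitted. The `crop` and `overlay` helpers are hypothetical.

```python
def crop(picture, top, height):
    """Cut a horizontal strip of `height` rows starting at row `top`."""
    return [row[:] for row in picture[top:top + height]]


def overlay(frame, strip, top, left):
    """Return a copy of `frame` with `strip` pasted at position (top, left)."""
    out = [row[:] for row in frame]
    for r, strip_row in enumerate(strip):
        for c, pixel in enumerate(strip_row):
            # Ignore strip pixels that would fall outside the frame.
            if 0 <= top + r < len(out) and 0 <= left + c < len(out[0]):
                out[top + r][left + c] = pixel
    return out


# A 6-row caption picture whose rows hold the values 1..6, and a blank 4x6 frame.
picture = [[value] * 4 for value in range(1, 7)]
frame = [[0] * 6 for _ in range(4)]

strip = crop(picture, top=2, height=2)            # rows holding 3 and 4
overlaid = overlay(frame, strip, top=2, left=1)   # paste into the lower area
for row in overlaid:
    print(row)
```

Returning a new frame instead of modifying the input keeps the original local video frame available for the local display path.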

Further, the video conference terminal may, in real time or on the speaker's instruction, display a cue mark indicating the current scrolling progress of the prompter caption file (the cue mark can be, for example, an icon, text, or another form), so that the speaker knows in real time how far the speech has progressed and how much content remains. The video conference terminal may also display speaking-time information so that the speaker knows in real time how long he or she has been speaking.
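A text-form cue mark and a speaking-time readout could look like the sketch below. The label formats and the function names are assumptions chosen for illustration; the specification only says that the mark may be an icon, text, or another form.

```python
def progress_label(shown_segments, total_segments):
    """Text cue mark such as '3/12 (25%)' for the current scrolling progress."""
    percent = 100 * shown_segments // total_segments if total_segments else 0
    return f"{shown_segments}/{total_segments} ({percent}%)"


def elapsed_label(start_seconds, now_seconds):
    """Speaking time formatted as MM:SS."""
    elapsed = max(0, int(now_seconds - start_seconds))
    minutes, seconds = divmod(elapsed, 60)
    return f"{minutes:02d}:{seconds:02d}"


print(progress_label(3, 12))
print(elapsed_label(100.0, 225.5))
```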

As can be seen, in this embodiment the video conference terminal directly obtains a prompter caption file containing the prompter information needed for the speech. After receiving a prompter instruction, it samples the speaker's audio; performs speech recognition on the sampled audio to obtain the corresponding text information; matches that text information against the prompter caption information contained in the prompter caption file; and, according to the matching result, displays in a designated area of the far-end conference video shown by the terminal the next part of the prompter caption information following the part that matches the text information of the currently sampled speaker audio. Because speech recognition and a mechanism that scrolls the display in real time according to the speaker's voice are introduced, and the prompter captions are scrolled in a designated area of the far-end conference video, real-time prompting can be achieved automatically, and the prompter captions can be displayed clearly without affecting normal viewing of the far-end conference video, which considerably improves the conference experience. Moreover, because the prompter function can be implemented with the conference terminal's own hardware resources, the hardware cost and system complexity of the prompter function in a video conference can be reduced and the flexibility of conference-site deployment improved.

Embodiment five

Referring to Fig. 6, a video conference terminal 600 provided by Embodiment five of the present invention may include: a second acquisition module 610, a receiving module 620, a sampling module 630, a speech recognition module 640, a matching module 650, and a display control module 660.

The second acquisition module 610 is configured to obtain a prompter caption file;

In practice, the prompter caption file obtained by the second acquisition module 610, which contains the prompter information the speaker needs, can take the form of a picture, a document, or another form.

In one application scenario, if the prompter caption file obtained by the second acquisition module 610 is a prompter caption document, the video conference terminal 600 need not edit it. If the prompter caption file obtained by the second acquisition module 610 is a prompter caption picture, the video conference terminal 600 may further include an editing module configured to edit the prompter caption picture; for the manner of editing a prompter caption picture, refer to the description in Embodiments one to three above, which is not repeated here.

The receiving module 620 is configured to receive a prompter instruction;

The sampling module 630 is configured to sample the speaker's audio;

In practice, the sampling module 630 can sample the speaker's audio through a sound pickup device. The sampling frequency can be set according to the specific situation; for example, it can be set to 4000 Hz. Of course, the sampling module 630 can also sample the speaker's audio through other existing audio sampling mechanisms.

The speech recognition module 640 is configured to perform speech recognition on the speaker audio sampled by the sampling module 630 to obtain text information corresponding to the sampled speaker audio;

In practice, for example, standard text audio (for example Mandarin) can be stored in a database, and text audio for various dialects can be stored as well. The speech recognition module 640 can match the sampled speaker audio against the text audio stored in the database to obtain the text information corresponding to the sampled speaker audio. The speech recognition module 640 can of course also use other speech recognition techniques, or one or more other existing speech recognition modules, to perform speech recognition on the speaker audio sampled by the sampling module 630 and obtain this text information.

The matching module 650 is configured to match the text information obtained by the speech recognition module 640, corresponding to the sampled speaker audio, against the prompter caption information contained in the prompter caption file;

In practice, a matching-degree threshold (for example 85%, 90%, or another value) can be set as appropriate; when the matching degree exceeds this threshold, the two are considered to match. The matching module 650 matches the text information obtained by the speech recognition module 640 against the prompter caption information contained in the prompter caption file. When the matching degree between that text information and some part of the prompter caption information exceeds the set threshold, the matching module 650 determines that the text information matches that part of the prompter caption information.

The display control module 660 is configured to display, according to the matching result of the matching module 650 and in a designated area of the far-end conference video shown by the video conference terminal 600, the next part of the prompter caption information contained in the prompter caption file, that is, the part following the one that matches the text information corresponding to the currently sampled speaker audio (this can be the next sentence, the next several sentences, the next paragraph, and so on).

Of course, the display control module 660 may also simultaneously display the part of the prompter caption information that matches the text information corresponding to the currently sampled speaker audio. A specific display scene can be as shown in Fig. 3-b, so that prompting synchronized with the speaker's spoken text can be achieved.

In one application scenario, the video conference terminal 600 may further include a video overlay module (not shown in Fig. 6).

The video overlay module is configured to superimpose, onto a designated area of the local conference video, the prompter caption information corresponding to the part that matches the text information of the currently sampled speaker audio, to obtain a local overlay video, and to encode this local overlay video and send it to the far-end video conference terminals of the video conference.

Further, the video conference terminal 600 may also include a progress prompt module (not shown in Fig. 6).

The progress prompt module is configured to display, in real time or on the speaker's instruction, a cue mark indicating the current scrolling progress of the prompter caption file (the cue mark can be, for example, an icon, text, or another form), so that the speaker knows in real time how far the speech has progressed and how much content remains.

Further, the progress prompt module may also display speaking-time information so that the speaker knows in real time how long he or she has been speaking.

It should be noted that the video conference terminal 600 of this embodiment can serve as the video conference terminal in method Embodiment four above and can be used to implement all of the technical solutions of that method embodiment. The functions of its functional modules can be implemented according to the method in the method embodiment; for the specific implementation process, refer to the related description in the embodiments above, which is not repeated here.
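How the modules of terminal 600 hand data to one another can be sketched as a single class. This is a hypothetical wiring for illustration, not the implementation described by the specification: the recognizer is injected as a stub, the matching degree is assumed to be a `difflib` similarity ratio, and the class and method names are invented.

```python
from difflib import SequenceMatcher


class PrompterTerminal:
    """Hypothetical wiring of modules 610 to 660; an illustration only."""

    def __init__(self, segments, recognize, threshold=0.85):
        self.segments = segments      # content obtained by acquisition module 610
        self.recognize = recognize    # stand-in for speech recognition module 640
        self.threshold = threshold    # matching-degree threshold used by module 650
        self.prompting = False

    def receive_instruction(self):
        """Receiving module 620: start prompting."""
        self.prompting = True

    def match(self, text):
        """Matching module 650: index of the best segment above the threshold."""
        best, best_score = None, self.threshold
        for i, segment in enumerate(self.segments):
            score = SequenceMatcher(None, text.lower(), segment.lower()).ratio()
            if score > best_score:
                best, best_score = i, score
        return best

    def on_audio(self, samples):
        """Take samples from module 630 and return the text module 660 should show."""
        if not self.prompting:
            return None
        index = self.match(self.recognize(samples))
        if index is None or index + 1 >= len(self.segments):
            return None
        return self.segments[index + 1]


# The recognizer stub always "hears" the first sentence.
terminal = PrompterTerminal(
    ["Good morning everyone.", "Welcome to the review."],
    recognize=lambda samples: "good morning everyone",
)
terminal.receive_instruction()
print(terminal.on_audio(b""))
```

Injecting the recognizer keeps the sketch runnable without audio hardware and mirrors the statement that other recognition techniques may be substituted.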

As can be seen, in this embodiment the video conference terminal 600 directly obtains a prompter caption file containing the prompter information needed for the speech. After receiving a prompter instruction, it samples the speaker's audio; performs speech recognition on the sampled audio to obtain the corresponding text information; matches that text information against the prompter caption information contained in the prompter caption file; and, according to the matching result, displays in a designated area of the far-end conference video shown by the video conference terminal 600 the next part of the prompter caption information following the part that matches the text information of the currently sampled speaker audio. Because speech recognition and a mechanism that scrolls the display in real time according to the speaker's voice are introduced, and the prompter captions are scrolled in a designated area of the far-end conference video, real-time prompting can be achieved automatically, and the prompter captions can be displayed clearly without affecting normal viewing of the far-end conference video, which considerably improves the conference experience. Moreover, because the prompter function can be implemented with the conference terminal's own hardware resources, the hardware cost and system complexity of the prompter function in a video conference can be reduced and the flexibility of conference-site deployment improved.

It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps can be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

In the above embodiments, the description of each embodiment has its own emphasis. For a part not described in detail in one embodiment, refer to the related description of the other embodiments.

In summary, in one technical solution provided by the embodiments of the present invention, the video conference terminal directly obtains prompter caption pictures containing the prompter information needed for the speech, edits them, and caches the resulting prompter caption picture data in the video memory of the video conference terminal in a specified order. After receiving a prompter instruction, it scrolls the prompter caption pictures in a designated area of the far-end conference video according to the picture display order specified in the video memory. Because processing is based directly on prompter caption pictures that contain the required prompter information, processing complexity can be reduced to some extent. Because a scrolling display mechanism is introduced and the prompter caption pictures are scrolled in a designated area of the far-end conference video, the prompter captions can be displayed clearly without affecting normal viewing of the far-end conference video, which improves the conference experience. Moreover, because the prompter function can be implemented with the video conference terminal's own hardware resources, the hardware cost and system complexity of the prompter function in a video conference can be reduced and the flexibility of conference-site deployment improved.

In another technical solution provided by the embodiments of the present invention, the video conference terminal directly obtains a prompter caption file containing the prompter information needed for the speech. After receiving a prompter instruction, it samples the speaker's audio; performs speech recognition on the sampled audio to obtain the corresponding text information; matches that text information against the prompter caption information contained in the prompter caption file; and, according to the matching result, displays in a designated area of the far-end conference video shown by the terminal the next part of the prompter caption information following the part that matches the text information of the currently sampled speaker audio. Because speech recognition and a mechanism that scrolls the display in real time according to the speaker's voice are introduced, and the prompter captions are scrolled in a designated area of the far-end conference video, real-time prompting can be achieved automatically, and the prompter captions can be displayed clearly without affecting normal viewing of the far-end conference video, which considerably improves the conference experience. Moreover, because the prompter function can be implemented with the conference terminal's own hardware resources, the hardware cost and system complexity of the prompter function in a video conference can be reduced and the flexibility of conference-site deployment improved.

Further, the conference terminal supports scrolling-rate control in both manual and automatic modes when scrolling the prompter caption pictures, which makes the implementation more flexible.
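The claims list three ways to drive the scrolling: a predetermined rate, a rate matched to the speaker's speech rate, and the speaker's scroll control command. A minimal sketch of choosing among them follows. The mode names, the pixels-per-character conversion, and all numeric values are assumptions for illustration only.

```python
def scroll_rate(mode, preset=20.0, chars_per_second=0.0,
                pixels_per_char=4.0, command_rate=0.0):
    """Scrolling rate in pixels per second for one of three hypothetical modes."""
    if mode == "preset":    # predetermined scrolling rate
        return preset
    if mode == "speech":    # automatic: follow the speaker's measured speech rate
        return chars_per_second * pixels_per_char
    if mode == "command":   # manual: rate taken from the speaker's control command
        return command_rate
    raise ValueError(f"unknown scroll mode: {mode}")


def advance(offset, rate, dt, picture_height):
    """Move the scroll offset forward by rate * dt, stopping at the picture end."""
    return min(picture_height, offset + rate * dt)


rate = scroll_rate("speech", chars_per_second=5.0)
print(rate)
print(advance(0.0, rate, 0.5, picture_height=100))
```

Clamping the offset at the picture height keeps the display from scrolling past the final caption picture.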

Those of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium can include read-only memory, random access memory, a magnetic disk, an optical disc, and the like.

The method and device for implementing prompting in a video conference provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A method for implementing prompting in a video conference, characterized by comprising:

A video conference terminal obtaining at least one prompter caption picture;

Receiving a prompter instruction;

2. The method according to claim 1, characterized in that the scrolling display of the edited prompter caption picture comprises:

Scrolling the edited prompter caption picture at a predetermined scrolling rate;

Or, scrolling the edited prompter caption picture at a scrolling rate matched to the speaker's speech rate;

Or, scrolling the edited prompter caption picture according to a scroll display control command from the speaker.

3. The method according to claim 1, characterized in that

The scrolling of the edited prompter caption picture at a scrolling rate matched to the speaker's speech rate comprises:

Sampling the speaker's audio;

Matching the obtained text information corresponding to the sampled speaker audio against the prompter caption information that the edited prompter caption picture can present;

According to the matching result, displaying the picture position corresponding to the next part of the prompter caption information that the edited prompter caption picture can present, that is, the part following the one that matches the text information corresponding to the currently sampled speaker audio.

4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:

Cutting the edited prompter caption picture that is currently being scrolled into multiple blocks and superimposing them onto a designated area of the local conference video to obtain a local overlay video;

Encoding the local overlay video and sending it to a far-end video conference terminal.

5. The method according to any one of claims 1 to 3, characterized in that the editing of the at least one prompter caption picture comprises:

Editing the at least one prompter caption picture into a prompter caption picture with α (alpha) information.

6. A method for implementing prompting in a video conference, characterized by comprising:

A video conference terminal obtaining a prompter caption file;

Receiving a prompter instruction;

Sampling the speaker's audio;

7. A video conference terminal, characterized by comprising:

An acquisition module, configured to obtain at least one prompter caption picture;

A receiving module, configured to receive a prompter instruction;

8. The video conference terminal according to claim 7, characterized in that

The scrolling display module comprises:

A first scrolling display submodule, configured to scroll the edited prompter caption picture at a predetermined scrolling rate, according to the prompter instruction received by the receiving module, in a designated area of the far-end conference video shown by the video conference terminal, and according to the picture display order specified in the video memory;

Or,

A second scrolling display submodule, configured to scroll the edited prompter caption picture at a scrolling rate matched to the speaker's speech rate, according to the prompter instruction received by the receiving module, in a designated area of the far-end conference video shown by the video conference terminal, and according to the picture display order specified in the video memory;

Or,

A third scrolling display submodule, configured to scroll the edited prompter caption picture according to a scroll display control command from the speaker, according to the prompter instruction received by the receiving module, in a designated area of the far-end conference video shown by the video conference terminal, and according to the picture display order specified in the video memory.

9. The video conference terminal according to claim 8, characterized in that

The second scrolling display submodule comprises:

A sampling submodule, configured to sample the speaker's audio;

A speech recognition submodule, configured to perform speech recognition on the speaker audio sampled by the sampling submodule to obtain text information corresponding to the sampled speaker audio;

A matching submodule, configured to match the text information obtained by the speech recognition submodule, corresponding to the sampled speaker audio, against the prompter caption information that the edited prompter caption picture can present;

A display control submodule, configured to display, according to the matching result of the matching submodule, the picture position corresponding to the next part of the prompter caption information that the edited prompter caption picture can present, that is, the part following the one that matches the text information corresponding to the currently sampled speaker audio.

10. The video conference terminal according to any one of claims 7 to 9, characterized in that

The video conference terminal further comprises:

A local video overlay module, configured to cut the edited prompter caption picture currently being scrolled by the scrolling display module into multiple blocks and superimpose them onto a designated area of the local conference video to obtain a local overlay video;

A sending module, configured to encode the local overlay video obtained by the local video overlay module and send it to a far-end video conference terminal.

11. A video conference terminal, characterized by comprising:

A second acquisition module, configured to obtain a prompter caption file;

A receiving module, configured to receive a prompter instruction;

A sampling module, configured to sample the speaker's audio;