CN103595953A - Method and device for controlling video shooting

Info

Publication number: CN103595953A
Application number: CN201310566974.1A
Authority: CN
Inventors: 王静; 刘智辉; 张金亮
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2013-11-14
Filing date: 2013-11-14
Publication date: 2014-02-19
Anticipated expiration: 2033-11-14
Also published as: WO2015070558A1; CN103595953B

Abstract

The invention provides a method and device for controlling video shooting, and relates to the field of video images. The facial picture of a speaker can be retained while the number of video switches is reduced, so that transitions between pictures are tight and the output video is smoother. The method comprises the following steps: when a first speaker talks, a first camera device is controlled to shoot a video of the first speaker; when a second speaker replaces the first speaker as the current speaker, a second camera device is controlled to shoot a video of the second speaker, wherein the second speaker is the next speaker and is at a position different from that of the first speaker; when the speaker subsequently changes again, the first camera device and the second camera device are controlled to alternately shoot a video of the current speaker; the video of the current speaker is output after it has been successfully acquired. The method and device are applied to video conferences.

Description

A method and apparatus for controlling video shooting

Technical field

The present invention relates to the field of video images, and in particular to a method and apparatus for controlling video shooting.

Background

Generally, in a video conference the camera shoots a panorama of all participants at a fixed size and a fixed angle. When the conference site is large, the camera may be far from the speaker, so the captured picture shows neither who is speaking nor the speaker's facial expression, and valuable information from the meeting is lost.

To avoid losing valuable information by shooting only a panoramic picture, two cameras can be used to shoot the conference site simultaneously: one camera always shoots the panorama of the site, and the other tracks and shoots the speaker.

When people in the site speak in turn, the tracking camera must rotate and zoom its lens before it successfully acquires the picture of the current speaker. The video captured during this process is unstable and uncomfortable to watch, so the picture has to be switched to the panorama of the site in the meantime. This switching, however, makes the transitions between pictures loose; the video sent to the remote site is not smooth and is very uncomfortable for the viewer.

Summary of the invention

Embodiments of the invention provide a method and apparatus for controlling video shooting that reduce the number of video switches while retaining the speaker's facial picture, so that transitions between pictures are tight and the output video is smoother.

In a first aspect, a method for controlling video shooting is provided, comprising:

when a first speaker talks, controlling a first camera device to shoot a video of the first speaker;

when the current speaker changes from the first speaker to a second speaker, controlling a second camera device to shoot a video of the second speaker, wherein the second speaker is the next speaker and is at a position different from that of the first speaker;

when the speaker subsequently changes again, controlling the first camera device and the second camera device to alternately shoot a video of the current speaker; and

after the video of the current speaker is successfully acquired, outputting the video of the current speaker.

With reference to the first aspect, in a first possible implementation of the first aspect, outputting the video of the current speaker comprises: outputting the video of the current speaker in full screen.

With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, outputting the video of the current speaker in full screen comprises:

before the video of the current speaker is successfully acquired, outputting in full screen the video of the speaker preceding the current speaker;

after the video of the current speaker is successfully acquired, outputting in full screen the video of the current speaker.
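The full-screen rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `full_screen_source` and the use of plain strings to identify speakers are hypothetical and do not come from the patent text.

```python
def full_screen_source(previous: str, current: str, current_acquired: bool) -> str:
    """Return the speaker whose video is shown in full screen.

    Hypothetical helper: `previous` and `current` are labels for the
    preceding and the current speaker; `current_acquired` is True once a
    camera device has successfully acquired the current speaker's video.
    """
    # Until acquisition succeeds, keep showing the preceding speaker so the
    # unstable rotate/zoom phase of the other camera is never output.
    if not current_acquired:
        return previous
    return current
```

For example, while the camera is still turning towards speaker B, `full_screen_source("A", "B", False)` keeps A on screen; once B is acquired the output switches directly from A to B with no intermediate panorama.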

With reference to the first aspect, in a third possible implementation of the first aspect, outputting the video of the current speaker comprises: simultaneously outputting, in picture-in-picture form, the videos of the current speaker and of the speaker preceding the current speaker;

wherein the picture-in-picture comprises a first picture and a second picture that is contained in the first picture and is smaller than the first picture; the current speaker is output in the first picture, and the preceding speaker is output in the second picture.

With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the method further comprises:

when the current speaker changes from the second speaker to a third speaker, controlling the first camera device to shoot a video of the third speaker, wherein the third speaker is the next speaker and is at a position different from that of the second speaker;

simultaneously outputting, in picture-in-picture form, the videos of the current speaker and of the preceding speaker comprises:

before the video of the third speaker is successfully acquired: outputting the second speaker in the first picture and a freeze frame of the first speaker in the second picture; or outputting the second speaker in the first picture and, in the second picture, the third speaker whose shooting has started but who has not yet been successfully acquired;

after the video of the third speaker is successfully acquired: outputting the third speaker in the first picture and the second speaker in the second picture.
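The picture-in-picture states described above can be summarised as a small selection function. The sketch below is an assumption-laden illustration: `PipFrame`, `pip_frame`, and the `show_acquiring` switch are hypothetical names, and the text labels merely stand in for video streams.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipFrame:
    main: str   # content of the first (large) picture
    inset: str  # content of the second (small) picture


def pip_frame(first: str, second: str, third: str,
              third_acquired: bool, show_acquiring: bool = False) -> PipFrame:
    """Choose picture-in-picture content while the speaker changes
    from `second` to `third` (all names are hypothetical labels)."""
    if third_acquired:
        # After acquisition: new speaker large, preceding speaker small.
        return PipFrame(main=third, inset=second)
    if show_acquiring:
        # Alternative before acquisition: show the camera still acquiring.
        return PipFrame(main=second, inset=f"{third} (acquiring)")
    # Default before acquisition: freeze frame of the first speaker.
    return PipFrame(main=second, inset=f"{first} (freeze frame)")
```

The dual-picture form follows the same pattern, with the two non-overlapping parts taking the place of the large and small pictures.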

With reference to the first aspect, in a fifth possible implementation of the first aspect, outputting the video of the current speaker comprises: simultaneously outputting, in dual-picture form, the videos of the current speaker and of the speaker preceding the current speaker;

wherein the output picture comprises two parts that do not contain each other; one part outputs the current speaker, and the other part outputs the preceding speaker.

With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the method further comprises:

simultaneously outputting, in dual-picture form, the videos of the current speaker and of the preceding speaker comprises:

before the video of the third speaker is successfully acquired: outputting a freeze frame of the first speaker in the one part and the second speaker in the other part; or outputting, in the one part, the third speaker whose shooting has started but who has not yet been successfully acquired, and the second speaker in the other part;

after the video of the third speaker is successfully acquired: outputting the third speaker in the one part and the second speaker in the other part.

With reference to the first aspect, in a seventh possible implementation of the first aspect, before controlling the first camera device to shoot the video of the first speaker, the method further comprises:

in an initial state, controlling the first camera device and the second camera device to shoot a video of the entire conference site, and outputting the captured video.

With reference to the first aspect or any one of the first to seventh possible implementations of the first aspect, in an eighth possible implementation of the first aspect, before controlling the first camera device to shoot the video of the first speaker, the method further comprises:

setting a tracking flag for each of the first camera device and the second camera device, wherein the tracking flag of the first camera device is initially a first tracking flag and the tracking flag of the second camera device is initially a second tracking flag;

controlling the first camera device to shoot the video of the first speaker when the first speaker talks comprises: when the first speaker talks, controlling the first camera device, which has the first tracking flag, to shoot the video of the first speaker; after the video of the first speaker is successfully acquired, setting the tracking flag of the first camera device from the first tracking flag to the second tracking flag and, at the same time, setting the tracking flag of the second camera device from the second tracking flag to the first tracking flag;

controlling the second camera device to shoot the video of the second speaker when the current speaker changes from the first speaker to the second speaker comprises: when the current speaker changes from the first speaker to the second speaker, controlling the second camera device, which has the first tracking flag, to shoot the video of the second speaker; after the video of the second speaker is successfully acquired, setting the tracking flag of the second camera device from the first tracking flag to the second tracking flag and, at the same time, setting the tracking flag of the first camera device from the second tracking flag to the first tracking flag.

With reference to the eighth possible implementation of the first aspect, in a ninth possible implementation of the first aspect, controlling the first camera device and the second camera device to alternately shoot the video of the current speaker when the speaker subsequently changes again comprises: each time the speaker subsequently changes, controlling the camera device that has the first tracking flag to shoot the video of the current speaker and, after the video of the current speaker is successfully acquired, exchanging the tracking flags of the first camera device and the second camera device.
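The tracking-flag exchange can be illustrated with a minimal sketch. The class name `CameraPair`, the identifiers `cam1` and `cam2`, and the integer flag values 1 and 2 are assumptions made for this example; the text above only requires two distinguishable flags that are swapped after each successful acquisition.

```python
class CameraPair:
    """Two camera devices that alternate by exchanging tracking flags."""

    def __init__(self) -> None:
        # Flag 1 marks the camera that will shoot the next speaker;
        # flag 2 marks the camera that keeps its current shot.
        self.flags = {"cam1": 1, "cam2": 2}

    def tracker(self) -> str:
        """Return the camera that currently holds the first tracking flag."""
        return next(cam for cam, flag in self.flags.items() if flag == 1)

    def on_acquired(self) -> None:
        """Exchange the flags once the current speaker has been acquired."""
        self.flags["cam1"], self.flags["cam2"] = (
            self.flags["cam2"], self.flags["cam1"])


def shoot_sequence(speakers):
    """Assign a camera to each speaker change, assuming every acquisition succeeds."""
    pair = CameraPair()
    log = []
    for speaker in speakers:
        log.append((speaker, pair.tracker()))
        pair.on_acquired()
    return log
```

Because the flags are exchanged only after acquisition succeeds, the controller never needs to remember which physical camera shot last; it always dispatches the camera holding the first tracking flag.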

With reference to the ninth possible implementation of the first aspect, in a tenth possible implementation of the first aspect, controlling a camera device to shoot a video of a speaker comprises:

controlling the camera device to shoot the video of the speaker by using sound source localization technology.

With reference to the tenth possible implementation of the first aspect, in an eleventh possible implementation of the first aspect, controlling the camera device to shoot the video of the speaker by using sound source localization technology comprises:

controlling the camera device to shoot the video of the speaker by using sound source localization technology in combination with preset positions or image recognition technology.

With reference to the first aspect or any one of the first to eleventh possible implementations of the first aspect, in a twelfth possible implementation of the first aspect, controlling the second camera device to shoot the video of the second speaker when the current speaker changes from the first speaker to the second speaker comprises:

determining whether the position of the second speaker is within the output picture of the first speaker;

if the position of the second speaker is not within the output picture of the first speaker, controlling the second camera device to shoot the video of the second speaker;

if the position of the second speaker is within the output picture of the first speaker, further determining whether the position of the second speaker is within a set region of the output picture of the first speaker;

if the position of the second speaker is within the set region, controlling the first camera device to shoot the video of the second speaker;

if the position of the second speaker is not within the set region, controlling the first camera device to track and shoot the second speaker so that the position of the second speaker falls within the set region.
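The three-way decision above can be written as a short function. This is a sketch only: the `Action` enum and the function name `decide` are hypothetical, and the two boolean inputs are assumed to have been computed elsewhere (for example from sound source localization and the current shot geometry).

```python
from enum import Enum


class Action(Enum):
    SWITCH_TO_OTHER_CAMERA = "switch"     # second camera device shoots the new speaker
    KEEP_CURRENT_CAMERA = "keep"          # first camera device keeps shooting, no movement
    TRACK_WITH_CURRENT_CAMERA = "track"   # first camera device follows until speaker is in the set region


def decide(in_output_picture: bool, in_set_region: bool) -> Action:
    """Map the position of the new speaker to a camera action."""
    if not in_output_picture:
        return Action.SWITCH_TO_OTHER_CAMERA
    if in_set_region:
        return Action.KEEP_CURRENT_CAMERA
    return Action.TRACK_WITH_CURRENT_CAMERA
```

Only the first branch causes a change of camera device, which is what keeps the number of video switches low when neighbouring participants speak in turn.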

In a second aspect, an apparatus for controlling video shooting is provided, comprising:

a control unit, configured to control a first camera device to shoot a video of a first speaker when the first speaker talks;

the control unit being further configured to control a second camera device to shoot a video of a second speaker when the current speaker changes from the first speaker to the second speaker, wherein the second speaker is the next speaker and is at a position different from that of the first speaker;

the control unit being further configured to control the first camera device and the second camera device to alternately shoot a video of the current speaker when the speaker subsequently changes again; and

a processing unit, connected to the control unit and configured to output the video of the current speaker after that video is successfully acquired.

With reference to the second aspect, in a first possible implementation of the second aspect, the processing unit is specifically configured to:

set the video of the current speaker to be displayed in full screen;

output the video of the current speaker in full screen.

With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the processing unit is specifically configured to:

before the video of the current speaker is successfully acquired, output in full screen the video of the speaker preceding the current speaker; after the video of the current speaker is successfully acquired, output in full screen the video of the current speaker.

With reference to the second aspect, in a third possible implementation of the second aspect, the processing unit is further specifically configured to:

set the video of the current speaker and the video of the speaker preceding the current speaker to be displayed in picture-in-picture form;

wherein the picture-in-picture comprises a first picture and a second picture that is contained in the first picture and is smaller than the first picture; the current speaker is displayed in the first picture, and the preceding speaker is displayed in the second picture;

simultaneously output, in picture-in-picture form, the videos of the current speaker and of the preceding speaker.

With reference to the third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the control unit is further configured to:

The processing unit is specifically configured to:

With reference to the second aspect, in a fifth possible implementation of the second aspect, the processing unit is further specifically configured to:

set the video of the current speaker and the video of the speaker preceding the current speaker to be displayed in dual-picture form;

wherein the dual picture comprises two parts that do not contain each other; one part displays the current speaker, and the other part displays the preceding speaker;

simultaneously output, in dual-picture form, the videos of the current speaker and of the preceding speaker.

With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the control unit is further configured to:

The processing unit is specifically configured to:

With reference to the second aspect, in a seventh possible implementation of the second aspect, the control unit is further configured to:

before controlling the first camera device to shoot the video of the first speaker, in an initial state, control the first camera device and the second camera device to shoot a video of the entire conference site;

the processing unit is further configured to output the captured video.

With reference to the second aspect or any one of the first to seventh possible implementations of the second aspect, in an eighth possible implementation of the second aspect, the control unit is further configured to:

The control unit is specifically configured to: when the first speaker talks, control the first camera device, which has the first tracking flag, to shoot the video of the first speaker; after the video of the first speaker is successfully acquired, set the tracking flag of the first camera device from the first tracking flag to the second tracking flag and, at the same time, set the tracking flag of the second camera device from the second tracking flag to the first tracking flag;

The control unit is specifically configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera device, which has the first tracking flag, to shoot the video of the second speaker; after the video of the second speaker is successfully acquired, set the tracking flag of the second camera device from the first tracking flag to the second tracking flag and, at the same time, set the tracking flag of the first camera device from the second tracking flag to the first tracking flag.

With reference to the eighth possible implementation of the second aspect, in a ninth possible implementation of the second aspect, the control unit is specifically configured to: each time the speaker subsequently changes, control the camera device that has the first tracking flag to shoot the video of the current speaker and, after the video of the current speaker is successfully acquired, exchange the tracking flags of the first camera device and the second camera device.

With reference to the ninth possible implementation of the second aspect, in a tenth possible implementation of the second aspect, the control unit is specifically configured to:

With reference to the tenth possible implementation of the second aspect, in an eleventh possible implementation of the second aspect, the control unit is specifically configured to:

With reference to the second aspect or any one of the first to eleventh possible implementations of the second aspect, in a twelfth possible implementation of the second aspect, the control unit is specifically configured to:

With the above technical solutions, according to the method and apparatus for controlling video shooting provided by the present invention, when people in the conference site speak in turn, the first camera device and the second camera device are controlled to alternately shoot the video of the current speaker, and the video of the current speaker is output. Thus, even if several people in the site speak in rapid turn, the two camera devices can capture the facial pictures of multiple speakers. Moreover, in the technical solutions provided by the present invention, the video of the current speaker is output only after a camera device has successfully acquired it. Compared with the prior art, in which the picture must first be switched to the panorama of the site before the camera device successfully acquires the video of the next speaker, the present invention reduces the number of video switches, so that transitions between pictures are tight and the output video is smoother.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of an embodiment of the method for controlling video shooting according to the present invention;

Fig. 2A is a schematic diagram of shooting the new speaker when the position of the new speaker is within the set region of the previous speaker's output picture;

Fig. 2B is a schematic diagram of shooting the new speaker when the position of the new speaker is within the previous speaker's output picture but not within the set region of that picture;

Fig. 2C is a schematic diagram of shooting the new speaker when the position of the new speaker is not within the previous speaker's output picture;

Fig. 3A is a flowchart of a specific embodiment of the method for controlling video shooting according to the present invention;

Fig. 3B is another flowchart of a specific embodiment of the method for controlling video shooting according to the present invention;

Fig. 4 is a schematic diagram of a specific embodiment of the method for controlling video shooting according to the present invention;

Fig. 5A is a schematic diagram of the effect when the camera rotate/zoom process is output in full-screen display;

Fig. 5B is a schematic diagram of the effect when the camera rotate/zoom process is not output in full-screen display;

Fig. 6 is a flowchart of another specific embodiment of the method for controlling video shooting according to the present invention;

Fig. 7 is a schematic diagram of another specific embodiment of the method for controlling video shooting according to the present invention;

Fig. 8A is a schematic diagram of the effect when the camera rotate/zoom process is output in picture-in-picture display;

Fig. 8B is a schematic diagram of the effect when the camera rotate/zoom process is not output in picture-in-picture display;

Fig. 9 is a flowchart of yet another specific embodiment of the method for controlling video shooting according to the present invention;

Fig. 10 is a schematic diagram of yet another specific embodiment of the method for controlling video shooting according to the present invention;

Fig. 11A is a schematic diagram of the effect when the camera rotate/zoom process is output in dual-picture display;

Fig. 11B is a schematic diagram of the effect when the camera rotate/zoom process is not output in dual-picture display;

Fig. 12 is a structural block diagram of an embodiment of the apparatus for controlling video shooting according to the present invention;

Fig. 13A is a schematic structural diagram of another embodiment of the apparatus for controlling video shooting according to the present invention;

Fig. 13B is a schematic structural diagram of still another embodiment of the apparatus for controlling video shooting according to the present invention;

Fig. 13C is a schematic structural diagram of yet another embodiment of the apparatus for controlling video shooting according to the present invention.

Detailed description

The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Fig. 1 is a flowchart of an embodiment of the method for controlling video shooting according to the present invention. The method provided by the embodiments of the present invention may be implemented by a class of devices having control and processing functions, for example a camera, a video controller, or a video terminal. As shown in Fig. 1, the method for controlling video shooting provided by the embodiments of the present invention comprises:

S11: when a first speaker talks, control a first camera device to shoot a video of the first speaker.

In the embodiments of the present invention, two groups of camera devices, a first camera device and a second camera device, are provided to shoot the video of a speaker. The first camera device may be a single camera module, and the second camera device may also be a single camera module. Of course, within the scope of the present invention, the first camera device and the second camera device may each comprise a plurality of camera modules; the use of a plurality of camera modules can be derived analogously from the use of a single camera module. The first camera device and the second camera device may be fixedly connected by a connecting device or may be independent of each other. A camera device mentioned in the embodiments of the present invention may be a camera or another terminal device having a shooting function.

The method for controlling video shooting provided by the embodiments of the present invention can be applied to video conferences to shoot and output the video of a speaker in the local site, and can also be used to send the picture of the local site to a remote site so that participants at the remote site can watch what is happening in the local site.

After the camera devices are turned on and the video conference starts, if nobody in the local site is speaking, both the first camera device and the second camera device can be controlled to shoot the panorama of the local site at the same time. If it is predetermined that the first camera device will shoot the first speaker to appear in the site, the video captured by the second camera device is preferably output to the remote site first. At this point, since no speaker has appeared, the participants at the remote site only need to watch the panorama of the local site. When someone starts to speak, that is, when the first speaker appears, the first camera device can be controlled immediately to shoot the video of the first speaker in the local site, while the second camera device can still be controlled to shoot the panorama of the local site.

In the embodiments of the present invention, sound source localization technology can be used to determine the position of a speaker. Sound source localization alone may fail to obtain the speaker's position accurately because of noise interference and similar causes. Therefore, the possible positions of speakers in the local site can additionally be set in advance; when the position obtained by sound source localization is judged in combination with these predefined possible positions (that is, preset positions), the accuracy is higher. To obtain the speaker's position more accurately still, sound source localization can be combined with image recognition technology. Specifically, when a camera device (the first camera device or the second camera device) is controlled to shoot the video of a speaker, a plurality of pickup microphones can form a pickup microphone array. When the first speaker talks, the array picks up the sound in the local site, and the sound is pre-processed and sent to a sound source localization device. The sound source localization device is a module with a sound source localization function located in the above-mentioned class of devices having control and processing functions, and the pickup microphone array consists of two or more pickup microphones distributed at different positions in the local site. After receiving the sound picked up by the array, the sound source localization device performs localization processing on it and obtains position information of the first speaker. A controller can generate a corresponding camera control command from the position information and send it to a pan-tilt head, which receives and executes the camera control commands sent by the controller; the pan-tilt head turns the first camera device to a suitable shooting angle so that a rough video of the first speaker is obtained. Then, by combining the position information obtained through sound source localization with preset position information or image recognition technology (which may specifically be face recognition, face detection, lip movement detection, and so on), more accurate position information of the first speaker is obtained, and a new control command is generated and sent to the pan-tilt head to make the first camera device rotate/zoom its lens and obtain a picture of the first speaker of suitable size as required, for example one in which the face of the first speaker occupies 1/2, 1/3, or 1/4 of the whole picture.
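One simple way to combine a noisy sound source estimate with preset positions is to snap the estimate to the nearest preset when it is close enough. The sketch below is an assumption, not the method of the patent: it represents a position as a single pan angle in degrees, ignores angle wrap-around, and the function name `snap_to_preset` and the default tolerance of 10 degrees are hypothetical.

```python
def snap_to_preset(estimate_deg: float, presets_deg: list,
                   tolerance_deg: float = 10.0) -> float:
    """Return the preset pan angle nearest to the estimate if it lies within
    the tolerance; otherwise return the raw estimate unchanged."""
    if not presets_deg:
        return estimate_deg
    # Nearest preset by absolute angular difference (no wrap-around handling).
    nearest = min(presets_deg, key=lambda p: abs(p - estimate_deg))
    if abs(nearest - estimate_deg) <= tolerance_deg:
        return nearest
    return estimate_deg
```

An estimate that falls far from every preset is passed through unchanged, so a later image recognition step can still refine it.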

Because sound source localization has limited precision or is easily disturbed by noise, positioning based on it alone may be inaccurate. The embodiments of the present invention therefore use sound source localization in combination with preset positions or image recognition technology, so that the speaker's position can be determined accurately and the camera device controlled accordingly. It should be noted that, depending on the actual situation, the present invention may use sound source localization alone, sound source localization combined with preset positions, sound source localization combined with image recognition technology, or sound source localization combined with both preset positions and image recognition technology.

S12: when the current speaker changes from the first speaker to a second speaker, control a second camera device to shoot a video of the second speaker, wherein the second speaker is the next speaker and is at a position different from that of the first speaker.

The current speaker is the person currently speaking in the local site; in steps S11 and S12 the current speaker is the first speaker and the second speaker respectively. It should be noted that after the speaker position changes and before a camera device successfully acquires the video of the new speaker, the current speaker is already the new speaker, even though the camera device has not yet successfully acquired that video.

Similarly to controlling the first camera device to shoot the video of the first speaker, the change in speaker position can first be identified by sound source localization, that is, it is identified that the speaker has changed from the first speaker to a second speaker at a different position, and the second camera device is then controlled to rotate/zoom to a suitable shooting angle and shooting size. Then, as in step S11, in combination with preset positions or image recognition technology, the second camera device is further controlled as required to rotate/zoom its lens and shoot a video of the second speaker at a suitable size.

It should be noted that if the speaker moves only slightly, for example by the width of one or two bodies, the speaker's position can be regarded as unchanged and the camera devices need not be switched; moreover, as long as the speaker is still within the set region of the shot picture, for example the central region occupying 80% of the whole picture, the camera device does not need to rotate/zoom its lens to track. If the speaker walks about, the speaker's position can likewise be regarded as unchanged as long as the speaker remains within the set region of the shot picture; the camera devices need not be switched and the camera device does not need to rotate/zoom its lens to track. If the speaker changes to another speaker, but the two speakers merely speak in turn from the same position, or are so close together that both fall within the set region of one camera device's shot picture, the speaker's position can be regarded as unchanged; the camera devices need not be switched and the camera device does not need to rotate/zoom its lens to track (see Fig. 2A, in which the solid line represents the shot picture and the dotted line represents the set region). Whether it is the same speaker or a different speaker, if the speaker's position is within the output picture but not within the set region, the camera devices need not be switched, but the lens can be rotated/zoomed slightly so that the new speaker is in the middle of the picture (see Fig. 2B). In the description below, unless otherwise stated, a change of speaker or a change of speaker position means that the speaker's position has changed and that the distance between the new position and the centre of the shot picture has reached the degree at which the camera devices need to be switched; this degree can be set according to the actual scene (see Fig. 2C).
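Whether a speaker lies inside the set region can be checked with a simple geometric test. The sketch below makes an assumption the text does not settle: it interprets the "central region occupying 80% of the whole picture" as a centred rectangle spanning 80% of the width and 80% of the height. The function name `in_set_region` and the pixel coordinate convention (origin at the top-left corner) are hypothetical.

```python
def in_set_region(x: float, y: float, width: float, height: float,
                  fraction: float = 0.8) -> bool:
    """Return True if point (x, y) lies inside the centred region that spans
    `fraction` of each picture dimension (assumed interpretation of 80%)."""
    margin_x = width * (1 - fraction) / 2   # border excluded on left and right
    margin_y = height * (1 - fraction) / 2  # border excluded on top and bottom
    return (margin_x <= x <= width - margin_x
            and margin_y <= y <= height - margin_y)
```

With a 1920 x 1080 picture the excluded border is about 192 pixels wide horizontally and 108 pixels vertically, so a speaker detected near the left edge would trigger a slight rotate/zoom rather than a camera switch.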

S13, when the speaker subsequently changes again, control the first camera device and the second camera device in turn so that they alternately shoot the video of the current speaker.

Specifically, when the speaker subsequently changes from the second speaker to the second speaker's next speaker, namely a third speaker, the first camera device is controlled to shoot the video of the third speaker. If the speaker then changes again, from the third speaker to the third speaker's next speaker, namely a fourth speaker, the second camera device is again controlled to shoot the video of the fourth speaker. This is repeated, so that the first camera device and the second camera device alternately shoot the video of the current speaker.

For example, suppose there are four speakers A, B, C, and D in the local conference site and A speaks first. The first camera device is first controlled to shoot A; when the speaker changes from A to B, the second camera device is controlled to shoot B; when the speaker then changes from B to C, the first camera device is again controlled to shoot C; when the speaker changes from C to D, the second camera device is again controlled to shoot D; and so on.
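The alternation in this example can be illustrated with a short sketch. The function name `assign_cameras` is hypothetical, and the sketch assumes that every successive speaker sits at a position different enough from the previous one to require a switch.

```python
def assign_cameras(speakers):
    """Pair each successive speaker with the camera device that shoots them.

    Assumes every change of speaker is a change of position that requires
    switching, so the two camera devices simply alternate.
    """
    cameras = ("first camera device", "second camera device")
    return [(speaker, cameras[i % 2]) for i, speaker in enumerate(speakers)]


for speaker, camera in assign_cameras(["A", "B", "C", "D"]):
    print(f"{speaker}: {camera}")
```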

When several people in the conference site speak in rapid alternation, the prior-art camera device used for shooting the speaker's video captures a picture that includes all of those speakers; if the speakers are far apart, their expressions cannot be observed in the captured picture, and valuable conference information is lost. The present invention differs: both the first camera device and the second camera device can track speakers, and while one camera device tracks the current speaker, the other tracks the speaker after the change. The first and second camera devices thus cooperate and hand over seamlessly: when the first camera device is shooting the current speaker, the second camera device is used to shoot the current speaker's next speaker; when the second camera device is shooting the current speaker, the first camera device is used to shoot the current speaker's next speaker. In particular, when there are only two speakers A and B in the local conference site, the first camera device can keep tracking A and the second camera device can keep tracking B; if the two speak alternately, both camera devices have already adjusted their focal lengths, so the process of rotating/zooming the lens is saved. In this way, even when speakers alternate rapidly in the conference site, the two camera devices can alternately shoot the speakers' faces, more valuable conference information is retained, and the efficiency of video tracking is improved.

S14, after the video of the current speaker is successfully obtained, output the video of the current speaker.

Specifically, after the camera device shooting the current speaker has successfully obtained the video of the current speaker, that video is output. Outputting the video of the current speaker includes outputting it in different ways (full screen, picture-in-picture, dual picture, and so on) on the display screen of the camera device or on a display screen in the local conference site, and also includes outputting it in different ways to a remote site. It should be noted that the present invention does not limit the manner (for example encoding, decoding, and so on) in which the video captured in the local conference site is sent to the remote site. For example, the video of the current speaker may be sent to a video signal preprocessor, which, after receiving it, performs processing such as compression encoding and then sends the resulting bitstream to the remote site over a network; after receiving the bitstream, the remote site performs processing such as decoding to obtain the video of the current speaker, which can then be shown in different ways on the display screen of the remote site. Participants at the remote site can thus watch the picture of the local conference site on that display screen.

When the speaker changes, the camera device needs a certain amount of time to obtain the video of the speaker after the change. During this time, the prior art first switches the picture to the panorama of the conference site and switches the picture to the new speaker only after the camera device has successfully obtained that speaker's video, which makes the video less smooth. In the embodiments of the present invention, before the video of the current speaker is successfully obtained in step S14, the method for controlling video shooting may further include: outputting the video of the current speaker's previous speaker. That is, before the video of the current speaker is successfully obtained, the video of the previous speaker is output; after the video of the current speaker is successfully obtained, the video of the current speaker is output. With full-screen output, this not only keeps the output picture continuous but also keeps its quality high, because it avoids the blurred or shaking pictures that would otherwise be output while the camera device rotates/zooms its lens to obtain the video of the current speaker.
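The output rule in this paragraph, holding the previous speaker's video until the new video is successfully obtained, can be sketched as a small state holder. The class `OutputSelector` and its method names are hypothetical and only illustrate the ordering of events; how "successfully obtained" is detected is left open here.

```python
class OutputSelector:
    """Keeps outputting the previous speaker until the new video is obtained."""

    def __init__(self, initial_feed):
        self.output = initial_feed   # what is currently being output
        self.pending = None          # speaker whose video is being acquired

    def on_speaker_change(self, new_speaker):
        # The other camera device starts rotating/zooming; output is unchanged.
        self.pending = new_speaker

    def on_acquired(self):
        # Only now is the output switched, so no blurred frames are shown.
        if self.pending is not None:
            self.output = self.pending
            self.pending = None


selector = OutputSelector("first speaker")
selector.on_speaker_change("second speaker")
print(selector.output)   # still the previous speaker
selector.on_acquired()
print(selector.output)   # now the current speaker
```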

Of course, in the embodiments of the present invention, the picture of the local conference site may be output not only in full screen but also in forms such as picture-in-picture or dual picture. When picture-in-picture is used, after the video of the current speaker is successfully obtained, the current speaker may be output in the large picture (the first picture) and the current speaker's previous speaker in the small picture (the second picture). When dual picture is used, after the video of the current speaker is successfully obtained, the current speaker may be output in one of two pictures that do not contain each other, and the current speaker's previous speaker in the other picture. The specific implementation of these output forms is described in the specific embodiments below.

Further, in the embodiments of the present invention, to make it easier to control the two camera devices to shoot the current speaker in turn and to output the video of the current speaker, a tracking mark may be set for each of the two camera devices before shooting starts. For example, initial tracking marks, namely a first tracking mark and a second tracking mark, may be set for the first camera device and the second camera device respectively; a tracking mark may be represented by a number such as 0 or 1. The camera device whose tracking mark is the first tracking mark may be dedicated to shooting the video of the current speaker, and the camera device whose tracking mark is the second tracking mark may be dedicated to shooting the video of the current speaker's next speaker (or previous speaker). After the video of the current speaker is successfully obtained, the tracking marks of the first camera device and the second camera device need to be exchanged.

Where tracking marks have been set for the first camera device and the second camera device, step S11 (when the first speaker speaks, controlling the first camera device to shoot the video of the first speaker) may include: when the first speaker speaks, controlling the first camera device, which has the first tracking mark, to shoot the video of the first speaker; and, after the video of the first speaker is successfully obtained, changing the tracking mark of the first camera device from the first tracking mark to the second tracking mark while changing the tracking mark of the second camera device from the second tracking mark to the first tracking mark.

Step S12 (when the current speaker changes from the first speaker to the second speaker, controlling the second camera device to shoot the video of the second speaker) may include: when the current speaker changes from the first speaker to the second speaker, controlling the second camera device, which now has the first tracking mark, to shoot the video of the second speaker; and, after the video of the second speaker is successfully obtained, changing the tracking mark of the second camera device from the first tracking mark to the second tracking mark while changing the tracking mark of the first camera device from the second tracking mark to the first tracking mark.

Step S13 (when the speaker subsequently changes again, controlling the first camera device and the second camera device in turn to alternately shoot the video of the current speaker) may include: each time the speaker subsequently changes, controlling the camera device that has the first tracking mark to shoot the video of the current speaker; and, after the video of the current speaker is successfully obtained, exchanging the tracking marks of the first camera device and the second camera device. This ensures that the two camera devices cooperate, hand over seamlessly, and alternately shoot the video of the current speaker.
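The tracking-mark exchange described for steps S11 to S13 can be sketched as follows. The class `CameraPair`, the method names, and the `acquire` callback (a stand-in that reports whether the video was successfully obtained) are hypothetical; the marks 0 and 1 follow the numeric example given in the text.

```python
class CameraPair:
    """Two camera devices whose tracking marks are exchanged after each acquisition."""

    FIRST_MARK = 0   # camera device to dispatch to the new current speaker
    SECOND_MARK = 1  # the other camera device

    def __init__(self):
        self.marks = {
            "first camera device": self.FIRST_MARK,
            "second camera device": self.SECOND_MARK,
        }

    def camera_with(self, mark):
        return next(name for name, m in self.marks.items() if m == mark)

    def on_speaker_change(self, acquire):
        """Dispatch the camera holding the first mark; swap marks on success."""
        camera = self.camera_with(self.FIRST_MARK)
        if acquire(camera):  # hypothetical callback: True once video is obtained
            self.marks = {name: 1 - m for name, m in self.marks.items()}
        return camera


pair = CameraPair()
for speaker in ["first", "second", "third", "fourth"]:
    camera = pair.on_speaker_change(lambda cam: True)
    print(f"{speaker} speaker: {camera}")
```

Because the marks are exchanged only after a successful acquisition, a failed attempt leaves the same camera device responsible for the next attempt.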

In the embodiments of the present invention, both the first camera device and the second camera device can track speakers. When the first speaker speaks, the first camera device is controlled to shoot the first speaker, while the second camera device stands by, ready to track and shoot the first speaker's next speaker. When the current speaker changes from the first speaker to the second speaker (that is, the next speaker, whose position differs from that of the first speaker), the second camera device is controlled to shoot the second speaker, while the first camera device keeps shooting the first speaker and enters the state of being ready to track and shoot the next speaker whose position differs from that of the second speaker. This ensures that the first and second camera devices cooperate and hand over seamlessly. When the speaker changes, a camera device needs a certain amount of time to successfully obtain the video of the speaker after the change. In the prior art, one camera device is dedicated to shooting the panorama of the local conference site and another is dedicated to tracking the speaker; therefore, before the tracking camera device successfully obtains the video of the current speaker, the picture must first be switched to the panorama of the conference site, and only once the video of the current speaker is obtained is the picture switched to the speaker after the change, which makes the video less smooth. In the technical solution provided by the present invention, the video of the current speaker is output only after a camera device has successfully obtained it; until then, the video of the current speaker's previous speaker continues to be output. Compared with the prior art, which must first switch to the panorama of the local conference site before the camera device successfully obtains the next speaker's video, the present invention reduces the number of video switches, so that pictures follow one another closely and the output video is smoother. Moreover, when several people in the local conference site speak in rapid alternation, the picture shot according to the prior art includes all of those speakers; if they are far apart, their expressions cannot be observed in the captured picture. In the present invention, because the first camera device and the second camera device cooperate, the two camera devices can alternately shoot the speakers' faces even when speakers alternate rapidly in the local conference site.

For a better understanding of the present invention, it is further described below with reference to Fig. 3A to Fig. 10, taking several specific embodiments as examples. It should also be noted that the embodiments listed below are only some of the embodiments of the present invention; other embodiments that a person skilled in the art can readily conceive from the content of the present invention all fall within its scope.

In the following specific embodiments, tracking marks may be used to mark the camera devices, and the video shot by the camera device with a specified tracking mark is output. For example, the initial tracking mark of the first camera device may be set to 0 (the first tracking mark) and the initial tracking mark of the second camera device may be set to 1 (the second tracking mark), where the camera device whose tracking mark is 0 is used to shoot the video of the current speaker and the camera device whose tracking mark is 1 is used to shoot the video of the current speaker's next speaker. For brevity, this arrangement is used as the example throughout the description below. Of course, the tracking mark of the first camera device may instead be set to 1 and that of the second camera device to 0, or the tracking marks may be set in other ways; the present invention places no limitation on this.

Fig. 3A is a flowchart of a specific embodiment of the method for controlling video shooting according to the present invention. Fig. 3B is another flowchart of that specific embodiment.

As shown in Fig. 3A, taking a camera as an example of the camera device, the method for controlling video shooting provided by this specific embodiment of the present invention includes:

S31, when the conference starts, control the two cameras to shoot the panorama of the local conference site.

After the two cameras (the first camera and the second camera) are turned on and the conference starts, nobody in the local conference site is speaking yet. To send the layout of the local conference site to the remote site, the two cameras may be controlled to shoot the panorama of the local conference site. The shooting angle and shot size may be set by the user; a preferred setting is one that includes all participants and the main conference scene. When the picture shot by a camera is sent from the local conference site to the remote site, both cameras are at this point shooting the panorama of the local conference site, so the picture from either camera may be sent; preferably, the picture shot by the camera whose tracking mark is 1 (the second camera) is sent first.

S32, using sound source localization, control the first camera to shoot the video of the first speaker.

After the two cameras have been controlled to shoot the panorama of the conference site, when a person in the conference site starts to speak, that is, when the first speaker appears, a pickup microphone array picks up the sound in the local conference site and sends it to a sound source localization device, which generates the speaker's position information using sound source localization. The controller then controls the camera whose tracking mark is 0, according to that position information, to shoot a video in which the first speaker appears at a suitable size. After the camera whose tracking mark is 0 (the first camera) has shot such a video of the first speaker, its tracking mark is set to 1, and the tracking mark of the other camera (the second camera) is set from 1 to 0.

S33, when the current speaker changes from the first speaker to the second speaker, control the second camera to shoot the video of the second speaker, where the second speaker is the next speaker, whose position differs from that of the first speaker.

After the first camera has shot a suitably sized video of the first speaker, the tracking mark of the first camera has become 1 and that of the second camera has become 0. If the speaker's position then changes, from the first speaker to the second speaker whose position differs from that of the first speaker, the controller controls the camera whose tracking mark is 0 (the second camera) to shoot the video of the second speaker; the shooting is controlled in the same way as in S32. After the camera whose tracking mark is 0 has shot a suitably sized video of the second speaker, its tracking mark is set to 1, and the tracking mark of the other camera is set from 1 to 0.

S34, when the speaker subsequently changes again, control the first camera and the second camera in turn so that they alternately shoot the video of the current speaker.

After the second camera has shot a suitably sized video of the second speaker, the tracking mark of the second camera has become 1 and that of the first camera has become 0. If the speaker then changes from the second speaker to a third speaker (the second speaker's next speaker), the camera whose tracking mark is 0 (the first camera) is controlled to shoot the third speaker; after it successfully obtains the video of the third speaker, its tracking mark is set from 0 to 1, and the tracking mark of the other camera is set from 1 to 0. Similarly, when the speaker changes from the third speaker to a fourth speaker (the third speaker's next speaker), the camera whose tracking mark is 0 (the second camera) is controlled to shoot the fourth speaker; after it successfully obtains the video of the fourth speaker, its tracking mark is set from 0 to 1, and the tracking mark of the other camera is set from 1 to 0. Thus, each time the speaker changes, the camera whose tracking mark is 0 (which may be either the first camera or the second camera) is controlled to track and shoot the speaker after the change, and once that camera has successfully obtained the speaker's video, its tracking mark is set from 0 to 1 and the tracking mark of the other camera is set from 1 to 0.

S35, after the camera shooting the video of the current speaker has successfully obtained that video, output the video of the current speaker in full screen.

After the camera whose tracking mark is 0 successfully obtains the video of the current speaker, its tracking mark is set from 0 to 1, and the tracking mark of the other camera is set from 1 to 0. The video shot by the camera whose tracking mark is 1 after this exchange is therefore the video of the current speaker. Here, outputting the video of the current speaker in full screen means that the output video comes from a single camera. The full-screen picture may show only one speaker, or it may show several speakers who are close enough together that the body language or facial information of each can be observed in the captured video. With reference to step S12, if several speakers are so far apart that each of them cannot be observed in the video shot by a single camera, the speaker's position may be regarded as having changed, and the other camera may be used to shoot the speaker's video. After the video of the current speaker is sent to the remote site in full-screen form, participants at the remote site can clearly observe the close-up picture of the current speaker; since the close-up picture may contain important conference information, as much important conference information as possible is retained.

As shown in Fig. 4, of the three pictures from left to right, the first shows that when the conference begins, the display shows the panorama of the local conference site in full screen; the second shows that after the first speaker appears, the display shows the video of the first speaker in full screen; and the third shows that after the speaker changes from the first speaker to the second speaker, the display shows the second speaker in full screen.

S36, before the camera shooting the video of the current speaker has successfully obtained that video, output the video of the current speaker's previous speaker.

It should be noted that step S36 is performed before step S35.

From the moment the speaker changes until the camera successfully obtains the video of the current speaker, the camera rotates/zooms its lens, which can produce blurred or unstable pictures. By outputting the video of the current speaker's previous speaker during this period, output of those blurred or unstable pictures can be avoided.

For ease of understanding, this is described below with reference to Figs. 5A and 5B. As shown in Fig. 5A, the three pictures arranged from left to right are the first picture, the second picture, and the third picture. The speaker in the third picture is the next speaker after the speaker in the first picture. From the moment the speaker changes until the camera successfully obtains a suitably sized video of the speaker in the third picture, if the pictures shot by the camera while it rotates/zooms its lens were output directly, blurred or unstable pictures like the second picture would appear. In this specific embodiment of the present invention, by contrast, the video of the speaker in the first picture is output during that period, and the video of the speaker in the third picture is output only after a suitably sized video of that speaker has been successfully obtained, so that output of the blurred or unstable pictures is avoided (see Fig. 5B).

In addition, depending on the situation in the local conference site, the following cases may arise while this specific embodiment is being carried out; they are handled as follows:

(1) Nobody is speaking in the local conference site

The output picture is not switched; the panorama of the local conference site continues to be output.

(2) A single person is speaking in the local conference site and nobody interjects

The output picture is the full-screen picture of the current speaker.

(3) A single person is speaking in the local conference site and someone else interjects, but only very briefly

The output picture is not switched; the full-screen picture of the main speaker continues to be output.

(4) A single person is speaking in the local conference site and moves from time to time

If the speaker walks about, or the speaker's head or body shifts, without leaving the current output picture and while remaining within the set central region of that picture, the camera is not switched and does not track; the output picture is the full-screen picture with the current speaker in the central region. If the speaker's movement keeps the speaker within the current output picture but brings the speaker close to or beyond the edge of the set central region, the camera is not switched but may track appropriately to keep the speaker in the central region. If the speaker's movement takes the speaker outside the current output picture, the camera is switched and the speaker is tracked.

(5) The speaker in the local conference site changes once, to an adjacent person or to someone else

If the position of the speaker after the change does not leave the output picture before the change and lies within the set central region of that picture, the camera is not switched and does not track; the output picture is the full-screen picture with the speaker after the change in the central region. If the position of the speaker after the change does not leave the output picture before the change but is close to or beyond the edge of the set central region, the camera is not switched but may track appropriately to keep the speaker after the change in the central region; the output picture is the full-screen picture with the speaker after the change in the central region. If the position of the speaker after the change is outside the output picture before the change, the camera is switched and the speaker after the change is tracked.

(6) Several people in the local conference site speak at the same time, competing for the floor

In this case the period of competing to speak is usually very short, and the output picture is not switched.

(7) Several people in the local conference site hold a discussion and speak alternately, so that the speaker position changes repeatedly

The cameras alternately track the speaker after each change of position, and the output picture is the full-screen picture of the speaker after the change.
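Cases (3) and (6) above say that very short interjections or talk-over periods do not switch the output, while case (7) says that genuine changes do. A sketch of that filtering on a recorded list of speech segments is given below. The function name `output_sequence` and the `min_seconds` threshold of 2.0 are assumptions chosen for illustration; the description does not specify how "very short" is measured, and a live system would not know a segment's duration in advance.

```python
def output_sequence(segments, min_seconds=2.0):
    """Return the speakers shown in full screen, ignoring very short segments.

    `segments` is a list of (speaker, duration_in_seconds) pairs.
    `min_seconds` is a hypothetical threshold for a "very short" interjection.
    """
    shown = []
    for speaker, duration in segments:
        if duration < min_seconds:
            continue  # brief interjection or talk-over: output is not switched
        if not shown or shown[-1] != speaker:
            shown.append(speaker)
    return shown


# A speaks, B interjects briefly, A continues, then C takes over.
print(output_sequence([("A", 30.0), ("B", 0.8), ("A", 20.0), ("C", 15.0)]))
```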

In this specific embodiment, each time the speaker's position changes, the camera whose tracking mark is 0 is controlled to track and shoot the speaker at the new position, and once that camera has successfully obtained a suitable video of the speaker, its tracking mark is set from 0 to 1 and the tracking mark of the other camera is set from 1 to 0. This ensures that at any given moment one camera is shooting the current speaker while the other camera is available to shoot the current speaker's next speaker. In other words, the two cameras cooperate and hand over seamlessly. When the speaker's position changes, a camera needs a certain amount of time to successfully obtain the video of the speaker after the change. During this time the video of the current speaker's previous speaker continues to be output, and the video of the current speaker is output only after a camera has successfully obtained it. Compared with the prior art, which must first switch the picture to the panorama of the conference site and switch to the speaker after the change only once the camera has successfully obtained that speaker's video, the present invention reduces the number of video switches, so that pictures follow one another closely and the output video is smoother. Moreover, when several people in the conference site speak in rapid alternation, the picture shot by the prior-art camera dedicated to shooting the speaker's video includes all of those speakers; if they are far apart, their expressions cannot be observed in the captured picture. In the present invention, because the first camera and the second camera cooperate, the two cameras can alternately shoot the speakers' faces even when speakers alternate rapidly in the conference site. In addition, because the video of the current speaker is output in full screen, participants at the remote site can observe the current speaker's facial close-up more clearly; since such close-ups may contain important conference information, more valuable conference information is retained.

Fig. 6 is a flowchart of another specific embodiment of the method for controlling video shooting according to the present invention.

As shown in Fig. 6, taking a camera as an example of the camera device, the method for controlling video shooting provided by this specific embodiment of the present invention includes:

S61, when the conference starts, control the two cameras to shoot the panorama of the local conference site.

After the two cameras are turned on and the conference starts, nobody in the local conference site is speaking yet. To send the layout of the local conference site to the remote site, the two cameras may be controlled to shoot the panorama of the local conference site. The shooting angle and shot size may be set by the user; a preferred setting is one that includes all participants and the main conference scene. When the panoramic video of the local conference site is output, the video shot by the camera whose tracking mark is 1 is preferably output first.

S62, using sound source localization in combination with preset positions, control the first camera to shoot the video of the first speaker.

After the two cameras have been controlled to shoot the panorama of the conference site, when a person in the conference site starts to speak, that is, when the first speaker appears, sound source localization is used to obtain the position information of the first speaker. This is then combined with the preset positions, that is, the predefined positions at which a speaker may be located when speaking in the local conference site, to determine the accurate position of the first speaker. Specifically, the preset position closest to the position obtained by sound source localization may be selected from the plurality of preset positions as the accurate position. The controller then controls the camera whose tracking mark is 0, according to the accurate position of the first speaker, to shoot the video of the first speaker. After the camera whose tracking mark is 0 has shot a suitable video of the first speaker, its tracking mark is set to 1, and the tracking mark of the other camera is set from 1 to 0.
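Selecting the preset position closest to the sound-source estimate can be sketched as a nearest-neighbor lookup. The function name `nearest_preset`, the use of two-dimensional coordinates in meters, and the Euclidean distance metric are assumptions; the description states only that the closest preset position is chosen.

```python
import math


def nearest_preset(estimate, presets):
    """Return the name of the preset position closest to the estimate.

    `estimate` is the (x, y) position from sound source localization and
    `presets` maps hypothetical preset names to (x, y) coordinates.
    Euclidean distance is an assumed metric.
    """
    return min(presets, key=lambda name: math.dist(estimate, presets[name]))


presets = {"seat 1": (0.0, 1.0), "seat 2": (2.0, 1.0), "podium": (1.0, 4.0)}
print(nearest_preset((1.8, 1.2), presets))
```

Snapping a noisy estimate to a preset position lets the camera reuse a stored angle and zoom for that position instead of relying on the raw estimate.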

S63, when the current speaker changes from the first speaker to the second speaker, control the second camera to shoot the video of the second speaker, where the second speaker is the next speaker, whose position differs from that of the first speaker.

After the first camera has successfully shot the video of the first speaker, the tracking mark of the first camera has become 1 and that of the second camera has become 0. If the speaker now changes, from the first speaker to the second speaker whose position differs from that of the first speaker, the controller, as in step S62, controls the camera whose tracking mark is 0 (the second camera) to shoot the video of the second speaker. After the camera whose tracking mark is 0 has successfully shot the video of the second speaker, its tracking mark is set to 1, and the tracking mark of the other camera is set from 1 to 0.

S64, when the speaker subsequently changes again, control the first camera and the second camera in turn so that they alternately shoot the video of the current speaker.

After the second camera has successfully shot the video of the second speaker, the tracking mark of the second camera has become 1 and that of the first camera has become 0. If the speaker then changes from the second speaker to a third speaker, the camera whose tracking mark is 0 (the first camera) is controlled to shoot the third speaker; after it successfully obtains a suitable video of the third speaker, its tracking mark is set from 0 to 1, and the tracking mark of the other camera (the second camera) is set from 1 to 0. Similarly, when the speaker changes from the third speaker to a fourth speaker (the third speaker's next speaker), the camera whose tracking mark is 0 (the second camera) is controlled to shoot the fourth speaker; after it successfully obtains a suitable video of the fourth speaker, its tracking mark is set from 0 to 1, and the tracking mark of the other camera (the first camera) is set from 1 to 0. When the speaker subsequently changes again, alternate shooting continues in the same way.

S65: after the camera shooting the current speaker has successfully obtained the current speaker's video, output the videos of the current speaker and of the previous speaker simultaneously in picture-in-picture form. The picture-in-picture comprises a first picture and a second picture that is contained in, and smaller than, the first picture; the current speaker is output in the first picture and the previous speaker in the second picture.

After the camera whose tracking mark is 0 has successfully obtained the current speaker's video, its tracking mark is changed from 0 to 1 and that of the other camera from 1 to 0. The camera whose tracking mark is 1 is then shooting the current speaker, and the camera whose tracking mark is 0 is shooting the previous speaker. Outputting the videos of the current speaker and the previous speaker simultaneously in picture-in-picture form means that the current speaker is output in the first picture and the previous speaker is output in the second picture, which is contained in and smaller than the first picture. In this way, participants at the remote site can observe not only the current speaker's facial expression but also one party's reaction to the other party's speech. Such expressions may carry important conference information, so as much of that information as possible is retained.

As shown in Fig. 7, of the three images from left to right, the first image shows that when the conference begins, the panorama of the local site is output in picture-in-picture form; the second image shows that after the first speaker appears, the first speaker is output in the large picture (the first picture) and the panorama of the local site is output in the lower right corner of the screen (the second picture); the third image shows that after the speaker changes from the first speaker to the second speaker, the second speaker is output in the large picture and the first speaker in the lower right corner of the screen.
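The geometry of such a picture-in-picture layout can be sketched as follows. This is only an illustration: the function name `pip_layout`, the inset `scale` of one quarter and the pixel `margin` are assumptions, since the text only requires the second picture to be smaller than, and contained in, the first picture and places it in the lower right corner.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def pip_layout(screen_w: int, screen_h: int, scale: float = 0.25, margin: int = 16):
    """Return (first_picture, second_picture) rectangles for picture-in-picture.

    scale and margin are illustrative assumptions, not values from the text.
    """
    first = Rect(0, 0, screen_w, screen_h)
    w, h = int(screen_w * scale), int(screen_h * scale)
    # Anchor the inset at the lower right corner of the screen.
    second = Rect(screen_w - w - margin, screen_h - h - margin, w, h)
    return first, second


def contains(outer: Rect, inner: Rect) -> bool:
    """True if inner lies entirely within outer."""
    return (outer.x <= inner.x and outer.y <= inner.y
            and inner.x + inner.w <= outer.x + outer.w
            and inner.y + inner.h <= outer.y + outer.h)
```

For a 1920x1080 screen, `pip_layout(1920, 1080)` yields a 480x270 inset whose top-left corner is at (1424, 794), and `contains(first, second)` holds.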

S66: before the camera shooting the current speaker has successfully obtained the current speaker's video, output the two speakers preceding the current speaker in the first picture and the second picture respectively.

It should be noted that step S66 is performed before step S65.

During the period from the speaker change until the camera successfully obtains the current speaker's video, the camera lens pans and zooms, which produces blurred or unstable pictures. For this reason, the two speakers preceding the current speaker can be output in the first picture and the second picture respectively, so that the blurred or unstable pictures are not output.

For ease of understanding, this is described below with reference to Figs. 8A and 8B. In Fig. 8A, the three images arranged from left to right are referred to as the first, second and third images. The speaker in the lower right corner (the second picture) of the first image is the previous speaker of the speaker in the large picture (the first picture) of the first image, and the speaker in the large picture of the first image is the previous speaker of the speaker in the large picture of the third image. That is, the speaker changes from the large-picture speaker of the first image to the large-picture speaker of the third image. If, from the moment the change occurs until the camera successfully obtains the video of the large-picture speaker of the third image, the picture captured while the lens pans and zooms is output directly, a blurred or unstable picture appears in the lower right picture of the second image. By contrast, as shown in Fig. 8B, during this period the specific embodiment of the invention outputs the live picture of the first image's large-picture speaker (the large picture of the second image) and a freeze frame of that speaker's previous speaker (the lower right picture of the second image), which avoids outputting the blurred or unstable picture.

Of course, depending on actual needs, the output mode shown in the second image of Fig. 8A may also be used during the period from the speaker change until the camera successfully obtains the current speaker's video.
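The choice of what to output before and after the current speaker's video is obtained (steps S65 and S66, together with the alternative of Fig. 8A) can be summarized in a short Python sketch. The function `pip_sources`, its parameters and the labels "live", "frozen" and "in_progress" are hypothetical names introduced for illustration only.

```python
def pip_sources(history, acquired, show_in_progress=False):
    """Pick what the first (large) and second (small) pictures show.

    history: speakers in order of appearance; the last entry is the current one.
    acquired: True once the camera has a stable video of the current speaker.
    show_in_progress: use the Fig. 8A alternative (output the moving camera's picture).
    """
    current, previous = history[-1], history[-2]
    if acquired:
        # S65: current speaker in the first picture, previous speaker in the second.
        return {"first": (current, "live"), "second": (previous, "live")}
    if show_in_progress:
        # Fig. 8A alternative: the second picture shows the camera while it moves.
        return {"first": (previous, "live"), "second": (current, "in_progress")}
    # S66: hold the two preceding speakers; the older one is a freeze frame
    # because the camera that was shooting that speaker is now moving.
    before_previous = history[-3]
    return {"first": (previous, "live"), "second": (before_previous, "frozen")}
```

With `history = ["A", "B", "C"]`, the sketch keeps B live and A frozen until acquisition succeeds, then shows C in the first picture and B in the second.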

(1) Nobody speaks at the local site

The composition of the output picture is unchanged, and the panorama of the local site is still output;

(2) A single person speaks at the local site and nobody interjects

The current speaker is output in the first picture and the previous speaker in the second picture; the composition of the output picture is unchanged;

The speaker is output in the first picture; the second picture either is not switched or outputs the person who interjects. Preferably, the second picture is not switched;

(4) A single person speaks at the local site and moves from time to time

If the speaker moves about and the offset of the head or body stays within the currently output first picture and within the set central area of the first picture, the camera is neither switched nor made to track; the first picture outputs the speaker in motion, the second picture is unchanged, and the composition of the output picture is unchanged. If the movement keeps the speaker within the currently output first picture but may take, or has taken, the speaker outside the set central area of the first picture, the camera is not switched but may track appropriately so as to keep the speaker within the set central area of the first picture; the second picture is unchanged, and the composition of the output picture is unchanged. If the movement takes the speaker outside the currently output first picture, the cameras are switched and the speaker is tracked; after tracking succeeds, the speaker is output in the first picture, and the first picture from before the camera switch is moved to the second picture;

If the position of the new speaker is within the first picture as it was before the change and within the set central area of the first picture, the camera is neither switched nor made to track; the first picture outputs the new speaker located in the central area, and the second picture is unchanged. If the position of the new speaker is within the first picture as it was before the change but may be, or is, outside the set central area of the first picture, the camera is not switched but may track appropriately so as to keep the new speaker within the central area of the first picture; the second picture is unchanged. If the position of the new speaker is outside the first picture as it was before the change, the cameras are switched and the new speaker is tracked; the first picture outputs the new speaker and the second picture outputs the speaker from before the change;

In this case the competing speech usually lasts only a very short time, and the composition of the output picture is unchanged;

The cameras alternately track the speaker at each new position, and the composition of the output picture changes: after each change, the current speaker is output in the first picture and the previous speaker in the second picture.

In this specific embodiment, each time the speaker's position changes, the camera whose tracking mark is 0 is controlled to track and shoot the speaker at the new position; after that camera has successfully obtained a suitably sized video of the speaker, its tracking mark is changed from 0 to 1 and that of the other camera from 1 to 0. This guarantees that, at any given time, one camera is shooting the current speaker while the other is idle and available for shooting the next speaker. In other words, the two cameras cooperate and hand over seamlessly. When the speaker's position changes, the camera needs a certain amount of time to obtain the new speaker's video. During this period, the video of the previous speaker continues to be output, and the current speaker's video is output only after the camera has successfully obtained it. Compared with the prior art, which must first switch the picture to the panorama of the site and then switch to the new speaker once the camera has obtained the new speaker's video, the present invention reduces the number of video switches, so that pictures follow one another closely and the output video is smoother. Moreover, when several people at the site speak in rapid alternation, the picture shot by the prior-art camera dedicated to shooting speakers may take in several speakers at once; if those speakers are distant, their expressions cannot be observed in the captured picture. In the present invention, because the first camera and the second camera cooperate, the two cameras can alternately shoot the speakers' faces even when speakers alternate rapidly. In addition, the videos of the current speaker and the previous speaker are output simultaneously in picture-in-picture form, so that participants at the remote site can clearly observe the current speaker's facial features and can also see the speaker changes at the local site and one party's reaction to the other party's speech. In this way, more valuable conference information is retained.

Fig. 9 is a flowchart of yet another specific embodiment of the method for controlling video shooting according to the present invention.

As shown in Fig. 9, taking a camera as an example of the camera device, the method for controlling video shooting provided by this specific embodiment of the invention comprises:

S91: when the conference starts, control the two cameras to shoot the panorama of the site.

After the two cameras are switched on and the conference starts, nobody at the local site is speaking yet. To convey the layout of the local site to the remote site, the two cameras can be controlled to shoot the panorama of the local site. The shooting angle and size can be set by the user; preferably, the setting is one that takes in all participants and the main conference scene. When the panorama video of the local site is output, the video shot by the camera whose tracking mark is 1 is preferably output first.

S92: using sound source localization and image recognition, control the first camera to shoot the first speaker's video.

After the two cameras have been controlled to shoot the panorama of the site, when someone at the site starts to speak, that is, when the first speaker appears, sound source localization is used to obtain the first speaker's position, and the camera whose tracking mark is 0 is turned to a suitable angle. Image recognition is then used to determine the first speaker's exact position. The controller then controls the camera whose tracking mark is 0 to shoot the first speaker's video according to that exact position. After the camera whose tracking mark is 0 has captured a suitable video of the first speaker, its tracking mark is set to 1, and the tracking mark of the other camera is changed from 1 to 0.
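The coarse-then-fine positioning in S92 can be illustrated with a small geometric sketch. Everything here is an assumption made for illustration: the sound source position is taken as already estimated in metres relative to the camera, the camera is assumed to sit at the origin looking along +y, positive pan means turning right, and the refinement uses a pinhole-camera model with an assumed horizontal field of view. The specification does not prescribe any of these details.

```python
import math


def coarse_pan_from_sound(source_x: float, source_y: float) -> float:
    """Pan angle in degrees toward a sound source at (x, y) metres.

    The camera is assumed to sit at the origin looking along +y.
    """
    return math.degrees(math.atan2(source_x, source_y))


def refine_pan_with_face(coarse_pan: float, face_center_x: float,
                         frame_width: int, hfov_deg: float = 60.0) -> float:
    """Correct the coarse pan so that the detected face lands at the frame centre.

    face_center_x is the face's horizontal pixel position in a frame captured
    at coarse_pan; hfov_deg is an assumed horizontal field of view.
    """
    focal_px = (frame_width / 2) / math.tan(math.radians(hfov_deg / 2))
    offset_px = face_center_x - frame_width / 2
    return coarse_pan + math.degrees(math.atan(offset_px / focal_px))
```

A face already at the centre of the frame leaves the pan unchanged, while a face at the right edge of the frame shifts the pan by half the field of view.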

S93: when the current speaker changes from the first speaker to a second speaker, control the second camera to shoot the second speaker's video, where the second speaker is the next speaker, located at a position different from that of the first speaker.

After the first camera has successfully captured the first speaker's video, the tracking mark of the first camera is 1 and that of the second camera is 0. If the speaker now changes from the first speaker to a second speaker at a different position, the controller, as in step S92, controls the camera whose tracking mark is 0 (the second camera) to shoot the second speaker's video. After that camera has captured a suitable video of the second speaker, its tracking mark is set to 1, and the tracking mark of the other camera is changed from 1 to 0.

S94: on each subsequent speaker change, control the first camera and the second camera to shoot the current speaker's video alternately.

S95: after the camera shooting the current speaker has successfully obtained the current speaker's video, output the videos of the current speaker and of the previous speaker simultaneously in dual-picture form. The dual picture comprises two partial pictures, neither of which contains the other; one partial picture outputs the current speaker and the other outputs the previous speaker.

After the camera whose tracking mark is 0 has successfully obtained the current speaker's video, its tracking mark is changed from 0 to 1 and that of the other camera from 1 to 0. The camera whose tracking mark is 1 is then shooting the current speaker, and the camera whose tracking mark is 0 is shooting the previous speaker. Outputting the videos of the current speaker and the previous speaker simultaneously in dual-picture form means that the current speaker is output in one picture and the previous speaker in another picture, neither picture containing the other. In this way, participants at the remote site can observe not only the current speaker's facial expression but also one party's reaction to the other party's speech. Such expressions may carry important conference information, so as much of that information as possible is retained.

As shown in Fig. 10, of the three images from left to right, the first image shows that when the conference begins, the panorama of the local site is output in dual-picture form; the second image shows that after the first speaker appears, the first speaker is output in the left picture and the panorama of the local site in the right picture; the third image shows that after the speaker changes from the first speaker to the second speaker, the second speaker is output in the right picture and the first speaker remains in the left picture.
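One way to read the Fig. 10 sequence is that each new speaker replaces the pane that does not show the speaker who was current until then, so that this speaker (now the previous speaker) stays where they are. The following Python sketch illustrates that reading; the class `DualPicture`, its method and the choice of the left pane for the first speaker are assumptions for illustration rather than details fixed by the text.

```python
class DualPicture:
    """Hypothetical two-pane output in which a new speaker replaces the stale pane."""

    def __init__(self) -> None:
        # At the start of the conference both panes show the site panorama.
        self.panes = {"left": "panorama", "right": "panorama"}
        self.current_pane = None  # pane that shows the current speaker

    def show_new_speaker(self, speaker: str) -> dict:
        # The first speaker takes the left pane; each later speaker takes the
        # pane that does NOT show the current speaker, so the previous speaker
        # stays in place.
        target = "left" if self.current_pane in (None, "right") else "right"
        self.panes[target] = speaker
        self.current_pane = target
        return dict(self.panes)
```

Feeding speakers "A", "B" and "C" in turn gives left/right contents of A/panorama, A/B and C/B, matching the left-then-right pattern described for Fig. 10.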

S96: before the camera shooting the current speaker has successfully obtained the current speaker's video, output the two speakers preceding the current speaker in the two pictures of the dual picture respectively.

It should be noted that step S96 is performed before step S95.

During the period from the speaker change until the camera successfully obtains the current speaker's video, the camera lens pans and zooms, which produces blurred or unstable pictures. For this reason, the two speakers preceding the current speaker are output in the two pictures of the dual picture respectively, so that the blurred or unstable pictures are not output.

This is described below with reference to Figs. 11A and 11B. In Fig. 11A, the three images arranged from left to right are referred to as the first, second and third images. The speaker in the right picture of the first image is the previous speaker of the speaker in the left picture of the first image, and the speaker in the left picture of the first image is the previous speaker of the speaker in the right picture of the third image. That is, the speaker changes from the left-picture speaker of the first image to the right-picture speaker of the third image. If, from the moment the change occurs until the camera successfully obtains a suitable video of the right-picture speaker of the third image, the picture captured while the lens pans and zooms is output directly, a blurred or unstable picture appears in the right picture of the second image. By contrast, as shown in Fig. 11B, during this period the specific embodiment of the invention outputs the live picture of the first image's speaker (the right picture of the second image) and a freeze frame of that speaker's previous speaker (the left picture of the second image), which avoids outputting the blurred or unstable picture.

Of course, depending on actual needs, the output mode shown in the second image of Fig. 11A may also be used during the period from the speaker change until the camera successfully obtains the current speaker's video.

(1) Nobody speaks at the local site

(2) A single person speaks at the local site and nobody interjects

The current speaker is output in one partial picture and the previous speaker in the other partial picture; the composition of the output picture is unchanged;

The speaker is output in one partial picture; the other partial picture either is not switched or outputs the person who interjects. Preferably, the other partial picture is not switched;

(4) A single person speaks at the local site and moves from time to time

If the speaker moves about and the offset of the head or body stays within the current output picture and within the set central area of that picture, the camera is neither switched nor made to track, and the composition of the output picture is unchanged. If the movement keeps the speaker within the current output picture but may take, or has taken, the speaker outside the set central area of that picture, the camera is not switched but may track appropriately so as to keep the speaker within the central area; the composition of the output picture is unchanged. If the movement takes the speaker outside the current output picture, the cameras are switched and the speaker is tracked;

If the position of the next speaker is within the previous speaker's output picture and within the set central area of that picture, the camera is neither switched nor made to track, and the output picture shows the next speaker located in the central area. If the position of the next speaker is within the previous speaker's output picture but may be, or is, outside the set central area of that picture, the camera is not switched but may track appropriately so as to keep the next speaker within the central area, and the output picture shows the next speaker located in the central area. If the position of the next speaker is outside the previous speaker's output picture, the cameras are switched and the next speaker is tracked;

The cameras alternately track the speaker at each new position, and the composition of the output picture changes: after each change, the current speaker is output in one partial picture and the previous speaker in the other partial picture.

In this specific embodiment, each time the speaker's position changes, the camera whose tracking mark is 0 is controlled to track and shoot the speaker at the new position; after that camera has successfully obtained a suitably sized video of the speaker, its tracking mark is changed from 0 to 1 and that of the other camera from 1 to 0. This guarantees that, at any given time, one camera is shooting the current speaker while the other is available for shooting the next speaker. In other words, the two cameras cooperate and hand over seamlessly. When the speaker's position changes, the camera needs a certain amount of time to obtain the new speaker's video. During this period, the video of the previous speaker continues to be output, and the current speaker's video is output only after the camera has successfully obtained it. Compared with the prior art, which must first switch the picture to the panorama of the site and then switch to the new speaker once the camera has obtained the new speaker's video, the present invention reduces the number of video switches, so that pictures follow one another closely and the output video is smoother. Moreover, when several people at the site speak in rapid alternation, the picture shot by the prior-art camera dedicated to shooting speakers may take in several speakers at once; if those speakers are distant, their expressions cannot be observed in the captured picture. In the present invention, because the first camera and the second camera cooperate, the two cameras can alternately shoot the speakers' faces even when speakers alternate rapidly. In addition, because the videos of the current speaker and the previous speaker are output in dual-picture form, participants at the remote site can clearly observe the current speaker's facial features and can also observe one party's reaction to the other party's speech at the local site (this suits conversations among several people, and in particular between two people). In this way, more valuable conference information is retained.

Corresponding to the method for controlling video shooting provided by the embodiments of the present invention, an embodiment of the present invention also provides a device for controlling video shooting. The device may be implemented by any apparatus that has control and processing capability, for example a camera, a video controller or a video terminal. As shown in Fig. 12, a device 12 for controlling video shooting provided by an embodiment of the present invention comprises:

A control unit 121, configured to: when a first speaker speaks, control a first camera device to shoot the first speaker's video; when the current speaker changes from the first speaker to a second speaker, control a second camera device to shoot the second speaker's video, where the second speaker is the next speaker, located at a position different from that of the first speaker; and, on each subsequent speaker change, control the first camera device and the second camera device to shoot the current speaker's video alternately.

A processing unit 122, connected to the control unit 121 and configured to output the current speaker's video after that video has been successfully obtained.

Optionally, in one embodiment, the control unit 121 may be further configured to: before controlling the first camera device to shoot the first speaker's video, in the initial state, control the first camera device and the second camera device to shoot video of the whole site;

The processing unit 122 is further configured to output the captured video.

Optionally, in another embodiment, the control unit 121 is further configured to: set a tracking mark for each of the first camera device and the second camera device, where the tracking mark of the first camera device is initially a first tracking mark and the tracking mark of the second camera device is initially a second tracking mark.

The control unit 121 is specifically configured to: when the first speaker speaks, control the first camera device, which has the first tracking mark, to shoot the first speaker's video; and, after the first speaker's video has been successfully obtained, change the tracking mark of the first camera device from the first tracking mark to the second tracking mark and at the same time change the tracking mark of the second camera device from the second tracking mark to the first tracking mark.

The control unit 121 is specifically configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera device, which now has the first tracking mark, to shoot the second speaker's video; and, after the second speaker's video has been successfully obtained, change the tracking mark of the second camera device from the first tracking mark to the second tracking mark and at the same time change the tracking mark of the first camera device from the second tracking mark to the first tracking mark.

The control unit 121 is specifically configured to: on each subsequent speaker change, control the camera device that has the first tracking mark to shoot the current speaker's video; and, after the current speaker's video has been successfully obtained, exchange the tracking marks of the first camera device and the second camera device.

Optionally, the control unit 121 is specifically configured to: determine whether the second speaker's position is within the first speaker's output picture; and, if the second speaker's position is not within the first speaker's output picture, control the second camera device to shoot the second speaker's video;

If the second speaker's position is within the first speaker's output picture, further determine whether the second speaker's position is within a set region of the first speaker's output picture; if the second speaker's position is within the set region, control the first camera device to shoot the second speaker's video; if the second speaker's position is not within the set region, control the first camera device to track and shoot the second speaker so that the second speaker's position falls within the set region.
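The three-way decision made by the control unit can be written as a short Python sketch. The names `Action`, `inside` and `decide`, and the representation of the output picture and the set region as axis-aligned rectangles `(x, y, w, h)` in a common coordinate system, are assumptions for illustration; the text does not specify how positions or regions are represented.

```python
from enum import Enum


class Action(Enum):
    KEEP = "same camera device, no tracking"
    TRACK = "same camera device, track to re-centre"
    SWITCH = "switch to the other camera device"


def inside(rect, point) -> bool:
    """True if point (px, py) lies within rect (x, y, w, h), edges included."""
    x, y, w, h = rect
    px, py = point
    return x <= px <= x + w and y <= py <= y + h


def decide(output_picture, set_region, speaker_pos) -> Action:
    """Decide how to handle a new speaker position."""
    if not inside(output_picture, speaker_pos):
        # Outside the current output picture: use the other camera device.
        return Action.SWITCH
    if inside(set_region, speaker_pos):
        # Already within the set region: keep shooting with the same camera.
        return Action.KEEP
    # Within the picture but outside the set region: track to re-centre.
    return Action.TRACK
```

For example, with an output picture of `(0, 0, 1920, 1080)` and a set region of `(480, 270, 960, 540)`, a speaker at `(960, 540)` yields `Action.KEEP`, one at `(100, 100)` yields `Action.TRACK`, and one at `(2500, 500)` yields `Action.SWITCH`.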

Optionally, the control unit 121 may be specifically configured to: use sound source localization to control a camera device to shoot a speaker's video.

Further, the control unit 121 may be specifically configured to: use sound source localization in combination with preset positions or image recognition to control a camera device to shoot a speaker's video.

It should be noted that the first camera device and the second camera device may be fixed together by a connecting device or may be independent of each other.

In this embodiment, when someone starts to speak, the control unit 121 controls one of the camera devices to shoot the current speaker's video, and the processing unit 122 outputs that video after it has been successfully obtained. At this point the other camera device is in a standby state, ready to track and shoot the next speaker. When the speaker subsequently changes, the control unit 121 can immediately control the camera device in the standby state to shoot the next speaker's video. Because it takes time, after the speaker's position changes, to obtain a suitable video of the new speaker, the picture output to the remote site during this period does not first have to be switched to the panorama of the site in this embodiment; instead, the video of the speaker from before the change continues to be output. This reduces the number of video switches, so that pictures follow one another closely and the output video is smoother. Moreover, because the control unit 121 controls the two camera devices to shoot the current speaker's video alternately, the two camera devices can alternately shoot the speakers' faces even when speakers at the site alternate rapidly, so that more valuable conference information is retained.

Optionally, in another embodiment of the present invention, the processing unit 122 may output the current speaker's video full screen. The processing unit 122 is specifically configured to: after the current speaker's video has been successfully obtained, set that video to full-screen display and, once the setting is complete, output the current speaker's video full screen; and, before the current speaker's video has been successfully obtained, output the previous speaker's video full screen.

With the current speaker's video output full screen, participants at the remote site can observe the current speaker's facial features more clearly. These facial features may carry important conference information, so still more valuable conference information is retained.

Optionally, in another embodiment of the present invention, the processing unit 122 may output the videos of the current speaker and of the previous speaker simultaneously in picture-in-picture form.

The processing unit 122 is specifically configured to: after the current speaker's video has been successfully obtained, set the current speaker's video and the previous speaker's video to be displayed in picture-in-picture form, where the picture-in-picture comprises a first picture and a second picture that is contained in, and smaller than, the first picture, the current speaker being displayed in the first picture and the previous speaker in the second picture; and, once the setting is complete, output the videos of the current speaker and the previous speaker simultaneously in picture-in-picture form.

The control unit 121 is further configured to: when the current speaker changes from the second speaker to a third speaker, control the first camera device to shoot the third speaker's video, where the third speaker is the next speaker, located at a position different from that of the second speaker.

The processing unit 122 is specifically configured to: before the third speaker's video has been successfully obtained, either output the second speaker in the first picture and a freeze frame of the first speaker in the second picture, or output the second speaker in the first picture and, in the second picture, the third speaker as captured by the camera device that has started shooting but has not yet completed acquisition; and, after the third speaker's video has been successfully obtained, output the third speaker in the first picture and the second speaker in the second picture.

Outputting the videos of the current speaker and the previous speaker simultaneously in picture-in-picture form allows participants at the remote site to observe the current speaker's facial features clearly and, at the same time, to see the speaker changes at the local site and one party's reaction to the other party's speech. In this way, more valuable conference information is retained.

Optionally, in yet another embodiment of the present invention, processing unit 122 may output the videos of the current talker and the current talker's previous talker simultaneously in dual-picture form.

Processing unit 122 is specifically configured to: after the video of the current talker is successfully obtained, set the video of the current talker and the video of the current talker's previous talker to be displayed in dual-picture form, where the dual picture comprises two parts that do not contain each other, one part displaying the current talker and the other part displaying the previous talker; and, after the setting is complete, output the videos of the current talker and the previous talker simultaneously in dual-picture form.
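The geometric difference between the picture-in-picture form (a smaller second picture contained in the first) and the dual-picture form (two parts that do not contain each other) can be illustrated with rectangles. The sketch below is an assumption-laden example: the text does not specify sizes or placement, so the inset scale, the margin, the bottom-right placement, and the side-by-side split are all hypothetical choices.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def pip_layout(width: int, height: int, scale: int = 4, margin: int = 16) -> Tuple[Rect, Rect]:
    """First picture fills the frame; second picture is a smaller rectangle inside it."""
    first = Rect(0, 0, width, height)
    w, h = width // scale, height // scale
    # Bottom-right placement is an arbitrary illustrative choice.
    second = Rect(width - w - margin, height - h - margin, w, h)
    return first, second


def dual_layout(width: int, height: int) -> Tuple[Rect, Rect]:
    """Two side-by-side parts that do not overlap."""
    half = width // 2
    return Rect(0, 0, half, height), Rect(half, 0, width - half, height)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share any area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)
```

With a 1280x720 frame, `pip_layout` yields an inset that overlaps the first picture, while the two rectangles from `dual_layout` do not overlap.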

Processing unit 122 is specifically configured to: before the video of the third talker is successfully obtained, either output a freeze frame of the first talker in one part and the second talker in the other part, or output in one part the third talker, whose shooting has started but whose video has not yet been successfully obtained, and the second talker in the other part; and, after the video of the third talker is successfully obtained, output the third talker in one part and the second talker in the other part.

By outputting the videos of the current talker and the previous talker in dual-picture form, participants at the remote site can not only clearly observe the current talker's facial features but also observe how one party at the local conference site reacts to the other's speech (this suits conversations among several people, particularly between two people). More valuable conference information is thereby retained.

It should be noted that the units included in the above embodiment of the device for controlling video shooting are divided only according to functional logic; the division is not limited to the above as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and do not limit the protection scope of the present invention.

Other embodiments of the device for controlling video shooting are described below with reference to Figures 13A to 13C. As shown in Figure 13A, the device 13 for controlling video shooting provided by an embodiment of the present invention comprises:

Controller 131, configured to: when a first talker speaks, control the first photographing module 132 to shoot video of the first talker; when the current talker changes from the first talker to a second talker, control the second photographing module 133 to shoot video of the second talker, where the second talker is the next talker, located at a position different from that of the first talker; and, each time the talker subsequently changes again, control the first photographing module 132 and the second photographing module 133 in turn to alternately shoot video of the current talker.

Output processor 134, connected to the first photographing module 132 and the second photographing module 133 and configured to output the video of the current talker after that video has been successfully obtained.

The output processor 134 may be integrated into the first photographing module 132 or the second photographing module 133, or may be separate from both.

Optionally, the controller 131 may be further configured to: in the initial state, before controlling the first photographing module 132 to shoot video of the first talker, control the first photographing module 132 and the second photographing module 133 to shoot video of the whole conference site;

The output processor 134 is further configured to output the captured video of the whole conference site.

The first photographing module 132 and the second photographing module 133 may be independent of each other, or may be fixed together by a connecting device to form a dual photographing module. They may be integrated into the device 13 for controlling video shooting, or may be separate from it.

Optionally, in one embodiment, the controller 131 may be further configured to set a tracking mark for each of the first photographing module 132 and the second photographing module 133, where the tracking mark of the first photographing module 132 is initially a first tracking mark and the tracking mark of the second photographing module 133 is initially a second tracking mark.

The controller 131 is specifically configured to: when the first talker speaks, control the first photographing module 132, which holds the first tracking mark, to shoot video of the first talker; and, after the video of the first talker is successfully obtained, change the tracking mark of the first photographing module 132 from the first tracking mark to the second tracking mark and, at the same time, change the tracking mark of the second photographing module 133 from the second tracking mark to the first tracking mark.

The controller 131 is specifically configured to: when the current talker changes from the first talker to the second talker, control the second photographing module 133, which now holds the first tracking mark, to shoot video of the second talker; and, after the video of the second talker is successfully obtained, change the tracking mark of the second photographing module 133 from the first tracking mark to the second tracking mark and, at the same time, change the tracking mark of the first photographing module 132 from the second tracking mark to the first tracking mark.

The controller 131 is specifically configured to: each time the talker subsequently changes, control the photographing module that holds the first tracking mark to shoot video of the current talker; and, after the video of the current talker is successfully obtained, exchange the tracking marks of the first photographing module 132 and the second photographing module 133.
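The tracking-mark exchange can be sketched as follows. This is a minimal model under stated assumptions rather than the controller's actual implementation: the class and attribute names are hypothetical, and assigning `target` stands in for steering the camera and successfully obtaining the talker's video.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_MARK = "first"    # the module holding this mark shoots the next talker
SECOND_MARK = "second"  # the module holding this mark keeps its current shot


@dataclass
class CameraModule:
    name: str
    mark: str
    target: Optional[str] = None  # talker this module is currently shooting


class TrackingController:
    def __init__(self) -> None:
        self.cam_a = CameraModule("module_132", FIRST_MARK)
        self.cam_b = CameraModule("module_133", SECOND_MARK)

    def on_talker_change(self, talker: str) -> CameraModule:
        """Send the module holding the first tracking mark to the new talker."""
        cam = self.cam_a if self.cam_a.mark == FIRST_MARK else self.cam_b
        cam.target = talker  # placeholder for pointing the camera and acquiring video
        # Exchange the marks only after the video has been obtained.
        self.cam_a.mark, self.cam_b.mark = self.cam_b.mark, self.cam_a.mark
        return cam
```

Calling `on_talker_change` for three successive talkers uses module 132, then module 133, then module 132 again, which is the alternation the text describes.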

As shown in Figure 13B, optionally, the device 13 for controlling video shooting provided by the embodiment of the present invention further comprises:

Pickup microphone array 135 and auditory localization device 136, configured to obtain the talker's position by sound source localization: the auditory localization device 136 locates the talker from the sound picked up by the pickup microphone array 135. The controller 131 controls a photographing module to shoot video of the talker according to the position obtained by sound source localization.
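The text does not say which sound source localization technique is used. Purely as an illustration of one common approach, the sketch below estimates a bearing from the arrival-time difference between two microphones under a far-field assumption, using sin(theta) = c * tau / d. The function name, the two-microphone setup, and the parameter names are assumptions introduced here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees Celsius


def bearing_from_delay(delay_s: float, mic_spacing_m: float) -> float:
    """Return the source bearing in degrees, measured from the broadside of a microphone pair.

    delay_s:       arrival-time difference between the two microphones (seconds)
    mic_spacing_m: distance between the two microphones (metres)
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    # Clamp so that measurement noise cannot push the value outside asin's domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A bearing of this kind could be handed to the controller as a coarse pan angle, with image recognition refining the position afterwards.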

As shown in Figure 13B, the device 13 for controlling video shooting provided by the embodiment of the present invention further comprises a framing device 137, configured to locate the talker using image recognition techniques such as face detection or lip-movement detection. The controller 131 may be configured to control a photographing module to shoot video of the talker according to the position information obtained by image recognition.

Optionally, the controller 131 controls a photographing module to shoot video of the talker according to the position obtained by sound source localization and preset-position information.

Optionally, the framing device 137 is specifically configured to determine whether the position of the second talker lies within the output picture of the first talker. If it does not, the controller 131 controls the second photographing module 133 to shoot video of the second talker;

If the position of the second talker lies within the output picture of the first talker, the framing device 137 further determines whether that position lies within a set region of the first talker's output picture. If it does, the controller 131 controls the first photographing module 132 to shoot video of the second talker; if it does not, the controller 131 controls the first photographing module 132 to track and shoot the second talker so that the second talker's position falls within the set region.
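The two-stage check on the second talker's position can be summarised as a three-way decision. The sketch below assumes, for illustration only, that the output picture and the set region are axis-aligned rectangles `(x0, y0, x1, y1)` in pixel coordinates and that the talker's position is a single point; the names `Action`, `contains`, and `choose_action` are hypothetical.

```python
from enum import Enum
from typing import Tuple

Region = Tuple[int, int, int, int]  # (x0, y0, x1, y1), an assumed representation
Point = Tuple[int, int]


class Action(Enum):
    USE_SECOND_MODULE = "second module shoots the new talker"
    KEEP_FIRST_MODULE = "first module shoots the new talker as framed"
    REFRAME_FIRST_MODULE = "first module tracks until the talker is in the set region"


def contains(region: Region, point: Point) -> bool:
    x0, y0, x1, y1 = region
    px, py = point
    return x0 <= px <= x1 and y0 <= py <= y1


def choose_action(position: Point, output_picture: Region, set_region: Region) -> Action:
    """Decide which module handles a new talker located at `position`."""
    if not contains(output_picture, position):
        # The new talker is outside the current output picture.
        return Action.USE_SECOND_MODULE
    if contains(set_region, position):
        # Already well framed by the first module.
        return Action.KEEP_FIRST_MODULE
    # Visible but badly framed: the first module adjusts its shot.
    return Action.REFRAME_FIRST_MODULE
```

The design point is that a switch to the other module happens only when the new talker is outside the current picture; otherwise the first module is reused and no switch is needed.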

In this embodiment, when someone starts to speak, the controller 131 controls one module, the first photographing module 132, to shoot video of the current talker, and the output processor 134 obtains and outputs that video. Meanwhile, the second photographing module 133 is in a standby state, ready to track and shoot the next talker. When the talker subsequently changes, the controller 131 can immediately control the standby second photographing module 133 to shoot video of the next talker. Because it takes time from the change of talker position to obtaining suitable video of the new talker, the picture output to the remote site during this interval need not first switch to a panorama of the conference site; instead, the video of the talker before the change continues to be output. This reduces the number of video switches, so that the pictures join closely and the output video is smoother. Moreover, because the controller 131 controls the two photographing modules to alternately shoot video of the current talker, the two modules can alternately capture the talkers' faces even when talkers at the site alternate rapidly, so more valuable conference information is retained.

Optionally, in another embodiment of the present invention, the output processor 134 may output the video of the current talker full screen. The output processor 134 is specifically configured to: after the video of the current talker is successfully obtained, set that video to full-screen display and, once the setting is complete, output it full screen; and, before the video of the current talker is successfully obtained, output the video of the current talker's previous talker full screen.
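The full-screen rule (keep showing the previous talker until the new talker's video has been obtained) can be traced with a small event loop. This is a hedged sketch: the event names, the function name `full_screen_timeline`, and the choice to ignore an acquisition for a talker who is no longer pending are assumptions. The initial `"panorama"` value reflects the optional initial state in which the whole conference site is output.

```python
from typing import Iterable, List, Optional, Tuple

Event = Tuple[str, str]  # ("change", talker) or ("acquired", talker)


def full_screen_timeline(events: Iterable[Event]) -> List[str]:
    """Return what is shown full screen after each event."""
    shown = "panorama"             # initial state: video of the whole site
    pending: Optional[str] = None  # talker whose video is being acquired
    timeline = []
    for kind, talker in events:
        if kind == "change":
            # A new talker started speaking; keep the current output for now.
            pending = talker
        elif kind == "acquired" and talker == pending:
            # Video obtained: switch the full-screen output exactly once.
            shown, pending = talker, None
        # Acquisitions for a talker who is no longer pending are ignored (assumption).
        timeline.append(shown)
    return timeline
```

In a run where T2 starts speaking after T1, the output stays on T1 until T2's video is acquired, so the stream never falls back to the panorama between talkers.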

Optionally, in another embodiment of the present invention, the output processor 134 may output the videos of the current talker and the current talker's previous talker simultaneously in picture-in-picture form.

The output processor 134 is specifically configured to: after the video of the current talker is successfully obtained, set the video of the current talker and the video of the current talker's previous talker to be displayed in picture-in-picture form, where the picture-in-picture comprises a first picture and a second picture that is contained in the first picture and is smaller than the first picture, the current talker is displayed in the first picture, and the previous talker is displayed in the second picture; and, after the setting is complete, output the videos of the current talker and the previous talker simultaneously in picture-in-picture form.

The controller 131 is further configured to: when the current talker changes from the second talker to a third talker, control the first photographing module 132 to shoot video of the third talker, where the third talker is the next talker, located at a position different from that of the second talker.

The output processor 134 is specifically configured to: before the video of the third talker is successfully obtained, either output the second talker in the first picture and a freeze frame of the first talker in the second picture, or output the second talker in the first picture and, in the second picture, the third talker, whose shooting has started but whose video has not yet been successfully obtained; and, after the video of the third talker is successfully obtained, output the third talker in the first picture and the second talker in the second picture.

Outputting the videos of the current talker and the previous talker simultaneously in picture-in-picture form lets participants at the remote site clearly observe the current talker's facial features and also see talker changes at the local conference site and how one party reacts to the other's speech. Still more valuable conference information is thereby retained.

Optionally, in yet another embodiment of the present invention, the output processor 134 may output the videos of the current talker and the current talker's previous talker simultaneously in dual-picture form.

The output processor 134 is specifically configured to: after the video of the current talker is successfully obtained, set the video of the current talker and the video of the current talker's previous talker to be displayed in dual-picture form, where the dual picture comprises two parts that do not contain each other, one part displaying the current talker and the other part displaying the previous talker; and, after the setting is complete, output the videos of the current talker and the previous talker simultaneously in dual-picture form.

The controller 131 is further configured to: when the current talker changes from the second talker to a third talker, control the first photographing module 132 to shoot video of the third talker, where the third talker is the next talker, located at a position different from that of the second talker.

The output processor 134 is specifically configured to: before the video of the third talker is successfully obtained, either output a freeze frame of the first talker in one part and the second talker in the other part, or output in one part the third talker, whose shooting has started but whose video has not yet been successfully obtained, and the second talker in the other part; and, after the video of the third talker is successfully obtained, output the third talker in one part and the second talker in the other part.

By outputting the videos of the current talker and the previous talker in dual-picture form, participants at the remote site can not only clearly observe the current talker's facial features but also observe how one party at the local conference site reacts to the other's speech. Still more valuable conference information is thereby retained.

The device 13 for controlling video shooting provided by the embodiment of the present invention is described below through a specific, complete embodiment with reference to the accompanying drawings. As shown in Figure 13C, the device 13 comprises:

Controller 131; first photographing module 132, whose initial tracking mark is set to 0; second photographing module 133, whose initial tracking mark is set to 1; output processor 134; pickup microphone array 135; auditory localization device 136; framing device 137; main control module 138; video module 139; video signal preprocessor 140; audio module 141; audio signal processor 142; pickup microphone 143; loudspeaker 144; display 145. These parts may be integrated into a single complete device or may be separate from one another, and they work in coordination under the control of the controller 131 and the main control module 138.

After the device 13 for controlling video shooting is switched on, that is, when the conference starts and no one at the local conference site is speaking yet, the controller 131 may control the two photographing modules to shoot a panorama of the conference site so that the layout of the local site can be sent to the remote site. After a photographing module has captured video of the local site, the video signal preprocessor 140 in the video module 139 preferably performs processing such as encoding and decoding on the video shot by the second photographing module 133, and the video is sent to the remote site over the network under the control of the main control module 138.

When someone at the local conference site starts to speak, that is, when the first talker appears, the pickup microphone array 135 picks up the sound at the local site and sends it to the auditory localization device 136; on the way, the sound may first undergo processing such as denoising by an internal module of the audio module 141 (for example, a module with a preprocessing function). The auditory localization device 136 produces position information by sound source localization; the controller 131 obtains this position information and controls the first photographing module 132 (the photographing module whose tracking mark is 0) to turn to a suitable angle and obtain rough video of the first talker. The framing device 137 then applies image recognition to the video obtained by the first photographing module 132 to determine the accurate position of the first talker (including the face position). Under the control of the controller 131, the first photographing module 132 (the photographing module whose tracking mark is 0) rotates or zooms its camera to shoot suitable video of the first talker. After the first photographing module 132 has successfully captured video of the first talker, its tracking mark is changed from 0 to 1, and the tracking mark of the second photographing module 133 is changed from 1 to 0.

After the first photographing module 132 has successfully captured video of the first talker, if the talker changes from the first talker to the second talker, the controller 131 controls the photographing module whose tracking mark is 0 (that is, the second photographing module 133) to shoot video of the second talker, using the same control method as above. After the second photographing module 133 has captured suitable video of the second talker, its tracking mark is changed from 0 to 1, and the tracking mark of the first photographing module 132 is changed from 1 to 0.

As described above, each time the talker changes, the controller 131 controls the photographing module whose tracking mark is 0 (which may be either the first photographing module 132 or the second photographing module 133) to track and shoot the talker after the change; after this photographing module has successfully shot suitable video of the talker, its tracking mark is changed from 0 to 1, and the tracking mark of the other photographing module is changed from 1 to 0.

After a photographing module has successfully shot video of a talker, the output processor 134 obtains that video from the photographing module. The output processor 134 can then set the output mode and output the obtained video full screen, in picture-in-picture form, in dual-picture form, or in another mode.

After setting the output mode, the output processor 134 sends the talker's video to the video signal preprocessor 140, which performs processing such as encoding on it. Then, under the control of the main control module 138, the video signal preprocessor 140 begins sending the talker's video to the remote site over the network.

Furthermore, before a photographing module has successfully obtained video of the current talker, the main control module 138 may control the output processor 134 to output the video of the current talker's previous talker.

In addition, the audio signal processor 142 performs processing such as encoding on the talker's voice picked up at the local conference site by the pickup microphone 143. Note that the sound picked up by the pickup microphone 143 serves a different purpose from the sound picked up by the pickup microphone array 135: the former is sent to the remote site together with the video shot by the photographing modules, while the latter is used for sound source localization. The loudspeaker 144 and the display 145 are basic components of the device 13 for controlling video shooting and are used to output audio and video, respectively, at the local site.

Each embodiment in this specification is described with its own emphasis; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, see the corresponding description of the method embodiments.

It should be noted that the device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected as actually needed to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the device embodiments provided by the present invention, a connection between modules indicates a communication connection between them, which may be implemented as one or more communication buses or signal lines. A person of ordinary skill in the art can understand and implement this without creative effort.

A person of ordinary skill in the art will understand that aspects of the present invention, or possible implementations of those aspects, may be embodied as a system, a method, or a computer program product. Accordingly, they may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and the like), or an embodiment combining software and hardware aspects, all of which are referred to here as a "circuit", "module", or "system". In addition, they may take the form of a computer program product, meaning computer-readable program code stored in a computer-readable medium.

A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. Computer-readable storage media include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of these, such as random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, and portable compact disc read-only memory (CD-ROM).

A processor in a computer reads the computer-readable program code stored in the computer-readable medium, so that the processor can perform the functions and actions specified in each step, or combination of steps, of the flowcharts, and so that a device is produced that implements the functions and actions specified in each block, or combination of blocks, of the block diagrams.

The computer-readable program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. It should also be noted that, in some alternative implementations, the functions noted in the steps of the flowcharts or the blocks of the block diagrams may occur out of the order indicated in the figures. For example, depending on the functions involved, two steps or blocks shown in succession may in fact be executed substantially concurrently, or may sometimes be executed in the reverse order.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited to them. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be subject to the protection scope of the claims.

Claims

1. A method for controlling video shooting, characterized by comprising:

2. The method according to claim 1, characterized in that outputting the video of the current talker comprises: outputting the video of the current talker full screen.

3. The method according to claim 2, characterized in that outputting the video of the current talker full screen comprises:

4. The method according to claim 1, characterized in that outputting the video of the current talker comprises: outputting the videos of the current talker and the current talker's previous talker simultaneously in picture-in-picture form;

wherein the picture-in-picture comprises a first picture and a second picture that is contained in the first picture and is smaller than the first picture, the current talker is output in the first picture, and the current talker's previous talker is output in the second picture.

5. The method according to claim 4, characterized in that the method further comprises:

6. The method according to claim 1, characterized in that outputting the video of the current talker comprises: outputting the videos of the current talker and the current talker's previous talker simultaneously in dual-picture form;

wherein the dual picture comprises two parts that do not contain each other, one part outputting the current talker and the other part outputting the current talker's previous talker.

7. The method according to claim 6, wherein the method further comprises:

8. The method according to claim 1, characterized in that, before controlling the first camera head to shoot video of the first talker, the method further comprises:

9. The method according to any one of claims 1 to 8, characterized in that, before controlling the first camera head to shoot video of the first talker, the method further comprises:

10. The method according to claim 9, characterized in that:

each time the talker subsequently changes again, controlling the first camera head and the second camera head in turn to alternately shoot video of the current talker comprises: each time the talker subsequently changes, controlling the camera head that holds the first tracking mark to shoot video of the current talker, and, after the video of the current talker is successfully obtained, exchanging the tracking marks of the first camera head and the second camera head.

11. The method according to claim 10, characterized in that controlling a camera head to shoot video of a talker comprises:

12. The method according to claim 11, characterized in that controlling a camera head to shoot video of a talker by using sound source localization comprises:

13. The method according to any one of claims 1 to 12, characterized in that, when the current talker changes from the first talker to the second talker, controlling the second camera head to shoot video of the second talker comprises:

14. A device for controlling video shooting, characterized by comprising:

15. The device according to claim 14, characterized in that the processing unit is specifically configured to:

set the video of the current talker to full-screen display; and

output the video of the current talker full screen.

16. The device according to claim 15, characterized in that the processing unit is specifically configured to:

17. The device according to claim 14, characterized in that the processing unit is specifically configured to:

output the videos of the current talker and the current talker's previous talker simultaneously in picture-in-picture form.

18. The device according to claim 15, characterized in that the control unit is further configured to:

and the processing unit is specifically configured to:

19. The device according to claim 14, characterized in that the processing unit is specifically configured to:

20. The device according to claim 19, characterized in that the control unit is further configured to:

and the processing unit is specifically configured to:

21. The device according to claim 14, characterized in that, before controlling the first camera head to shoot video of the first talker, the control unit is further configured to:

in the initial state, control the first camera head and the second camera head to shoot video of the whole conference site;

and the processing unit is further configured to output the video of the whole conference site shot under the control of the control unit.

22. The device according to any one of claims 14 to 21, characterized in that the control unit is further configured to:

23. The device according to claim 22, characterized in that the control unit is specifically configured to: each time the talker subsequently changes, control the camera head that holds the first tracking mark to shoot video of the current talker, and, after the video of the current talker is successfully obtained, exchange the tracking marks of the first camera head and the second camera head.

24. The device according to claim 23, characterized in that the control unit is specifically configured to:

25. The device according to claim 24, characterized in that the control unit is specifically configured to:

26. The device according to any one of claims 14 to 25, characterized in that the control unit is specifically configured to: