CN103595953B - Method and apparatus for controlling video capture

Info

Publication number: CN103595953B
Application number: CN201310566974.1A
Authority: CN
Inventors: 王静; 刘智辉; 张金亮
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2013-11-14
Filing date: 2013-11-14
Publication date: 2017-06-20
Anticipated expiration: 2033-11-14
Also published as: WO2015070558A1; CN103595953A

Abstract

The present invention provides a method and apparatus for controlling video capture and relates to the field of video imaging. It reduces the number of video switches while preserving the speaker's facial picture, so that pictures join tightly and the output video is smoother. The method includes: when a first speaker speaks, controlling a first camera device to shoot video of the first speaker; when the current speaker changes from the first speaker to a second speaker, controlling a second camera device to shoot video of the second speaker, where the second speaker is the next speaker and is at a different position from the first speaker; when the speaker subsequently changes again, controlling the first camera device and the second camera device to alternately shoot video of the current speaker; and after the video of the current speaker has been successfully acquired, outputting the video of the current speaker. The present invention is used in video conferencing.

Description

Method and apparatus for controlling video capture

Technical field

The present invention relates to the field of video imaging, and in particular to a method and apparatus for controlling video capture.

Background

In a video conference, the camera usually shoots a panoramic picture of all participants at a fixed size and a fixed angle. When the conference site is large, the camera may be far from the speaker, so the captured picture does not show who is speaking and the speaker's facial expression cannot be seen clearly, which loses valuable information from the conference.

To avoid losing valuable conference information because only a panoramic picture is shot, two cameras can shoot the conference site at the same time. One camera always shoots the panorama of the site, and the other tracks and shoots the speaker.

When people at the site speak in turn, the camera that tracks the speaker has to pan and zoom its lens before it successfully acquires the picture of the current speaker. The video shot during this period is unstable and uncomfortable to watch, so the output has to be switched to the site panorama first. This switching, however, makes the pictures join loosely, and the video sent to the remote site is not smooth, which is very uncomfortable for the viewer.

Summary of the invention

Embodiments of the present invention provide a method and apparatus for controlling video capture that reduce the number of video switches while preserving the speaker's facial picture, so that pictures join tightly and the output video is smoother.

In a first aspect, a method for controlling video capture is provided, including:

when a first speaker speaks, controlling a first camera device to shoot video of the first speaker;

when the current speaker changes from the first speaker to a second speaker, controlling a second camera device to shoot video of the second speaker, where the second speaker is the next speaker and is at a different position from the first speaker;

when the speaker subsequently changes again, controlling the first camera device and the second camera device to alternately shoot video of the current speaker; and

after the video of the current speaker has been successfully acquired, outputting the video of the current speaker.

With reference to the first aspect, in a first possible implementation of the first aspect, outputting the video of the current speaker includes: outputting the video of the current speaker in full screen.

With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, outputting the video of the current speaker in full screen includes:

before the video of the current speaker has been successfully acquired, outputting in full screen the video of the speaker who preceded the current speaker; and

after the video of the current speaker has been successfully acquired, outputting in full screen the video of the current speaker.
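The full-screen behaviour described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the patented implementation: the `Camera` and `FullScreenDirector` classes, their method names, and the `on_acquired` callback are all hypothetical, and the pan/zoom step itself is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Camera:
    name: str
    target: Optional[str] = None  # speaker this camera is aimed at (hypothetical field)


class FullScreenDirector:
    """Alternates two cameras; cuts to a speaker only after acquisition."""

    def __init__(self) -> None:
        self.cams = [Camera("cam1"), Camera("cam2")]
        self.next_idx = 0                        # camera that takes the next new speaker
        self.pending: Optional[Camera] = None    # camera still panning/zooming
        self.output: Optional[Camera] = None     # camera currently shown full screen

    def on_speaker_change(self, speaker: str) -> None:
        # Hand the new speaker to the camera whose turn it is; output is untouched.
        cam = self.cams[self.next_idx]
        cam.target = speaker
        self.pending = cam
        self.next_idx = 1 - self.next_idx

    def on_acquired(self) -> None:
        # Called once the pending camera has a stable shot of its target.
        if self.pending is not None:
            self.output = self.pending
            self.pending = None

    def full_screen(self) -> Optional[str]:
        return self.output.target if self.output else None


director = FullScreenDirector()
director.on_speaker_change("A")
director.on_acquired()
director.on_speaker_change("B")   # cam2 starts moving; A stays on screen
print(director.full_screen())
director.on_acquired()            # only now does the output cut to B
print(director.full_screen())
```

The sketch does not handle a further speaker change arriving before the pending camera has acquired its target; a real controller would need a policy for that case.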

With reference to the first aspect, in a third possible implementation of the first aspect, outputting the video of the current speaker includes: outputting the videos of the current speaker and of the speaker who preceded the current speaker simultaneously in picture-in-picture form;

where the picture-in-picture includes a first picture and a second picture that is contained in the first picture and is smaller than the first picture; the current speaker is output in the first picture, and the speaker who preceded the current speaker is output in the second picture.

With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the method further includes:

when the current speaker changes from the second speaker to a third speaker, controlling the first camera device to shoot video of the third speaker, where the third speaker is the next speaker and is at a different position from the second speaker;

and outputting the videos of the current speaker and of the speaker who preceded the current speaker simultaneously in picture-in-picture form includes:

before the video of the third speaker has been successfully acquired: outputting the second speaker in the first picture and a freeze frame of the first speaker in the second picture; or outputting the second speaker in the first picture and, in the second picture, the third speaker, whose shooting has begun but whose video has not yet been successfully acquired; and

after the video of the third speaker has been successfully acquired: outputting the third speaker in the first picture and the second speaker in the second picture.
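The picture-in-picture rule above reduces to a small lookup on two conditions. The following is a minimal sketch; the function name `pip_layout`, the `show_transit` switch, and the string labels are hypothetical and only mirror the two alternatives the text allows before acquisition.

```python
def pip_layout(third_acquired: bool, show_transit: bool = False) -> dict:
    """Return what the first (main) and second (inset) pictures show.

    third_acquired: True once the first camera has a stable shot of speaker 3.
    show_transit:   before acquisition, choose between the two alternatives:
                    False -> inset shows a freeze frame of speaker 1,
                    True  -> inset shows speaker 3 while the camera is still moving.
    """
    if third_acquired:
        return {"main": "speaker 3 (live)", "inset": "speaker 2 (live)"}
    inset = "speaker 3 (camera still moving)" if show_transit else "speaker 1 (freeze frame)"
    return {"main": "speaker 2 (live)", "inset": inset}


for acquired, transit in [(False, False), (False, True), (True, False)]:
    print(acquired, transit, pip_layout(acquired, transit))
```

In both pre-acquisition alternatives the main picture keeps showing the second speaker, so the viewer never sees an unstable shot at full size.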

With reference to the first aspect, in a fifth possible implementation of the first aspect, outputting the video of the current speaker includes: outputting the videos of the current speaker and of the speaker who preceded the current speaker simultaneously in dual-picture form;

where the output picture includes two partial pictures, neither of which contains the other; one partial picture outputs the current speaker, and the other partial picture outputs the speaker who preceded the current speaker.

With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the method further includes:

outputting the videos of the current speaker and of the speaker who preceded the current speaker simultaneously in dual-picture form includes:

before the video of the third speaker has been successfully acquired: outputting a freeze frame of the first speaker in one partial picture and the second speaker in the other partial picture; or outputting, in one partial picture, the third speaker, whose shooting has begun but whose video has not yet been successfully acquired, and the second speaker in the other partial picture; and

after the video of the third speaker has been successfully acquired: outputting the third speaker in one partial picture and the second speaker in the other partial picture.
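The geometric difference between the two composite layouts (a second picture contained in the first, versus two parts that do not contain each other) can be illustrated as follows. This is a sketch only: the inset scale, the margin, the bottom-right placement, and the left/right split are assumptions chosen for illustration, and `Pane`, `pip_panes` and `dual_panes` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Pane(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def contains(outer: Pane, inner: Pane) -> bool:
    """True if inner lies entirely inside outer."""
    return (outer.x <= inner.x and outer.y <= inner.y
            and inner.x + inner.w <= outer.x + outer.w
            and inner.y + inner.h <= outer.y + outer.h)


def overlaps(a: Pane, b: Pane) -> bool:
    """True if the two panes share any interior area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def pip_panes(width: int, height: int, scale: int = 4, margin: int = 16) -> Tuple[Pane, Pane]:
    # Main pane fills the frame; inset is 1/scale of each dimension, bottom-right (assumed).
    main = Pane(0, 0, width, height)
    iw, ih = width // scale, height // scale
    inset = Pane(width - iw - margin, height - ih - margin, iw, ih)
    return main, inset


def dual_panes(width: int, height: int) -> Tuple[Pane, Pane]:
    # Side-by-side split (assumed); the two panes share only an edge.
    left = Pane(0, 0, width // 2, height)
    right = Pane(width // 2, 0, width - width // 2, height)
    return left, right


main, inset = pip_panes(1920, 1080)
left, right = dual_panes(1920, 1080)
print(contains(main, inset), overlaps(left, right))
```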

With reference to the first aspect, in a seventh possible implementation of the first aspect, before controlling the first camera device to shoot the video of the first speaker, the method further includes:

in an initial state, controlling the first camera device and the second camera device to shoot video of the whole conference site and outputting the captured video.

With reference to the first aspect or any one of the first to seventh possible implementations of the first aspect, in an eighth possible implementation of the first aspect, before controlling the first camera device to shoot the video of the first speaker, the method further includes:

setting a tracking flag for each of the first camera device and the second camera device, where the tracking flag of the first camera device is initially a first tracking flag and the tracking flag of the second camera device is initially a second tracking flag;

controlling, when the first speaker speaks, the first camera device to shoot the video of the first speaker includes: when the first speaker speaks, controlling the first camera device, which has the first tracking flag, to shoot the video of the first speaker, and after the video of the first speaker has been successfully acquired, changing the tracking flag of the first camera device from the first tracking flag to the second tracking flag while changing the tracking flag of the second camera device from the second tracking flag to the first tracking flag;

controlling, when the current speaker changes from the first speaker to the second speaker, the second camera device to shoot the video of the second speaker includes: when the current speaker changes from the first speaker to the second speaker, controlling the second camera device, which has the first tracking flag, to shoot the video of the second speaker, and after the video of the second speaker has been successfully acquired, changing the tracking flag of the second camera device from the first tracking flag to the second tracking flag while changing the tracking flag of the first camera device from the second tracking flag to the first tracking flag.

With reference to the eighth possible implementation of the first aspect, in a ninth possible implementation of the first aspect, controlling, when the speaker subsequently changes again, the first camera device and the second camera device to alternately shoot video of the current speaker includes: each time a subsequent speaker change occurs, controlling the camera device that has the first tracking flag to shoot video of the current speaker, and after the video of the current speaker has been successfully acquired, exchanging the tracking flags of the first camera device and the second camera device.
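The tracking-flag bookkeeping above can be sketched in a few lines. This is a minimal illustration, assuming that acquisition always succeeds before the next speaker change; `FIRST_FLAG`, `SECOND_FLAG`, `FlaggedPair` and its methods are hypothetical names, and the pan/zoom step is omitted.

```python
FIRST_FLAG = 1   # holder shoots the next new speaker
SECOND_FLAG = 2  # holder keeps its current shot


class FlaggedPair:
    def __init__(self) -> None:
        # Initial assignment as described: cam1 holds the first flag, cam2 the second.
        self.flags = {"cam1": FIRST_FLAG, "cam2": SECOND_FLAG}
        self.targets = {"cam1": None, "cam2": None}

    def camera_with_first_flag(self) -> str:
        return next(cam for cam, flag in self.flags.items() if flag == FIRST_FLAG)

    def swap_flags(self) -> None:
        self.flags["cam1"], self.flags["cam2"] = self.flags["cam2"], self.flags["cam1"]

    def shoot(self, speaker: str) -> str:
        cam = self.camera_with_first_flag()
        self.targets[cam] = speaker
        # ... pan/zoom until the speaker's video is acquired (omitted) ...
        self.swap_flags()  # exchanged only after successful acquisition
        return cam


pair = FlaggedPair()
for speaker in ["A", "B", "C", "D"]:
    print(speaker, "->", pair.shoot(speaker))
```

Because the controller always picks "the camera holding the first flag", it never needs to remember which physical camera shot last; the exchange after each acquisition is what produces the alternation.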

With reference to the ninth possible implementation of the first aspect, in a tenth possible implementation of the first aspect, controlling a camera device to shoot video of a speaker includes:

using sound source localization to control the camera device to shoot video of the speaker.

With reference to the tenth possible implementation of the first aspect, in an eleventh possible implementation of the first aspect, using sound source localization to control the camera device to shoot video of the speaker includes:

using sound source localization in combination with preset positions or image recognition to control the camera device to shoot video of the speaker.

With reference to the first aspect or any one of the first to eleventh possible implementations of the first aspect, in a twelfth possible implementation of the first aspect, controlling, when the current speaker changes from the first speaker to the second speaker, the second camera device to shoot the video of the second speaker includes:

determining whether the position of the second speaker is within the output picture of the first speaker;

if the position of the second speaker is not within the output picture of the first speaker, controlling the second camera device to shoot the video of the second speaker;

if the position of the second speaker is within the output picture of the first speaker, further determining whether the position of the second speaker is within a set region of the output picture of the first speaker;

if the position of the second speaker is within the set region, controlling the first camera device to shoot the video of the second speaker;

if the position of the second speaker is not within the set region, controlling the first camera device to track and shoot the second speaker so that the position of the second speaker falls within the set region.
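The three-way decision above can be sketched with simple rectangle tests. This is an illustration under assumptions: positions are expressed in the coordinate system of the first camera's output picture, the set region is taken to be a centered rectangle covering a fraction `frac` of each dimension (one possible reading of a "central region"), and `Rect`, `decide_action` and the action labels are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h

    def centered(self, frac: float) -> "Rect":
        """Centered sub-rectangle covering `frac` of each dimension (assumed set region)."""
        return Rect(self.x + self.w * (1 - frac) / 2,
                    self.y + self.h * (1 - frac) / 2,
                    self.w * frac,
                    self.h * frac)


def decide_action(frame: Rect, px: float, py: float, frac: float = 0.8) -> str:
    if not frame.contains(px, py):
        return "switch_to_second_camera"   # outside the output picture
    if frame.centered(frac).contains(px, py):
        return "keep_first_camera"         # inside the set region: no movement needed
    return "nudge_first_camera"            # in the picture but outside the set region


frame = Rect(0, 0, 100, 100)
for point in [(50, 50), (5, 50), (150, 50)]:
    print(point, decide_action(frame, *point))
```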

In a second aspect, an apparatus for controlling video capture is provided, including:

a control unit, configured to control, when a first speaker speaks, a first camera device to shoot video of the first speaker;

the control unit being further configured to control, when the current speaker changes from the first speaker to a second speaker, a second camera device to shoot video of the second speaker, where the second speaker is the next speaker and is at a different position from the first speaker;

the control unit being further configured to control, when the speaker subsequently changes again, the first camera device and the second camera device to alternately shoot video of the current speaker; and

a processing unit, connected to the control unit and configured to output the video of the current speaker after that video has been successfully acquired.

With reference to the second aspect, in a first possible implementation of the second aspect, the processing unit is specifically configured to:

set the video of the current speaker to be displayed in full screen; and

output the video of the current speaker in full screen.

With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the processing unit is specifically configured to:

before the video of the current speaker has been successfully acquired, output in full screen the video of the speaker who preceded the current speaker; and after the video of the current speaker has been successfully acquired, output in full screen the video of the current speaker.

With reference to the second aspect, in a third possible implementation of the second aspect, the processing unit is further specifically configured to:

set the video of the current speaker and the video of the speaker who preceded the current speaker to be displayed in picture-in-picture form;

where the picture-in-picture includes a first picture and a second picture that is contained in the first picture and is smaller than the first picture; the current speaker is displayed in the first picture, and the speaker who preceded the current speaker is displayed in the second picture; and

output the videos of the current speaker and of the speaker who preceded the current speaker simultaneously in picture-in-picture form.

With reference to the third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the control unit is further configured to:

The processing unit is specifically configured to:

With reference to the second aspect, in a fifth possible implementation of the second aspect, the processing unit is further specifically configured to:

set the video of the current speaker and the video of the speaker who preceded the current speaker to be displayed in dual-picture form;

where the dual picture includes two partial pictures, neither of which contains the other; one partial picture displays the current speaker, and the other partial picture displays the speaker who preceded the current speaker; and

output the videos of the current speaker and of the speaker who preceded the current speaker simultaneously in dual-picture form.

With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the control unit is further configured to:

The processing unit is specifically configured to:

With reference to the second aspect, in a seventh possible implementation of the second aspect, the control unit is further configured to:

before controlling the first camera device to shoot the video of the first speaker, in an initial state, control the first camera device and the second camera device to shoot video of the whole conference site;

and the processing unit is further configured to output the captured video.

With reference to the second aspect or any one of the first to seventh possible implementations of the second aspect, in an eighth possible implementation of the second aspect, the control unit is further configured to:

The control unit is specifically configured to: when the first speaker speaks, control the first camera device, which has the first tracking flag, to shoot the video of the first speaker, and after the video of the first speaker has been successfully acquired, change the tracking flag of the first camera device from the first tracking flag to the second tracking flag while changing the tracking flag of the second camera device from the second tracking flag to the first tracking flag;

The control unit is specifically configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera device, which has the first tracking flag, to shoot the video of the second speaker, and after the video of the second speaker has been successfully acquired, change the tracking flag of the second camera device from the first tracking flag to the second tracking flag while changing the tracking flag of the first camera device from the second tracking flag to the first tracking flag.

With reference to the eighth possible implementation of the second aspect, in a ninth possible implementation of the second aspect, the control unit is specifically configured to: each time a subsequent speaker change occurs, control the camera device that has the first tracking flag to shoot video of the current speaker, and after the video of the current speaker has been successfully acquired, exchange the tracking flags of the first camera device and the second camera device.

With reference to the ninth possible implementation of the second aspect, in a tenth possible implementation of the second aspect, the control unit is specifically configured to:

With reference to the tenth possible implementation of the second aspect, in an eleventh possible implementation of the second aspect, the control unit is specifically configured to:

With reference to the second aspect or any one of the first to eleventh possible implementations of the second aspect, in a twelfth possible implementation of the second aspect, the control unit is specifically configured to:

determine whether the position of the second speaker is within the output picture of the first speaker;

With the above technical solution, when people at the conference site speak in turn, the method and apparatus for controlling video capture provided by the present invention control the first camera device and the second camera device to alternately shoot video of the current speaker and output that video. Thus, even if many people at the site speak in rapid alternation, the two camera devices can still shoot the facial pictures of the speakers. Moreover, in the technical solution provided by the present invention, the video of the current speaker is output only after a camera device has successfully acquired it. Compared with the prior art, in which the output must first be switched to the site panorama before the camera successfully acquires the video of the next speaker, the present invention reduces the number of video switches, so that pictures join tightly and the output video is smoother.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of an embodiment of the method for controlling video capture of the present invention;

Fig. 2A is a schematic diagram of shooting the speaker after a change when the position of the speaker after the change is within the set region of the output picture of the speaker before the change;

Fig. 2B is a schematic diagram of shooting the speaker after a change when the position of the speaker after the change is within the output picture of the speaker before the change but not within the set region of that picture;

Fig. 2C is a schematic diagram of shooting the speaker after a change when the position of the speaker after the change is not within the output picture of the speaker before the change;

Fig. 3A is a flow chart of a specific embodiment of the method for controlling video capture of the present invention;

Fig. 3B is another flow chart of a specific embodiment of the method for controlling video capture of the present invention;

Fig. 4 is a schematic diagram of a specific embodiment of the method for controlling video capture of the present invention;

Fig. 5A is a schematic diagram of the effect of outputting the camera pan/zoom process in full-screen display;

Fig. 5B is a schematic diagram of the effect of not outputting the camera pan/zoom process in full-screen display;

Fig. 6 is a flow chart of another specific embodiment of the method for controlling video capture of the present invention;

Fig. 7 is a schematic diagram of another specific embodiment of the method for controlling video capture of the present invention;

Fig. 8A is a schematic diagram of the effect of outputting the camera pan/zoom process in picture-in-picture display;

Fig. 8B is a schematic diagram of the effect of not outputting the camera pan/zoom process in picture-in-picture display;

Fig. 9 is a flow chart of yet another specific embodiment of the method for controlling video capture of the present invention;

Fig. 10 is a schematic diagram of yet another specific embodiment of the method for controlling video capture of the present invention;

Fig. 11A is a schematic diagram of the effect of outputting the camera pan/zoom process in dual-picture display;

Fig. 11B is a schematic diagram of the effect of not outputting the camera pan/zoom process in dual-picture display;

Fig. 12 is a structural block diagram of an embodiment of the apparatus for controlling video capture of the present invention;

Fig. 13A is a structural schematic diagram of another embodiment of the apparatus for controlling video capture of the present invention;

Fig. 13B is a structural schematic diagram of a further embodiment of the apparatus for controlling video capture of the present invention;

Fig. 13C is a structural schematic diagram of yet another embodiment of the apparatus for controlling video capture of the present invention.

Detailed description of embodiments

The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Fig. 1 is a flow chart of an embodiment of the method for controlling video capture of the present invention. The method provided by the embodiments of the present invention can be carried out by a device that has control and processing functions, for example a video camera, a video controller, or a video terminal. As shown in Fig. 1, the method for controlling video capture provided by the embodiments of the present invention includes:

S11: when a first speaker speaks, control a first camera device to shoot video of the first speaker.

In the embodiments of the present invention, two groups of camera devices, a first camera device and a second camera device, are set up to shoot video of speakers. The first camera device may be a single camera module, and so may the second camera device. Within the scope of the present invention, the first camera device and the second camera device may also each consist of multiple camera modules; the specific use of multiple camera modules follows by analogy from the use of a single camera module. The first camera device and the second camera device may be fixed together by a connecting device, or may be separate. The camera devices referred to in the embodiments of the present invention may be video cameras or other terminal devices with a camera function.

The method for controlling video capture provided by the embodiments of the present invention can be applied in video conferencing to shoot and output video of speakers at the local site, and can also be used to send pictures of the local site to a remote site so that participants at the remote site can watch what is happening at the local site.

After the camera devices are switched on and the video conference starts, if nobody at the local site is speaking, the first camera device and the second camera device can both be controlled to shoot the panorama of the local site. If the first camera device has been predetermined to shoot the first speaker to appear at the site, the video shot by the second camera device is preferably output to the remote site first. At this point, since no speaker has appeared, participants at the remote site only need to watch the panorama of the local site. When someone at the local site starts to speak, that is, when the first speaker appears, the first camera device can be controlled immediately to shoot video of the first speaker, while the second camera device can still be controlled to shoot the panorama of the local site.

In the embodiments of the present invention, sound source localization can be used to determine the speaker's position. Sound source localization alone may fail to give the speaker's position accurately because of noise interference and similar causes. Therefore, the positions that speakers may occupy when speaking at the local site can be set in advance; when the speaker's position is obtained through sound source localization, judging it in combination with these preset possible positions (that is, preset positions) gives higher accuracy. To obtain the speaker's position more accurately still, sound source localization can be combined with image recognition.

Specifically, when controlling a camera device (either the first camera device or the second camera device) to shoot video of a speaker, a pickup microphone array can be formed from multiple pickup microphones. When the first speaker speaks, the pickup microphone array picks up the sound at the local site, and the sound is passed through audio pre-processing and sent to a sound source localization device. The sound source localization device is the module with the sound source localization function inside the device that has control and processing functions, and the pickup microphone array consists of two or more pickup microphones distributed at different positions in the local site. After receiving the sound picked up by the pickup microphone array, the sound source localization device performs localization processing on it and obtains position information for the first speaker. A controller can generate a corresponding camera device control instruction from the position information and send it to a pan-tilt unit, and the pan-tilt unit turns the first camera device to a suitable shooting angle to obtain a rough video of the first speaker; the pan-tilt unit receives and executes the camera device control instructions sent by the controller. Then, combining the position information obtained by sound source localization with preset position information or image recognition (which may specifically be face recognition, face detection, lip movement detection, and so on), more accurate position information for the first speaker is obtained, a new control instruction is generated and sent to the pan-tilt unit, and the first camera device is controlled to pan and zoom its lens to obtain a suitably sized picture of the first speaker as required, for example one in which the first speaker's face occupies 1/2, 1/3 or 1/4 of the whole picture.
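The final framing step (making the face occupy a chosen fraction of the picture) amounts to a ratio. The following is a minimal sketch, assuming the fraction is measured along the picture height and that a face detector supplies the face height in pixels; the function name `zoom_factor` and its parameters are hypothetical and not taken from the patent.

```python
def zoom_factor(face_height_px: float, frame_height_px: float, target_fraction: float) -> float:
    """Relative zoom needed so the face fills `target_fraction` of the frame height.

    A result above 1 means zoom in, below 1 means zoom out.
    """
    if face_height_px <= 0 or frame_height_px <= 0:
        raise ValueError("heights must be positive")
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    return target_fraction * frame_height_px / face_height_px


# A face detected at 90 px in a 1080 px frame, framed at 1/2, 1/3 and 1/4 of the height.
for fraction in (1 / 2, 1 / 3, 1 / 4):
    print(round(zoom_factor(90, 1080, fraction), 2))
```

If the fraction is instead meant as a share of picture area, the same idea applies with the square root of the area ratio.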

Because sound source localization has limited precision and is easily disturbed by noise, localization based on it alone can be inaccurate. The embodiments of the present invention combine sound source localization with preset positions or image recognition, so the speaker's position can be determined accurately and the camera device can then be controlled to shoot. Note that, depending on actual conditions, the present invention may use sound source localization alone, sound source localization combined with preset positions, sound source localization combined with image recognition, or sound source localization combined with both preset positions and image recognition.
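One simple way to combine a noisy sound source estimate with preset positions is to snap the estimate to the nearest preset when it is close enough. This is only a sketch of that idea under assumptions: positions are reduced to a single azimuth angle in degrees with no wrap-around, the tolerance value is arbitrary, and `refine_azimuth` is a hypothetical helper rather than anything specified in the patent.

```python
from typing import Sequence


def refine_azimuth(estimate_deg: float,
                   presets_deg: Sequence[float],
                   tolerance_deg: float = 10.0) -> float:
    """Snap a sound-source azimuth to the nearest preset position if within tolerance.

    Falls back to the raw estimate when no preset is close enough, in which case
    another cue (for example image recognition) would be needed for refinement.
    """
    if not presets_deg:
        return estimate_deg
    nearest = min(presets_deg, key=lambda preset: abs(preset - estimate_deg))
    if abs(nearest - estimate_deg) <= tolerance_deg:
        return nearest
    return estimate_deg


seats = [-30.0, 0.0, 30.0]          # assumed preset seat directions
print(refine_azimuth(33.0, seats))  # close to a seat: snapped
print(refine_azimuth(50.0, seats))  # far from every seat: left as estimated
```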

S12: when the current speaker changes from the first speaker to a second speaker, control a second camera device to shoot video of the second speaker, where the second speaker is the next speaker and is at a different position from the first speaker.

The current speaker is the person currently speaking at the local site; in steps S11 and S12, the current speaker is the first speaker and the second speaker respectively. Note that after the speaker's position has changed and before a camera device has successfully acquired video of the speaker after the change, the current speaker is already the speaker after the change, even though no camera device has yet acquired that speaker's video.

Similarly to controlling the first camera device to shoot the video of the first speaker, sound source localization can first identify that the speaker's position has changed, that is, that the speaker has changed from the first speaker to a second speaker at a different position, and the second camera device is then panned and zoomed to a suitable shooting angle and shot size. Then, as in step S11, preset positions or image recognition are used to further pan and zoom the lens of the second camera device as required and shoot a suitably sized video of the second speaker.

Note that if the speaker only moves slightly, for example by only one or two body widths, the speaker's position can be considered unchanged and the camera device does not need to be switched; and as long as the speaker remains within the set region of the shot picture, for example the central region occupying 80% of the whole picture, the camera device does not need to pan or zoom its lens to track. If the speaker walks about, then as long as the speaker remains within the set region of the shot picture, the speaker's position can likewise be considered unchanged: the camera device does not need to be switched, and it does not need to pan or zoom to track. If the speaker changes to another speaker, but the two speakers take turns speaking at the same position, or are so close together that both are within the set region of one camera device's shot picture, the speaker's position can also be considered unchanged, with no need to switch camera devices or to pan or zoom to track (see Fig. 2A, where the solid line represents the shot picture and the dashed line represents the set region). Whether it is the same speaker or a different speaker, if the speaker's position is within the output picture but not within the set region, the camera device does not need to be switched, but the lens can be panned or zoomed slightly so that the speaker after the change is in the middle of the picture (see Fig. 2B). In the discussion below, unless otherwise stated, a change of speaker or a change of speaker position means that the speaker's position has changed and the distance between the new position and the center of the shot picture has reached the degree at which the camera device needs to be switched; this degree can be set according to the actual scenario (see Fig. 2C).

S13, when the speaker subsequently changes again, control the first camera device and the second camera device in turn, so that they alternately shoot the video of the current speaker.

Specifically, when the speaker subsequently changes from the second speaker to the next speaker after the second speaker, namely the third speaker, the first camera device is controlled to shoot the video of the third speaker. If the speaker then changes again, that is, from the third speaker to the next speaker after the third speaker, namely the fourth speaker, the second camera device is controlled to shoot the video of the fourth speaker. This is repeated, ensuring that the first camera device and the second camera device alternately shoot the video of the current speaker.

For example, suppose there are four speakers A, B, C and D at the local site. A speaks first, so the first camera device is first controlled to shoot A; when the speaker changes from A to B, the second camera device is controlled to shoot B; when the speaker then changes from B to C, the first camera device is again controlled to shoot C; when the speaker changes from C to D, the second camera device is again controlled to shoot D; and so on.
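The alternation in this example can be written down as a short Python sketch. The function name `assign_cameras` and the camera labels are illustrative assumptions; the sketch only pairs successive speakers with the two camera devices in turn and does not model pan/zoom or acquisition time.

```python
from itertools import cycle


def assign_cameras(speakers, cameras=("camera 1", "camera 2")):
    """Pair each successive speaker with the camera devices in turn."""
    return list(zip(speakers, cycle(cameras)))


for speaker, camera in assign_cameras(["A", "B", "C", "D"]):
    print(f"{camera} shoots {speaker}")
```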

When several people at the conference site speak in rapid alternation, the picture shot by the prior-art camera device used for shooting the speaker's video takes in several speakers at once; if those speakers are far apart, their expressions cannot be made out in the picture, and valuable meeting information is lost. The present invention is different: both the first camera device and the second camera device can track speakers, and while one camera device tracks the current speaker, the other tracks the speaker after the change. This ensures that the two camera devices cooperate and hand over seamlessly: when the first camera device is shooting the current speaker, the second camera device is used to shoot the next speaker, and when the second camera device is shooting the current speaker, the first camera device is used to shoot the next speaker. In particular, when there are only two speakers A and B at the local site, the first camera device can keep tracking A and the second camera device can keep tracking B; if the two speakers alternate, the pan/zoom process is eliminated, because both camera devices have already adjusted their focal lengths. Thus, even when speakers at the site alternate rapidly, the two camera devices can alternately shoot the speakers' faces, more of the valuable meeting information is retained, and the efficiency of video tracking is improved.

S14, after the video of the current speaker has been successfully obtained, output the video of the current speaker.

Specifically, after the camera device shooting the current speaker has successfully obtained the video of the current speaker, that video is output. Outputting the video of the current speaker includes outputting it in different manners (full screen, picture-in-picture, dual picture, etc.) on the display of the camera device or on a display at the local site, and also includes outputting it in different manners to a remote site. It should be noted that the present invention places no limitation on how the video shot at the local site is transmitted to the remote site (for example, by encoding, decoding, etc.). In transmitting to the remote site, the video of the current speaker can, for example, be sent to a video signal preprocessor; after receiving the video, the video signal preprocessor performs processing such as compression encoding and then sends the resulting bitstream to the remote site over a network. After receiving the bitstream, the remote site performs processing such as decoding to obtain the video of the current speaker, which can then be shown in different manners on a display at the remote site. In this way, participants at the remote site can watch the picture of the local site on the display.

When the speaker changes, a camera device needs a certain amount of time to obtain the video of the speaker after the change. During this time, the prior art first switches the picture to the panorama of the site, and switches the picture to the new speaker only once the camera device has successfully obtained the new speaker's video, which makes the video less smooth. In the embodiments of the present invention, before the video of the current speaker is successfully obtained in step S14, the method for controlling video capture may further include: outputting the video of the previous speaker. That is, before the video of the current speaker has been successfully obtained, the video of the previous speaker is output; after the video of the current speaker has been successfully obtained, the video of the current speaker is output. In this way, with full-screen output, the output picture is not only continuous but also of higher quality, because the blurred or shaky pictures produced while the camera device pans/zooms to obtain the video of the current speaker are never output.
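The "hold the previous speaker until the new video is ready" behavior can be sketched as a small state holder. This is an illustrative Python sketch, not the patented implementation; the class name `OutputSelector` and its method names are hypothetical, and video sources are represented by plain strings.

```python
class OutputSelector:
    """Keeps outputting the previous source until the new one is acquired."""

    def __init__(self, initial_source):
        self.on_air = initial_source  # source currently being output
        self.pending = None           # speaker a camera is still acquiring

    def speaker_changed(self, new_speaker):
        # A camera device starts to pan/zoom toward the new speaker;
        # the output is deliberately left unchanged during this time.
        self.pending = new_speaker
        return self.on_air

    def video_acquired(self):
        # The camera device reports a stable, suitably sized video:
        # only now does the output switch to the new speaker.
        if self.pending is not None:
            self.on_air = self.pending
            self.pending = None
        return self.on_air


selector = OutputSelector("first speaker")
print(selector.speaker_changed("second speaker"))  # still the first speaker
print(selector.video_acquired())                   # now the second speaker
```

Because the output never passes through an intermediate panorama, each change of speaker costs one switch of the output rather than two.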

Of course, in the embodiments of the present invention, the picture of the local site can be output not only in full screen but also in forms such as picture-in-picture and dual picture. With picture-in-picture output, after the video of the current speaker has been successfully obtained, the current speaker can be output in the large picture (the first picture) and the previous speaker in the small picture (the second picture). With dual-picture output, after the video of the current speaker has been successfully obtained, the current speaker can be output in one of two pictures, neither of which contains the other, and the previous speaker in the other picture. The specific implementation of these output forms is described in the specific embodiments below.

Further, in the embodiments of the present invention, to make it easier to control the two camera devices to shoot the current speaker in turn and to output the video of the current speaker, a tracking flag can be set for each of the two camera devices before shooting starts. For example, the initial tracking flags of the first camera device and the second camera device can be set to a first tracking flag and a second tracking flag respectively, and the tracking flags can be represented by numbers such as 0 or 1. The camera device whose tracking flag is the first tracking flag can be dedicated to shooting the video of the current speaker, and the camera device whose tracking flag is the second tracking flag can be dedicated to shooting the video of the next speaker (or the previous speaker). Moreover, after the video of the current speaker has been successfully obtained, the tracking flags of the first camera device and the second camera device need to be exchanged.

Where tracking flags are set for the first camera device and the second camera device, step S11, controlling the first camera device to shoot the video of the first speaker while the first speaker is speaking, may include: while the first speaker is speaking, controlling the first camera device, which has the first tracking flag, to shoot the video of the first speaker; and, after the video of the first speaker has been successfully obtained, setting the tracking flag of the first camera device from the first tracking flag to the second tracking flag while setting the tracking flag of the second camera device from the second tracking flag to the first tracking flag.

Step S12, controlling the second camera device to shoot the video of the second speaker when the current speaker changes from the first speaker to the second speaker, may include: when the current speaker changes from the first speaker to the second speaker, controlling the second camera device, which now has the first tracking flag, to shoot the video of the second speaker; and, after the video of the second speaker has been successfully obtained, setting the tracking flag of the second camera device from the first tracking flag to the second tracking flag while setting the tracking flag of the first camera device from the second tracking flag to the first tracking flag.

Step S13, controlling the first camera device and the second camera device in turn to alternately shoot the video of the current speaker when the speaker subsequently changes again, may include: each time the speaker subsequently changes, controlling the camera device that has the first tracking flag to shoot the video of the current speaker; and, after the video of the current speaker has been successfully obtained, exchanging the tracking flags of the first camera device and the second camera device. This ensures that the two camera devices cooperate, hand over seamlessly, and alternately shoot the video of the current speaker.
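The flag exchange in steps S11 to S13 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class `TrackingController`, the camera labels, and the choice of 0 and 1 for the first and second tracking flags are illustrative, and recording a target stands in for the actual pan/zoom and for the successful acquisition of the video.

```python
FIRST_FLAG, SECOND_FLAG = 0, 1  # assumed numeric values for the two flags


class TrackingController:
    def __init__(self):
        # Initial flags: first camera device has the first tracking flag.
        self.flags = {"camera 1": FIRST_FLAG, "camera 2": SECOND_FLAG}
        self.targets = {"camera 1": None, "camera 2": None}

    def camera_with(self, flag):
        """Return the name of the camera device currently holding `flag`."""
        return next(name for name, f in self.flags.items() if f == flag)

    def on_speaker_change(self, speaker):
        # The camera device holding the first tracking flag goes to the speaker.
        camera = self.camera_with(FIRST_FLAG)
        self.targets[camera] = speaker  # stands in for pan/zoom + acquisition
        self._exchange_flags()          # done once the video is obtained
        return camera

    def _exchange_flags(self):
        for name, flag in self.flags.items():
            self.flags[name] = SECOND_FLAG if flag == FIRST_FLAG else FIRST_FLAG


controller = TrackingController()
for speaker in ["A", "B", "C", "D"]:
    print(speaker, "->", controller.on_speaker_change(speaker))
```

A design consequence worth noting is that the controller never needs to remember which camera device was used last; the flags alone carry that state.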

In the embodiments of the present invention, both the first camera device and the second camera device can track speakers. While the first speaker is speaking, the first camera device is controlled to shoot the first speaker, and meanwhile the second camera device is on standby, ready to track and shoot the next speaker after the first speaker. When the current speaker changes from the first speaker to the second speaker (that is, the next speaker at a position different from that of the first speaker), the second camera device is controlled to shoot the second speaker; meanwhile, the first camera device keeps shooting the first speaker and becomes ready to track and shoot the next speaker at a position different from that of the second speaker. This ensures that the first camera device and the second camera device cooperate and hand over seamlessly. When the speaker changes, a camera device needs a certain amount of time to successfully obtain the video of the speaker after the change. During this time, the prior art, which dedicates one camera device to shooting the panorama of the local site and another to tracking the speaker, must first switch the picture to the panorama of the site before the tracking camera device has obtained the video of the current speaker, and switches the picture to the new speaker only once that video has been obtained, which makes the video less smooth. In the technical solution provided by the present invention, the video of the current speaker is output only after a camera device has successfully obtained it; until then, the video of the previous speaker continues to be output. Compared with the prior art, which must first switch to the panorama of the local site before the camera device obtains the video of the next speaker, the present invention reduces the number of video switches, so that the pictures join up closely and the output video is smoother. Moreover, when several people at the local site speak in rapid alternation, the picture shot according to the prior art takes in several speakers at once, and if those speakers are far apart, their expressions cannot be made out in the picture. In the present invention, because the first camera device and the second camera device cooperate, the two camera devices can alternately shoot the speakers' faces even when speakers at the local site alternate rapidly.

For a better understanding of the present invention, it is further described below with reference to Figs. 3A to 10, taking several specific embodiments as examples. It should also be noted that the embodiments listed below are only some of the embodiments of the present invention; those skilled in the art can readily conceive of other embodiments from them, all of which fall within the scope of the present invention.

In the following specific embodiments, tracking flags can be used to mark the camera devices, and the video shot by the camera device with a specified tracking flag is output. For example, the initial tracking flag of the first camera device can be set to 0 (the first tracking flag) and the initial tracking flag of the second camera device to 1 (the second tracking flag), where the camera device whose tracking flag is 0 is used to shoot the video of the current speaker and the camera device whose tracking flag is 1 is used to shoot the video of the next speaker after the current speaker. For simplicity, this setting is used as the example below. Of course, setting the tracking flag of the first camera device to 1 and that of the second camera device to 0, or setting the tracking flags in other ways, is also possible, and the present invention places no limitation on this.

Fig. 3A is a flowchart of one specific embodiment of the method for controlling video capture of the present invention. Fig. 3B is another flowchart of the same specific embodiment.

As shown in Fig. 3A, taking the case where the camera devices are video cameras as an example, the method for controlling video capture provided by this specific embodiment of the present invention includes:

S31, when the meeting starts, control the two cameras to shoot the panorama of the local site.

After the two cameras (the first camera and the second camera) are switched on, that is, when the meeting starts, no one at the local site is speaking yet. In order to convey the layout of the local site to the remote site, the two cameras can be controlled to shoot the panorama of the local site. The shooting angle and shot size can be set by the user, preferably so that all participants and the main conference scene are included. When the picture shot by a camera is sent from the local site to the remote site, the picture from either camera can be transmitted, because both cameras are shooting the panorama of the local site at this point; preferably, the picture shot by the camera whose tracking flag is 1 (the second camera) is transmitted first.

S32, using sound source localization, control the first camera to shoot the video of the first speaker.

After the two cameras have been controlled to shoot the panorama of the site, when someone at the site starts to speak, that is, when the first speaker appears, a microphone array picks up the sound at the local site and sends it to a sound source localization device, which generates speaker position information using sound source localization. The controller then uses this position information to control the camera whose tracking flag is 0 to shoot a suitably sized video of the first speaker. After the camera whose tracking flag is 0 (the first camera) has captured a suitably sized video of the first speaker, its tracking flag is set to 1, and the tracking flag of the other camera (the second camera) is set from 1 to 0.

S33, when the current speaker changes from the first speaker to the second speaker, control the second camera to shoot the video of the second speaker, where the second speaker is the next speaker at a position different from that of the first speaker.

After the first camera has captured a suitably sized video of the first speaker, the tracking flag of the first camera becomes 1 and the tracking flag of the second camera becomes 0. If the speaker's position then changes, that is, the speaker changes from the first speaker to a second speaker at a different position, the controller can control the camera whose tracking flag is 0 (the second camera) to shoot the video of the second speaker; the shooting is controlled in the same way as in S32. After the camera whose tracking flag is 0 has captured a suitably sized video of the second speaker, its tracking flag is set to 1, and the tracking flag of the other camera is set from 1 to 0.

S34, when the speaker subsequently changes again, control the first camera and the second camera in turn to alternately shoot the video of the current speaker.

After the second camera has captured a suitably sized video of the second speaker, the tracking flag of the second camera becomes 1 and the tracking flag of the first camera becomes 0. If the speaker then changes from the second speaker to a third speaker (the next speaker after the second speaker), the camera whose tracking flag is 0 (the first camera) is controlled to shoot the third speaker; after that camera has successfully obtained the video of the third speaker, its tracking flag is set from 0 to 1 and the tracking flag of the other camera is set from 1 to 0. Similarly, when the speaker changes from the third speaker to a fourth speaker (the next speaker after the third speaker), the camera whose tracking flag is 0 (the second camera) is controlled to shoot the fourth speaker; after that camera has successfully obtained the video of the fourth speaker, its tracking flag is set from 0 to 1 and the tracking flag of the other camera is set from 1 to 0. Thus, each time the speaker changes, the camera whose tracking flag is 0 (which may be either the first camera or the second camera) is controlled to track and shoot the speaker after the change, and once that camera has successfully obtained the speaker's video, its tracking flag is set from 0 to 1 and the tracking flag of the other camera is set from 1 to 0.

S35, after the camera shooting the current speaker's video has successfully obtained the video of the current speaker, output the video of the current speaker in full screen.

After the camera whose tracking flag is 0 has successfully obtained the video of the current speaker, its tracking flag is set from 0 to 1 and the tracking flag of the other camera is set from 1 to 0. Thus, after the change, the video shot by the camera whose tracking flag is 1 is the video of the current speaker. Here, outputting the video of the current speaker in full screen means that the output video comes from a single camera. The full-screen picture may show only one speaker, or it may show several speakers, provided they are close enough together that the body language or facial information of each speaker can be observed in the captured video. With reference to step S12, if several speakers are so far apart that they cannot all be observed in the video shot by one camera, the speaker's position can be regarded as having changed, and the other camera can be used to shoot the speaker. After the video of the current speaker is sent to the remote site in full-screen form, participants at the remote site can clearly observe a close-up of the current speaker, which may contain important meeting information; in this way, important meeting information is retained as far as possible.

As shown in Fig. 4, of the three pictures from left to right, the first shows the panorama of the local site displayed in full screen when the meeting starts; the second shows the video of the first speaker displayed in full screen after the first speaker appears; and the third shows the second speaker displayed in full screen after the speaker changes from the first speaker to the second speaker.

S36, before the camera shooting the current speaker's video has successfully obtained the video of the current speaker, output the video of the previous speaker.

It should be noted that step S36 is performed before step S35.

From the moment the speaker changes until the camera has successfully obtained the video of the current speaker, the camera pans/zooms and therefore produces blurred or unstable pictures. By outputting the video of the previous speaker during this process, output of the blurred or unstable pictures is avoided.

For ease of understanding, this is illustrated below with reference to Figs. 5A and 5B. As shown in Fig. 5A, three pictures are arranged from left to right: the first picture, the second picture and the third picture. The speaker in the third picture is the next speaker after the speaker in the first picture. During the process from the change of speaker until the camera has successfully obtained a suitably sized video of the speaker in the third picture, if the picture shot by the camera while it pans/zooms were output directly, the blurred or unstable picture shown in the second picture would appear. In this specific embodiment of the present invention, by contrast, the video of the speaker in the first picture is output during that process, and the video of the speaker in the third picture is output only after a suitably sized video of that speaker has been successfully obtained; this avoids outputting the blurred or unstable picture (see Fig. 5B).

In addition, depending on the situation at the local site, the following cases may arise when this specific embodiment is implemented; they are handled as follows:

(1) No one is speaking at the local site

The output picture is not switched, and the panorama of the local site continues to be output;

(2) A single person speaks at the local site, and no one interjects

The output picture is the full-screen picture of the current speaker;

(3) A single person speaks at the local site and someone interjects, but only very briefly

The output picture is not switched, and the full-screen picture of the main speaker continues to be output;

(4) A single person speaks at the local site and moves from time to time

If the speaker's walking, or the shift of the speaker's head or body, stays within the current output picture and within the set central region of that picture, the camera is neither switched nor made to track, and the output picture is the full-screen picture with the current speaker in the central region; if the speaker's movement leaves the speaker within the current output picture but may take, or has taken, the speaker outside the set central region of that picture, the camera is not switched but may track appropriately to keep the speaker within the central region; if the speaker's movement takes the speaker outside the current output picture, the camera is switched and the speaker is tracked;

(5) The speaker at the local site changes once, to a neighboring person or to someone else

If the position of the speaker after the change is within the output picture from before the change and within the set central region of that picture, the camera is neither switched nor made to track, and the output picture is the full-screen picture with the new speaker in the central region; if the position of the speaker after the change is still within the output picture from before the change but may be, or is, outside the set central region of that picture, the camera is not switched but may track appropriately to keep the new speaker within the central region, and the output picture is the full-screen picture with the new speaker in the central region; if the position of the speaker after the change is outside the output picture from before the change, the camera is switched and the new speaker is tracked;

(6) Several people at the local site speak at the same time, that is, they compete for the floor

In this case the contention usually lasts only a very short time, and the output picture is not switched;

(7) Several people at the local site are in discussion and speak alternately, that is, the speaker position changes repeatedly

The cameras alternately track the speaker after each change of position, and the output picture is the full-screen picture of the speaker after the change.
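Cases (3) and (6) amount to ignoring speech that is too brief to justify a switch, while case (7) switches on every sustained change. A minimal Python sketch of such a filter is given below. The function name `filter_changes`, the representation of speech as `(speaker, seconds)` segments, and the 2-second threshold are all assumptions made for illustration; the description only says that the interjection or contention is "very short".

```python
def filter_changes(segments, min_duration=2.0):
    """Return the speakers the output would switch to.

    segments: ordered (speaker, seconds_spoken) pairs from localization.
    min_duration: assumed threshold below which speech by someone other
        than the current speaker counts as a short interjection.
    """
    switches = []
    current = None
    for speaker, seconds in segments:
        if speaker == current:
            continue  # same speaker keeps talking: nothing to switch
        if current is not None and seconds < min_duration:
            continue  # short interjection: keep the current picture
        current = speaker
        switches.append(speaker)
    return switches


segments = [("A", 30.0), ("B", 0.8), ("A", 20.0), ("C", 12.0)]
print(filter_changes(segments))
```

In a live system the duration of a segment is not known in advance, so the same idea would be applied as a delay before a change is confirmed; that variant is not shown here.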

In this specific embodiment, each time the speaker's position changes, the camera whose tracking flag is 0 is controlled to track and shoot the speaker at the new position, and after that camera has successfully obtained a suitable video of the speaker, its tracking flag is set from 0 to 1 and the tracking flag of the other camera is set from 1 to 0. This ensures that at any given moment one camera is shooting the current speaker while the other camera is available to shoot the next speaker after the current speaker. That is, the two cameras cooperate and hand over seamlessly. When the speaker's position changes, a camera needs a certain amount of time to successfully obtain the video of the speaker after the change. During this time, the video of the previous speaker continues to be output, and the video of the current speaker is output only after a camera has successfully obtained it. Compared with the prior art, which must first switch the picture to the panorama of the site and switches to the new speaker only once the camera has obtained the new speaker's video, the present invention reduces the number of video switches, so that the pictures join up closely and the output video is smoother. Moreover, when several people at the site speak in rapid alternation, the picture shot by the prior-art camera dedicated to shooting the speaker's video takes in several speakers at once, and if those speakers are far apart, their expressions cannot be made out in the picture. In the present invention, because the first camera and the second camera cooperate, the two cameras can alternately shoot the speakers' faces even when speakers at the site alternate rapidly. In addition, because the video of the current speaker is output in full screen, participants at the remote site can clearly observe the facial features of the current speaker, which may contain important meeting information; in this way, more of the valuable meeting information is retained.

Fig. 6 is a flowchart of another specific embodiment of the method for controlling video capture of the present invention.

As shown in Fig. 6, taking the case where the camera devices are video cameras as an example, the method for controlling video capture provided by this specific embodiment of the present invention includes:

S61, when the meeting starts, control the two cameras to shoot the panorama of the local site.

After the two cameras are switched on, that is, when the meeting starts, no one at the local site is speaking yet. In order to convey the layout of the local site to the remote site, the two cameras can be controlled to shoot the panorama of the local site. The shooting angle and shot size can be set by the user, preferably so that all participants and the main conference scene are included. When the panoramic video of the local site is output, the video shot by the camera whose tracking flag is 1 is preferably output first.

S62, using sound source localization in combination with preset positions, control the first camera to shoot the video of the first speaker.

After the two cameras have been controlled to shoot the panorama of the site, when someone at the site starts to speak, that is, when the first speaker appears, sound source localization is used to obtain the position information of the first speaker. This is then combined with the preset positions, that is, the positions set in advance at which a speaker may be located when speaking at the local site, to determine the accurate position of the first speaker. Specifically, the preset position closest to the position obtained by sound source localization can be selected from the multiple preset positions as the accurate position. The controller then uses the accurate position of the first speaker to control the camera whose tracking flag is 0 to shoot the video of the first speaker. After the camera whose tracking flag is 0 has captured a suitable video of the first speaker, its tracking flag is set to 1, and the tracking flag of the other camera is set from 1 to 0.
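Selecting the preset position closest to the localized position can be sketched in a few lines of Python. The function name `nearest_preset`, the preset names, the two-dimensional coordinates and the use of Euclidean distance are assumptions for illustration; the description does not specify the coordinate system or the distance measure.

```python
import math


def nearest_preset(estimate, presets):
    """Return the name of the preset position closest to the estimate.

    estimate: (x, y) position from sound source localization.
    presets: mapping of preset name to (x, y) position set in advance.
    """
    return min(presets, key=lambda name: math.dist(estimate, presets[name]))


# Hypothetical preset positions in meters, relative to the camera.
presets = {"seat 1": (1.0, 2.0), "seat 2": (3.0, 2.0), "podium": (2.0, 5.0)}
print(nearest_preset((2.8, 2.3), presets))
```

Snapping to a preset in this way trades a small position error for a stable, pre-framed shot, which is presumably why the embodiment combines the two techniques.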

S63, when the current speaker changes from the first speaker to the second speaker, control the second camera to shoot the video of the second speaker, where the second speaker is the next speaker at a position different from that of the first speaker.

After the first camera has successfully captured the video of the first speaker, the tracking flag of the first camera becomes 1 and the tracking flag of the second camera becomes 0. If the speaker now changes, that is, from the first speaker to a second speaker at a different position, then, as in step S62, the controller can control the camera whose tracking flag is 0 (the second camera) to shoot the video of the second speaker. After the camera whose tracking flag is 0 has successfully captured the video of the second speaker, its tracking flag is set to 1, and the tracking flag of the other camera is set from 1 to 0.

S64, when the speaker subsequently changes again, control the first camera and the second camera in turn to alternately shoot the video of the current speaker.

After the second camera has successfully captured the video of the second speaker, the tracking flag of the second camera becomes 1 and the tracking flag of the first camera becomes 0. If the speaker then changes from the second speaker to a third speaker, the camera whose tracking flag is 0 (the first camera) is controlled to shoot the third speaker; after that camera has successfully obtained a suitable video of the third speaker, its tracking flag is set from 0 to 1 and the tracking flag of the other camera (the second camera) is set from 1 to 0. Similarly, when the speaker changes from the third speaker to a fourth speaker (the next speaker after the third speaker), the camera whose tracking flag is 0 (the second camera) is controlled to shoot the fourth speaker; after that camera has successfully obtained a suitable video of the fourth speaker, its tracking flag is set from 0 to 1 and the tracking flag of the other camera (the first camera) is set from 1 to 0. When the speaker subsequently changes again, alternate shooting continues in the same way.

S65, after the camera shooting the current speaker successfully obtains the video of the current speaker, the video of the current speaker and the video of the speaker preceding the current speaker are output simultaneously in picture-in-picture form; wherein the picture-in-picture comprises a first picture and a second picture that is contained in the first picture and smaller than the first picture, the current speaker is output in the first picture, and the speaker preceding the current speaker is output in the second picture.

After the camera whose tracking flag is 0 successfully obtains the video of the current speaker, its tracking flag is changed from 0 to 1. At this point, the camera whose tracking flag is 1 is shooting the current speaker, and the camera whose tracking flag is 0 is shooting the speaker preceding the current speaker. Here, outputting the video of the current speaker and of the preceding speaker simultaneously in picture-in-picture form means outputting the current speaker in the first picture and outputting the preceding speaker in the second picture, which is contained in the first picture and smaller than it. In this way, participants at the remote site can observe not only the facial expression of the current speaker but also one party's reaction to the other party's speech. These expressions may carry important conference information, so as much of that information as possible is retained.

As shown in Fig. 7, of the three figures from left to right, the first figure shows that when the meeting starts the panorama of the local site is output in picture-in-picture form; the second figure shows that after a first speaker appears, the first speaker is output in the large picture (i.e. the first picture) and the panorama of the local site is output in the lower right corner of the screen (i.e. the second picture); the third figure shows that after the speaker changes from the first speaker to a second speaker, the second speaker is output in the large picture and the first speaker is output in the lower right corner of the screen.

S66, before the camera shooting the current speaker successfully obtains the video of the current speaker, the two speakers preceding the current speaker are output in the first picture and the second picture respectively.

It should be noted that step S66 is performed before step S65.

During the period from the speaker change until the camera successfully obtains the video of the current speaker, the camera pans and zooms its lens, which produces blurred or unstable pictures. Therefore, the two speakers preceding the current speaker can be output in the first picture and the second picture respectively, which avoids outputting the blurred or unstable pictures.

For ease of understanding, this is illustrated below with reference to Figs. 8A and 8B. As shown in Fig. 8A, the three figures are, from left to right, the first figure, the second figure and the third figure. The speaker in the lower right corner (i.e. the second picture) of the first figure is the speaker preceding the speaker in the large picture (i.e. the first picture) of the first figure, and the speaker in the large picture of the first figure is the speaker preceding the speaker in the large picture of the third figure. The speaker now changes from the speaker in the large picture of the first figure to the speaker in the large picture of the third figure. During the period from the speaker change until the camera successfully obtains the video of the speaker in the large picture of the third figure, if the pictures shot by the camera while it pans and zooms were output directly, blurred or unstable pictures would appear in the lower-right picture of the second figure. Correspondingly, as shown in Fig. 8B, during this period the specific embodiment of the invention outputs the live picture of the speaker of the first figure (the large picture of the second figure) and a freeze frame of the speaker preceding the speaker of the first figure (the lower-right picture of the second figure), which avoids outputting the blurred or unstable pictures.

Of course, depending on actual needs, the output mode shown in the second figure of Fig. 8A may also be used during the period from the speaker change until the camera successfully obtains the video of the current speaker.
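The choice of what to show in the two pictures before and after acquisition (steps S66 and S65) can be sketched as follows. This is a hedged illustration rather than the embodiment's actual logic: `compose_pip`, the `PANORAMA` placeholder and the dictionary layout are hypothetical, and the sketch follows the Fig. 8B behaviour in which the second picture shows a freeze frame during the transition.

```python
PANORAMA = "panorama"  # placeholder shown before enough speakers have appeared


def compose_pip(speakers, acquired):
    """Pick the content of the first (large) and second (inset) pictures.

    speakers: speakers in order of appearance, the last one being the current speaker.
    acquired: True once a camera has successfully obtained the current speaker.
    """
    padded = [PANORAMA] * 3 + list(speakers)
    if acquired:
        # S65: current speaker in the first picture, preceding speaker in the second.
        return {"first": padded[-1], "second": padded[-2], "second_frozen": False}
    # S66: keep the two preceding speakers on screen; the second picture is a
    # freeze frame because its camera is being re-aimed at the new speaker.
    return {"first": padded[-2], "second": padded[-3], "second_frozen": True}


print(compose_pip(["A", "B", "C"], acquired=False))
print(compose_pip(["A", "B", "C"], acquired=True))
```

With three speakers A, B and C, the first call keeps B live in the large picture with a frozen A in the inset, and the second call switches to C in the large picture with B in the inset.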

(1) No one speaks at the local site

The combination of output pictures is unchanged, and the panorama of the local site is still output;

(2) A single person speaks at the local site and no one interjects

The current speaker is output in the first picture and the speaker preceding the current speaker is output in the second picture; the picture composition is unchanged;

The speaker is output in the first picture; the second picture either is not switched or outputs the person who interjects, and preferably the second picture is not switched;

(4) A single person speaks at the local site and moves from time to time

If the speaker walks about, or the speaker's head or body shifts, without leaving the currently output first picture and while remaining within the set central region of the first picture, the camera is neither switched nor made to track; the first picture outputs the current speaker in motion, the second picture is unchanged, and the output picture composition is unchanged. If the movement leaves the speaker still within the currently output first picture but about to leave, or already outside, the set central region of the first picture, the camera is not switched but may track appropriately to keep the speaker within the set central region of the first picture; the second picture is unchanged, and the output picture composition is unchanged. If the movement takes the speaker outside the currently output first picture, the camera is switched and the speaker is tracked; after tracking succeeds the speaker is output in the first picture, and the first picture from before the camera switch is switched to the second picture for output;

If the position of the speaker after the change is within the first picture from before the change and within the set central region of the first picture, the camera is neither switched nor made to track; the first picture outputs the new speaker located in the central region, and the second picture is unchanged. If the position of the speaker after the change is still within the first picture from before the change but is about to leave, or is already outside, the set central region of the first picture, the camera is not switched but may track appropriately to keep the new speaker within the central region of the first picture; the second picture is unchanged. If the position of the speaker after the change is outside the first picture from before the change, the camera is switched and the new speaker is tracked; the first picture outputs the speaker after the change, and the second picture outputs the speaker before the change;

In this case one person usually cuts in on another only very briefly, and the combination of output pictures is unchanged;

The cameras alternately track the speaker after each position change, and the combination of output pictures changes; that is, after each change the current speaker is output in the first picture and the speaker preceding the current speaker is output in the second picture.
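The three-way rule used in the movement and position-change cases above (hold, track without switching, or switch cameras) can be sketched as follows. This is a minimal sketch under stated assumptions: the `Rect` type, the `camera_action` function, the pixel coordinates and the size of the central region are all hypothetical, and the "about to leave the central region" condition is simplified to "outside the central region".

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def camera_action(x: float, y: float, picture: Rect, central: Rect) -> str:
    """Decide what to do for a speaker located at (x, y) in picture coordinates."""
    if central.contains(x, y):
        return "hold"    # inside the set central region: no switch, no tracking
    if picture.contains(x, y):
        return "track"   # still in the picture: same camera follows the speaker
    return "switch"      # outside the picture: the other camera takes over


picture = Rect(0, 0, 1920, 1080)       # assumed size of the first picture
central = Rect(480, 270, 1440, 810)    # assumed set central region

print(camera_action(960, 540, picture, central))   # hold
print(camera_action(100, 100, picture, central))   # track
print(camera_action(2500, 540, picture, central))  # switch
```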

In this specific embodiment, each time the speaker's position changes, the camera whose tracking flag is 0 is controlled to track and shoot the speaker at the new position, and after that camera successfully obtains a suitably sized video of the speaker, its tracking flag is changed from 0 to 1 while the tracking flag of the other camera is changed from 1 to 0. This ensures that at any given moment one camera is shooting the current speaker while the other camera is idle and available to shoot the next speaker. In other words, the two cameras cooperate and hand over seamlessly. When the speaker's position changes, the camera needs a certain amount of time to successfully obtain the video of the new speaker. During this time the video of the preceding speaker continues to be output, and the video of the current speaker is output only after the camera has successfully obtained it. The prior art, by contrast, first switches the picture to the panorama of the site and switches to the new speaker only once the camera has successfully obtained that speaker's video. The present invention therefore reduces the number of video switches, so that the pictures join closely and the output video is smoother. Moreover, when several people at the site talk in rapid alternation, the prior art has the camera dedicated to shooting speakers include all of those speakers in one picture, and if they are far apart, their expressions cannot be seen in that picture. In the present invention, because the first camera and the second camera cooperate, the two cameras can alternately shoot the speakers' faces even when speakers at the site talk in rapid alternation. In addition, outputting the video of the current speaker and of the preceding speaker simultaneously in picture-in-picture form lets participants at the remote site clearly observe the facial features of the current speaker and also see speaker changes at the local site and one party's reaction to the other party's speech, so more valuable conference information is retained.

Fig. 9 is a flowchart of yet another embodiment of the method for controlling video capture according to the present invention.

As shown in Fig. 9, taking a video camera as an example of the camera device, the method for controlling video capture provided by this specific embodiment of the invention includes:

S91, when the meeting starts, the two cameras are controlled to shoot the panorama of the site.

After the two cameras are turned on, i.e. when the meeting starts, no one is speaking yet at the local site. To convey the layout of the local site to the remote site, the two cameras can be controlled to shoot the panorama of the local site. The shooting angle and size can be set by the user, preferably so that the picture contains all participants and the main conference scene. When the panorama video of the local site is output, the video shot by the camera whose tracking flag is 1 is preferably output first.

S92, using sound source localization technology and image recognition technology, the first camera is controlled to shoot the video of the first speaker.

After the two cameras have been controlled to shoot the panorama of the site, when someone at the site starts to speak, i.e. when a first speaker appears, the position of the first speaker is obtained by sound source localization and the camera whose tracking flag is 0 is turned to a suitable angle. Image recognition is then used to further determine the precise position of the first speaker. Then, according to that precise position, the controller controls the camera whose tracking flag is 0 to shoot the video of the first speaker. After that camera captures a suitable video of the first speaker, its tracking flag is set to 1 and the tracking flag of the other camera is changed from 1 to 0.

S93, when the current speaker changes from the first speaker to a second speaker, the second camera is controlled to shoot the video of the second speaker, wherein the second speaker is the next speaker, located at a position different from that of the first speaker.

After the first camera successfully captures the video of the first speaker, the tracking flag of the first camera becomes 1 and the tracking flag of the second camera becomes 0. If the speaker then changes, that is, from the first speaker to a second speaker at a different position, the controller can, as in step S92, control the camera whose tracking flag is 0 (i.e. the second camera) to shoot the video of the second speaker. After the camera whose tracking flag is 0 captures a suitable video of the second speaker, its tracking flag is set to 1 and the tracking flag of the other camera is changed from 1 to 0.

S94, when the speaker subsequently changes again, the first camera and the second camera are controlled to take turns shooting the video of the current speaker.

S95, after the camera shooting the current speaker successfully obtains the video of the current speaker, the video of the current speaker and the video of the speaker preceding the current speaker are output simultaneously in dual-picture form; wherein the dual picture comprises two partial pictures, neither of which contains the other, one partial picture outputs the current speaker, and the other partial picture outputs the speaker preceding the current speaker.

After the camera whose tracking flag is 0 successfully obtains the video of the current speaker, its tracking flag is changed from 0 to 1. At this point, the camera whose tracking flag is 1 is shooting the current speaker, and the camera whose tracking flag is 0 is shooting the speaker preceding the current speaker. Here, outputting the video of the current speaker and of the preceding speaker simultaneously in dual-picture form means outputting the current speaker in one picture and the preceding speaker in another picture, where neither picture contains the other. In this way, participants at the remote site can observe not only the facial expression of the current speaker but also one party's reaction to the other party's speech. These expressions may carry important conference information, so as much of that information as possible is retained.

As shown in Fig. 10, of the three figures from left to right, the first figure shows that when the meeting starts the panorama of the local site is output in dual-picture form; the second figure shows that after a first speaker appears, the first speaker is output in the left picture and the panorama of the local site is output in the right picture; the third figure shows that after the speaker changes from the first speaker to a second speaker, the second speaker is output in the right picture and the first speaker is output in the left picture.

S96, before the camera shooting the current speaker successfully obtains the video of the current speaker, the two speakers preceding the current speaker are output in the two parts of the dual picture respectively.

It should be noted that step S96 is performed before step S95.

During the period from the speaker change until the camera successfully obtains the video of the current speaker, the camera pans and zooms its lens, which produces blurred or unstable pictures. Therefore, outputting the two speakers preceding the current speaker in the two parts of the dual picture respectively avoids outputting the blurred or unstable pictures.

This is illustrated below with reference to Figs. 11A and 11B. As shown in Fig. 11A, the three figures are, from left to right, the first figure, the second figure and the third figure. The speaker in the right picture of the first figure is the speaker preceding the speaker in the left picture of the first figure, and the speaker in the left picture of the first figure is the speaker preceding the speaker in the right picture of the third figure. The speaker now changes from the speaker in the left picture of the first figure to the speaker in the right picture of the third figure. During the period from the speaker change until the camera successfully obtains a suitable video of the speaker in the right picture of the third figure, if the pictures shot by the camera while it pans and zooms were output directly, blurred or unstable pictures would appear in the right picture of the second figure. Correspondingly, as shown in Fig. 11B, during this period the specific embodiment of the invention outputs the live picture of the speaker of the first figure (the right picture of the second figure) and a freeze frame of the speaker preceding the speaker of the first figure (the left picture of the second figure), which avoids outputting the blurred or unstable pictures.
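The dual-picture behaviour can be sketched by binding each half of the picture to one camera, so that a half shows a freeze frame while its camera is being re-aimed and shows the new speaker once acquisition succeeds. This binding is an assumption inferred from the Fig. 10 description, not something the text states outright; `Pane`, `begin_change`, `finish_change` and `layout` are hypothetical names, and which half is left or right is incidental to the sketch.

```python
class Pane:
    """One half of the dual picture, fed by one camera (assumed binding)."""

    def __init__(self, content):
        self.content = content  # what this half currently shows
        self.frozen = False     # True while its camera is being re-aimed


def begin_change(panes, idle):
    """The speaker changed: freeze the half whose camera (index `idle`) is re-aimed."""
    panes[idle].frozen = True


def finish_change(panes, idle, speaker):
    """Acquisition succeeded: show the new speaker live in that half.

    Returns the index of the other half, whose camera is free for the next change.
    """
    panes[idle].content = speaker
    panes[idle].frozen = False
    return 1 - idle


def layout(panes):
    return [f"{p.content} (frozen)" if p.frozen else p.content for p in panes]


panes = [Pane("panorama"), Pane("panorama")]  # both halves show the panorama at the start
idle = 0
history = []
for speaker in ["speaker 1", "speaker 2", "speaker 3"]:
    begin_change(panes, idle)
    history.append(layout(panes))   # output while the camera is re-aimed
    idle = finish_change(panes, idle, speaker)
    history.append(layout(panes))   # output after successful acquisition

for row in history:
    print(row)
```

In this sketch each new speaker replaces the older of the two pictures, so the speaker who was current a moment ago stays where the remote participants were already looking.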

(1) No one speaks at the local site

(2) A single person speaks at the local site and no one interjects

The current speaker is output in one partial picture and the speaker preceding the current speaker is output in the other partial picture; the picture composition is unchanged;

The speaker is output in one partial picture; the other partial picture either is not switched or outputs the person who interjects, and preferably the other partial picture is not switched;

(4) A single person speaks at the local site and moves from time to time

If the speaker walks about, or the speaker's head or body shifts, without leaving the currently output picture and while remaining within the set central region of that picture, the camera is neither switched nor made to track, and the output picture composition is unchanged. If the movement leaves the speaker still within the currently output picture but about to leave, or already outside, the set central region of that picture, the camera is not switched but may track appropriately to keep the speaker within the central region, and the output picture composition is unchanged. If the movement takes the speaker outside the currently output picture, the camera is switched and the speaker is tracked;

If the position of the later speaker is within the output picture of the earlier speaker and within the set central region of that picture, the camera is neither switched nor made to track, and the picture output is one in which the later speaker is located in the central region. If the position of the later speaker is still within the output picture of the earlier speaker but is about to leave, or is already outside, the set central region of that picture, the camera is not switched but may track appropriately to keep the later speaker within the central region, and the picture output is one in which the later speaker is located in the central region. If the position of the later speaker is outside the output picture of the earlier speaker, the camera is switched and the later speaker is tracked;

The cameras alternately track the speaker after each position change, and the combination of output pictures changes; that is, after each change the current speaker is output in one partial picture and the speaker preceding the current speaker is output in the other partial picture.

In this specific embodiment, each time the speaker's position changes, the camera whose tracking flag is 0 is controlled to track and shoot the speaker at the new position, and after that camera successfully obtains a suitably sized video of the speaker, its tracking flag is changed from 0 to 1 while the tracking flag of the other camera is changed from 1 to 0. This ensures that at any given moment one camera is shooting the current speaker while the other camera is available to shoot the next speaker. In other words, the two cameras cooperate and hand over seamlessly. When the speaker's position changes, the camera needs a certain amount of time to successfully obtain the video of the new speaker. During this time the video of the preceding speaker continues to be output, and the video of the current speaker is output only after the camera has successfully obtained it. The prior art, by contrast, first switches the picture to the panorama of the site and switches to the new speaker only once the camera has successfully obtained that speaker's video. The present invention therefore reduces the number of video switches, so that the pictures join closely and the output video is smoother. Moreover, when several people at the site talk in rapid alternation, the prior art has the camera dedicated to shooting speakers include all of those speakers in one picture, and if they are far apart, their expressions cannot be seen in that picture. In the present invention, because the first camera and the second camera cooperate, the two cameras can alternately shoot the speakers' faces even when speakers at the site talk in rapid alternation. In addition, by outputting the video of the current speaker and of the preceding speaker in dual-picture form, participants at the remote site can not only clearly observe the facial features of the current speaker but also observe one party's reaction to the other party's speech at the local site (this suits conversations among several people, especially between two people), so more valuable conference information is retained.

Corresponding to the method for controlling video capture provided by the embodiments of the present invention, the embodiments of the present invention also provide a device for controlling video capture. The device may be implemented by any device having a control and processing function, for example a video camera, a video controller or a video terminal. As shown in Fig. 12, a device 12 for controlling video capture provided by an embodiment of the present invention includes:

A control unit 121, configured to: control a first camera device to shoot the video of a first speaker when the first speaker speaks; control a second camera device to shoot the video of a second speaker when the current speaker changes from the first speaker to the second speaker, wherein the second speaker is the next speaker, located at a position different from that of the first speaker; and, when the speaker subsequently changes again, control the first camera device and the second camera device to take turns shooting the video of the current speaker.

A processing unit 122, connected to the control unit 121 and configured to output the video of the current speaker after the video of the current speaker is successfully obtained.

Optionally, in one embodiment, the control unit 121 may be further configured to: before controlling the first camera device to shoot the video of the first speaker, in an initial state, control the first camera device and the second camera device to shoot the video of the whole site;

the processing unit 122 is further configured to output the captured video.

Optionally, in another embodiment, the control unit 121 is further configured to: set a tracking flag for each of the first camera device and the second camera device, wherein the tracking flag of the first camera device is initially a first tracking flag and the tracking flag of the second camera device is initially a second tracking flag.

The control unit 121 is specifically configured to: when the first speaker speaks, control the first camera device, which has the first tracking flag, to shoot the video of the first speaker, and after the video of the first speaker is successfully obtained, change the tracking flag of the first camera device from the first tracking flag to the second tracking flag while changing the tracking flag of the second camera device from the second tracking flag to the first tracking flag.

The control unit 121 is specifically configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera device, which has the first tracking flag, to shoot the video of the second speaker, and after the video of the second speaker is successfully obtained, change the tracking flag of the second camera device from the first tracking flag to the second tracking flag while changing the tracking flag of the first camera device from the second tracking flag to the first tracking flag.

The control unit 121 is specifically configured to: each time a subsequent speaker change occurs, control the camera device that has the first tracking flag to shoot the video of the current speaker, and after the video of the current speaker is successfully obtained, exchange the tracking flags of the first camera device and the second camera device.

Optionally, the control unit 121 is specifically configured to: determine whether the position of the second speaker is within the output picture of the first speaker; if the position of the second speaker is not within the output picture of the first speaker, control the second camera device to shoot the video of the second speaker;

if the position of the second speaker is within the output picture of the first speaker, further determine whether the position of the second speaker is within a set region of the output picture of the first speaker; if the position of the second speaker is within the set region, control the first camera device to shoot the video of the second speaker; if the position of the second speaker is not within the set region, control the first camera device to track and shoot the second speaker so that the position of the second speaker falls within the set region.

Optionally, the control unit 121 may be specifically configured to: control a camera device to shoot the video of a speaker by using sound source localization technology.

Further, the control unit 121 may be specifically configured to: control a camera device to shoot the video of a speaker by using sound source localization technology in combination with preset positions or image recognition technology.

It should be noted that the first camera device and the second camera device may be joined together by a connecting device, or may be separate.

In this embodiment, when someone starts to speak, the control unit 121 controls one of the camera devices to shoot the video of the current speaker, and the processing unit 122 outputs that video after it is successfully obtained. Meanwhile, the other camera device is on standby, ready to track and shoot the next speaker after the current speaker. When the speaker subsequently changes, the control unit 121 can immediately control the camera device on standby to shoot the video of that next speaker. Because obtaining a suitable video of the new speaker takes time from the moment the speaker's position changes, the picture output to the remote site in this embodiment is not first switched to the panorama of the site during this time; instead, the video of the speaker before the change continues to be output. This reduces the number of video switches, so that the pictures join closely and the output video is smoother. Moreover, because the control unit 121 controls the two camera devices to take turns shooting the current speaker, the two camera devices can alternately shoot the speakers' faces even when speakers at the site talk in rapid alternation, so more valuable conference information is retained.

Optionally, in another embodiment of the invention, the processing unit 122 may output the video of the current speaker in full screen. The processing unit 122 is specifically configured to: after the video of the current speaker is successfully obtained, set the video of the current speaker to full-screen display and, once the setting is complete, output the video of the current speaker in full screen; and before the video of the current speaker is successfully obtained, output the video of the speaker preceding the current speaker in full screen.

By outputting the video of the current speaker in full screen, participants at the remote site can clearly observe the facial features of the current speaker. These facial features may carry important conference information, so more valuable conference information is retained.

Optionally, in another embodiment of the present invention, the processing unit 122 may output the video of the current speaker and the video of the speaker preceding the current speaker simultaneously in picture-in-picture form.

The processing unit 122 is specifically configured to: after the video of the current speaker is successfully obtained, set the video of the current speaker and the video of the speaker preceding the current speaker to be displayed in picture-in-picture form, wherein the picture-in-picture comprises a first picture and a second picture that is contained in the first picture and smaller than the first picture, the current speaker is displayed in the first picture, and the speaker preceding the current speaker is displayed in the second picture; and once the setting is complete, output the video of the current speaker and the video of the preceding speaker simultaneously in picture-in-picture form.

The control unit 121 is further configured to: when the current speaker changes from the second speaker to a third speaker, control the first camera device to shoot the video of the third speaker, wherein the third speaker is the next speaker, located at a position different from that of the second speaker.

The processing unit 122 is specifically configured to: before the video of the third speaker is successfully obtained, either output the second speaker in the first picture and a freeze frame of the first speaker in the second picture, or output the second speaker in the first picture and, in the second picture, the third speaker whose shooting has begun but whose video has not yet been successfully obtained; and after the video of the third speaker is successfully obtained, output the third speaker in the first picture and the second speaker in the second picture.

Outputting the video of the current speaker and of the speaker preceding the current speaker simultaneously in picture-in-picture form lets participants at the remote site clearly observe the facial features of the current speaker and also see speaker changes at the local site and one party's reaction to the other party's speech, so more valuable conference information is retained.

Optionally, in yet another embodiment of the present invention, the processing unit 122 may output the video of the current speaker and the video of the speaker preceding the current speaker simultaneously in dual-picture form.

The processing unit 122 is specifically configured to: after the video of the current speaker is successfully obtained, set the video of the current speaker and the video of the speaker preceding the current speaker to be displayed in dual-picture form, wherein the dual picture comprises two partial pictures, neither of which contains the other, one partial picture displays the current speaker, and the other partial picture displays the speaker preceding the current speaker; and once the setting is complete, output the video of the current speaker and the video of the preceding speaker simultaneously in dual-picture form.

The processing unit 122 is specifically configured to: before the video of the third speaker is successfully obtained, either output a freeze frame of the first speaker in one sub-picture and the second speaker in the other sub-picture, or output in one sub-picture the third speaker whose shooting has begun but whose video has not yet been successfully obtained and the second speaker in the other sub-picture; and, after the video of the third speaker is successfully obtained, output the third speaker in one sub-picture and the second speaker in the other sub-picture.

Because the video of the current speaker and the video of the previous speaker are output in dual-picture form, participants at the remote site can not only clearly see the current speaker's facial features but also observe one party's reaction to the other party's speech in the local site (this suits conversations among several people, particularly between two people). More valuable conference information is thereby retained.

It is worth noting that the units included in the above device embodiments for controlling video capture are divided according to functional logic, but the division is not limited to the one described as long as the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from one another and are not intended to limit the protection scope of the present invention.

Other embodiments of the device for controlling video capture of the present invention are described below with reference to Figures 13A to 13C. As shown in Figure 13A, the device 13 for controlling video capture provided by an embodiment of the present invention includes:

A controller 131, configured to: when a first speaker speaks, control the first camera module 132 to shoot video of the first speaker; when the current speaker changes from the first speaker to a second speaker, control the second camera module 133 to shoot video of the second speaker, wherein the second speaker is the next speaker whose position differs from that of the first speaker; and, when the speaker subsequently changes again, control the first camera module 132 and the second camera module 133 to take turns shooting video of the current speaker.

An output processor 134, connected to the first camera module 132 and the second camera module 133 and configured to output the video of the current speaker after that video is successfully obtained.

The output processor 134 may be integrated in the first camera module 132 or the second camera module 133, or may be separate from both.

Optionally, the controller 131 may be further configured to: before controlling the first camera module 132 to shoot video of the first speaker, in an initial state, control the first camera module 132 and the second camera module 133 to shoot video of the entire conference site;

The output processor 134 is further configured to output the captured video of the entire conference site.

The first camera module 132 and the second camera module 133 may be separate from each other, or may be fixed together by a connecting device to form a dual camera module. They may be integrated in the device 13 for controlling video capture, or may be separate from the device 13.

Optionally, in one embodiment, the controller 131 may be further configured to set a tracking flag for each of the first camera module 132 and the second camera module 133, wherein the tracking flag of the first camera module 132 is initially a first tracking flag and the tracking flag of the second camera module 133 is initially a second tracking flag.

The controller 131 is specifically configured to: when the first speaker speaks, control the first camera module 132, which has the first tracking flag, to shoot video of the first speaker; and, after the video of the first speaker is successfully obtained, change the tracking flag of the first camera module 132 from the first tracking flag to the second tracking flag while changing the tracking flag of the second camera module 133 from the second tracking flag to the first tracking flag.

The controller 131 is specifically configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera module 133, which now has the first tracking flag, to shoot video of the second speaker; and, after the video of the second speaker is successfully obtained, change the tracking flag of the second camera module 133 from the first tracking flag to the second tracking flag while changing the tracking flag of the first camera module 132 from the second tracking flag to the first tracking flag.

The controller 131 is specifically configured to: each time the speaker subsequently changes, control the camera module that has the first tracking flag to shoot video of the current speaker; and, after the video of the current speaker is successfully obtained, exchange the tracking flags of the first camera module 132 and the second camera module 133.

As shown in Figure 13B, optionally, the device 13 for controlling video capture provided by an embodiment of the present invention further includes:

A pickup microphone array 135 and a sound source locator 136, configured to obtain the position of the speaker by sound source localization: the sound source locator 136 performs localization on the sound picked up by the pickup microphone array 135. The controller 131 controls a camera module to shoot video of the speaker according to the position obtained by sound source localization.

As shown in Figure 13B, the device 13 for controlling video capture provided by an embodiment of the present invention further includes an image locator 137, configured to determine the position of the speaker using image recognition techniques such as face detection, skin-color detection, or lip-movement detection. The controller 131 may control a camera module to shoot video of the speaker according to the position information obtained by image recognition.

Optionally, the controller 131 controls a camera module to shoot video of the speaker according to the position obtained by sound source localization together with preset-position information.

Optionally, the image locator 137 is specifically configured to judge whether the position of the second speaker is within the output picture of the first speaker; if the position of the second speaker is not within the output picture of the first speaker, the controller 131 controls the second camera module 133 to shoot video of the second speaker;

If the position of the second speaker is within the output picture of the first speaker, the image locator 137 further judges whether that position is within a set region of the output picture of the first speaker. If the position of the second speaker is within the set region, the controller 131 controls the first camera module 132 to shoot video of the second speaker; if it is not within the set region, the controller 131 controls the first camera module 132 to track the second speaker so that the position of the second speaker falls within the set region.
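The two-stage position check described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the rectangle representation of the output picture and of the set region, and the names `Rect`, `Action` and `choose_action`, are assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the three outcomes described in the text.
    USE_SECOND_CAMERA = "second camera module shoots the second speaker"
    KEEP_FIRST_CAMERA = "first camera module shoots the second speaker as framed"
    TRACK_WITH_FIRST_CAMERA = "first camera module tracks until the speaker is in the set region"


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in picture coordinates (an assumed representation)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def choose_action(x: float, y: float, output_picture: Rect, set_region: Rect) -> Action:
    """Decide which camera module handles the second speaker at position (x, y)."""
    # Stage 1: speaker outside the first speaker's output picture.
    if not output_picture.contains(x, y):
        return Action.USE_SECOND_CAMERA
    # Stage 2: speaker inside the picture and already inside the set region.
    if set_region.contains(x, y):
        return Action.KEEP_FIRST_CAMERA
    # Inside the picture but outside the set region: re-frame by tracking.
    return Action.TRACK_WITH_FIRST_CAMERA
```

For example, with an output picture of `Rect(0, 0, 100, 100)` and a set region of `Rect(25, 25, 75, 75)`, a speaker located at `(10, 50)` is inside the picture but outside the set region, so the first camera module would track the speaker rather than hand over to the second camera module.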

In this embodiment, when someone starts speaking, the controller 131 controls one of the camera modules, the first camera module 132, to shoot video of the current speaker, and the output processor 134 obtains and outputs that video. Meanwhile, the second camera module 133 is on standby, ready to track and shoot the next speaker. When the speaker subsequently changes, the controller 131 can immediately control the standby second camera module 133 to shoot video of the next speaker. Because it takes time from the moment the speaker position changes until suitable video of the new speaker is obtained, the picture output to the remote site during this period need not first switch to a panorama of the conference site; instead, the video of the speaker before the change continues to be output. This reduces the number of video switches, so that the pictures join closely and the output video is smoother. Moreover, because the controller 131 controls the two camera modules to shoot the current speaker in alternation, the two camera modules can capture the speakers' faces in turn even when speakers in the site alternate rapidly, which retains more valuable conference information.

Optionally, in another embodiment of the present invention, the output processor 134 may output the video of the current speaker full screen. The output processor 134 is specifically configured to: after the video of the current speaker is successfully obtained, set the video of the current speaker to be displayed full screen and, after the setting is complete, output the video of the current speaker full screen; and, before the video of the current speaker is successfully obtained, output the video of the previous speaker full screen.
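The full-screen selection rule above can be sketched in a few lines. The function name `full_screen_source` and the use of `None` to mean "not yet successfully obtained" are assumptions made for illustration; the final fallback to the panorama combines this paragraph with the initial-state behaviour described earlier and is likewise an assumption about how the two cases fit together.

```python
from typing import Optional


def full_screen_source(current_video: Optional[str],
                       previous_video: Optional[str],
                       panorama_video: str) -> str:
    """Pick the video to show full screen.

    None means the corresponding video has not been successfully obtained.
    """
    if current_video is not None:
        # The current speaker's video has been obtained: show it.
        return current_video
    if previous_video is not None:
        # Still acquiring the current speaker: keep showing the previous speaker.
        return previous_video
    # Assumed initial state: nobody has spoken yet, so show the whole site.
    return panorama_video
```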

Optionally, in another embodiment of the present invention, the output processor 134 may output the video of the current speaker and the video of the previous speaker simultaneously in picture-in-picture form.

The output processor 134 is specifically configured to: after the video of the current speaker is successfully obtained, set the video of the current speaker and the video of the previous speaker to be displayed in picture-in-picture form, wherein the picture-in-picture comprises a first picture and a second picture that is contained in the first picture and is smaller than the first picture, the current speaker being shown in the first picture and the previous speaker in the second picture; and, after the setting is complete, output the video of the current speaker and the video of the previous speaker simultaneously in picture-in-picture form.

The controller 131 is further configured to: when the current speaker changes from the second speaker to a third speaker, control the first camera module 132 to shoot video of the third speaker, wherein the third speaker is the next speaker whose position differs from that of the second speaker.

The output processor 134 is specifically configured to: before the video of the third speaker is successfully obtained, either output the second speaker in the first picture and a freeze frame of the first speaker in the second picture, or output the second speaker in the first picture and, in the second picture, the third speaker whose shooting has begun but whose video has not yet been successfully obtained; and, after the video of the third speaker is successfully obtained, output the third speaker in the first picture and the second speaker in the second picture.
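The picture-in-picture behaviour around a change to a third speaker can be summarized by the sketch below. It is a simplified illustration under stated assumptions: video sources are represented by plain strings with hypothetical `live:`, `freeze:` and `pending:` prefixes, and the choice between the two permitted pre-acquisition layouts is exposed as a `show_pending_new` parameter that does not appear in the original text.

```python
from typing import NamedTuple


class PictureInPicture(NamedTuple):
    first_picture: str   # large main picture
    second_picture: str  # smaller picture contained in the first picture


def pip_on_change(oldest: str, previous: str, new: str,
                  new_video_obtained: bool,
                  show_pending_new: bool = False) -> PictureInPicture:
    """Return what each picture shows when the speaker changes to `new`.

    oldest   -- the speaker before `previous` (the first speaker in the text)
    previous -- the speaker before the change (the second speaker)
    new      -- the speaker after the change (the third speaker)
    """
    if new_video_obtained:
        # After acquisition: new speaker in the main picture, previous in the inset.
        return PictureInPicture(f"live:{new}", f"live:{previous}")
    if show_pending_new:
        # Alternative before acquisition: inset shows the camera still acquiring.
        return PictureInPicture(f"live:{previous}", f"pending:{new}")
    # Default before acquisition: inset holds a freeze frame of the oldest speaker.
    return PictureInPicture(f"live:{previous}", f"freeze:{oldest}")
```

The same selection logic would apply to the dual-picture form, with the two sub-pictures taking the place of the first and second pictures.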

Outputting the video of the current speaker and the video of the previous speaker simultaneously in picture-in-picture form lets participants at the remote site clearly see the current speaker's facial features while also seeing the change of speaker in the local site and one party's reaction to the other party's speech. Still more valuable conference information is thereby retained.

Optionally, in yet another embodiment of the present invention, the output processor 134 may output the video of the current speaker and the video of the previous speaker simultaneously in dual-picture form.

The output processor 134 is specifically configured to: after the video of the current speaker is successfully obtained, set the video of the current speaker and the video of the previous speaker to be displayed in dual-picture form, wherein the dual picture comprises two sub-pictures, neither of which contains the other, one sub-picture showing the current speaker and the other showing the previous speaker; and, after the setting is complete, output the video of the current speaker and the video of the previous speaker simultaneously in dual-picture form.

The controller 131 is further configured to: when the current speaker changes from the second speaker to a third speaker, control the first camera module 132 to shoot video of the third speaker, wherein the third speaker is the next speaker whose position differs from that of the second speaker.

The output processor 134 is specifically configured to: before the video of the third speaker is successfully obtained, either output a freeze frame of the first speaker in one sub-picture and the second speaker in the other sub-picture, or output in one sub-picture the third speaker whose shooting has begun but whose video has not yet been successfully obtained and the second speaker in the other sub-picture; and, after the video of the third speaker is successfully obtained, output the third speaker in one sub-picture and the second speaker in the other sub-picture.

Because the video of the current speaker and the video of the previous speaker are output in dual-picture form, participants at the remote site can not only clearly see the current speaker's facial features but also observe one party's reaction to the other party's speech in the local site. Still more valuable conference information is thereby retained.

The device 13 for controlling video capture provided by an embodiment of the present invention is described below through a specific, complete embodiment with reference to the accompanying drawings. As shown in Figure 13C, the device 13 for controlling video capture provided by an embodiment of the present invention includes:

A controller 131; a first camera module 132, whose initial tracking flag is set to 0; a second camera module 133, whose initial tracking flag is set to 1; an output processor 134; a pickup microphone array 135; a sound source locator 136; an image locator 137; a main control module 138; a video module 139; a video signal processor 140; an audio module 141; an audio signal processor 142; a pickup microphone 143; a loudspeaker 144; and a display 145. These parts may be integrated into a single device or may be separate components, and they work in coordination under the control of the controller 131 and the main control module 138.

After the device 13 for controlling video capture is turned on, that is, when the conference begins and no one in the local site has yet spoken, the controller 131 may control the two camera modules to shoot a panorama of the site so that the layout of the local site can be sent to the remote site. After a camera module has captured video of the local site, preferably the video signal processor 140 in the video module 139 performs processing such as encoding and decoding on the video shot by the second camera module 133, and, under the control of the main control module 138, the video is sent to the remote site over the network.

When a person in the local site starts speaking, that is, when a first speaker appears, the pickup microphone array 135 picks up the sound of the local site and sends it to the sound source locator 136. On its way to the sound source locator 136, the sound may first be processed, for example denoised, by an internal module of the audio module 141 (for example, a module with a preprocessing function). The sound source locator 136 produces position information by sound source localization; the controller 131 obtains this position information and controls the first camera module 132 (the camera module whose tracking flag is 0) to turn to a suitable angle and obtain rough video of the first speaker. The image locator 137 then applies image recognition to the video obtained by the first camera module 132 to determine the accurate position of the first speaker (including the face position). Under the control of the controller 131, the first camera module 132 (the camera module whose tracking flag is 0) rotates and zooms its camera to shoot suitable video of the first speaker. After the first camera module 132 has successfully captured the video of the first speaker, its tracking flag is set from 0 to 1 and the tracking flag of the second camera module 133 is set from 1 to 0.

After the first camera module 132 has successfully captured the video of the first speaker, if the speaker changes, that is, changes from the first speaker to the second speaker, the controller 131 controls the camera module whose tracking flag is 0 (now the second camera module 133) to shoot video of the second speaker, using the same shooting control method as above. After the second camera module 133 has captured suitable video of the second speaker, its tracking flag is set from 0 to 1 and the tracking flag of the first camera module 132 is set from 1 to 0.

As described above, each time the speaker changes, the controller 131 controls the camera module whose tracking flag is 0 (which may be either the first camera module 132 or the second camera module 133) to track and shoot the speaker after the change; after that camera module has successfully captured suitable video of the speaker, its tracking flag is set from 0 to 1 and the tracking flag of the other camera module is set from 1 to 0.
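The tracking-flag alternation described above can be illustrated with the following sketch. It models only the flag bookkeeping: the class names, the `target` field, and the `shot_succeeded` parameter (standing in for "the video was successfully captured") are assumptions introduced for the example, and no real camera control is performed.

```python
from dataclasses import dataclass
from typing import List, Optional

NEXT_TO_TRACK = 0  # flag of the module that will shoot the next speaker
HOLDING = 1        # flag of the module that shot the most recent speaker


@dataclass
class CameraModule:
    name: str
    tracking_flag: int
    target: Optional[str] = None  # speaker this module was last pointed at


class Controller:
    def __init__(self, first: CameraModule, second: CameraModule) -> None:
        self.modules: List[CameraModule] = [first, second]

    def on_speaker_change(self, speaker: str, shot_succeeded: bool = True) -> CameraModule:
        """Send the module whose flag is 0 to the new speaker; swap flags on success."""
        tracker = next(m for m in self.modules if m.tracking_flag == NEXT_TO_TRACK)
        tracker.target = speaker
        if shot_succeeded:
            # Exchange the two flags: 0 becomes 1 and 1 becomes 0.
            for m in self.modules:
                m.tracking_flag = HOLDING if m.tracking_flag == NEXT_TO_TRACK else NEXT_TO_TRACK
        return tracker
```

Starting from the initial flags given in this embodiment (first module 0, second module 1), three successive speaker changes would be handled by modules 132, 133 and 132 in that order. Because the flags are swapped only after a successful capture, a failed attempt leaves the same module responsible for the retry; that retry behaviour is a design choice of this sketch, not something the text specifies.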

After a camera module has successfully captured video of the speaker, the output processor 134 obtains that video from the camera module. Once it has the video, the output processor 134 may set the video output mode; the obtained video of the speaker may be output full screen, in picture-in-picture form, or in dual-picture form.

After setting the video output mode, the output processor 134 sends the video of the speaker to the video signal processor 140, which performs processing such as encoding on it. Then, under the control of the main control module 138, the video of the speaker is sent from the video signal processor 140 to the remote site over the network.

Further, before a camera module successfully obtains video of the current speaker, the main control module 138 may control the output processor 134 to output the video of the previous speaker.

In addition, the audio signal processor 142 performs processing such as encoding on the speaker's sound picked up in the local site by the pickup microphone 143. It should be noted that the sound picked up by the pickup microphone 143 serves a different purpose from the sound picked up by the pickup microphone array 135: the former is sent to the remote site together with the video shot by the camera modules, while the latter is used for sound source localization. The loudspeaker 144 and the display 145 are basic components of the device 13 for controlling video capture and are used to output audio and video, respectively, in the local site.

The embodiments in this specification each have their own emphasis. For identical or similar parts, the embodiments may be referred to one another; each embodiment focuses on its differences from the others. In particular, the device embodiments are described relatively briefly because they are substantially similar to the method embodiments; for related details, see the description of the method embodiments.

It should be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. In addition, in the drawings of the device embodiments provided by the present invention, a connection between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines. A person of ordinary skill in the art can understand and implement the embodiments without creative effort.

A person of ordinary skill in the art will recognize that aspects of the present invention, or possible implementations of those aspects, may be embodied as a system, a method, or a computer program product. Accordingly, aspects of the present invention, or possible implementations of those aspects, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and so on), or an embodiment combining software and hardware, all of which are referred to herein as a "circuit", "module", or "system". Furthermore, aspects of the present invention, or possible implementations of those aspects, may take the form of a computer program product, that is, computer-readable program code stored in a computer-readable medium.

The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing, such as random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, or a portable compact disc read-only memory (CD-ROM).

A processor in a computer reads the computer-readable program code stored in the computer-readable medium, so that the processor can perform the functions and acts specified in each step, or combination of steps, of the flowcharts, and produces means for implementing the functions and acts specified in each block, or combination of blocks, of the block diagrams.

The computer-readable program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. It should also be noted that, in some alternative embodiments, the functions indicated in the steps of the flowcharts or the blocks of the block diagrams may occur out of the order shown in the figures. For example, depending on the functions involved, two steps or two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order.

The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the scope of the claims.

Claims

1. A method for controlling video capture, characterized by comprising:

when the current speaker changes from the first speaker to a second speaker, controlling a second camera device to shoot video of the second speaker, wherein the second speaker is the next speaker whose position differs from that of the first speaker;

when the speaker subsequently changes again, controlling the first camera device and the second camera device to take turns shooting video of the current speaker;

after the video of the current speaker is successfully obtained, outputting the video of the current speaker;

wherein said controlling the second camera device to shoot video of the second speaker when the current speaker changes from the first speaker to the second speaker comprises:

judging whether the position of the second speaker is within the output picture of the first speaker;

if the position of the second speaker is not within the output picture of the first speaker, controlling the second camera device to shoot video of the second speaker;

if the position of the second speaker is within the output picture of the first speaker, further judging whether the position of the second speaker is within a set region of the output picture of the first speaker;

if the position of the second speaker is within the set region, controlling the first camera device to shoot video of the second speaker;

if the position of the second speaker is not within the set region, controlling the first camera device to track the second speaker so that the position of the second speaker falls within the set region.

2. The method according to claim 1, characterized in that said outputting the video of the current speaker comprises: outputting the video of the current speaker full screen.

3. The method according to claim 2, characterized in that said outputting the video of the current speaker full screen comprises:

before the video of the current speaker is successfully obtained, outputting the video of the previous speaker full screen;

4. The method according to claim 1, characterized in that said outputting the video of the current speaker comprises: outputting the video of the current speaker and the video of the previous speaker simultaneously in picture-in-picture form;

wherein the picture-in-picture comprises a first picture and a second picture that is contained in the first picture and is smaller than the first picture, the current speaker being output in the first picture and the previous speaker being output in the second picture.

5. The method according to claim 4, characterized in that the method further comprises:

when the current speaker changes from the second speaker to a third speaker, controlling the first camera device to shoot video of the third speaker, wherein the third speaker is the next speaker whose position differs from that of the second speaker;

said outputting the video of the current speaker and the video of the previous speaker simultaneously in picture-in-picture form comprises:

before the video of the third speaker is successfully obtained: outputting the second speaker in the first picture and a freeze frame of the first speaker in the second picture; or outputting the second speaker in the first picture and, in the second picture, the third speaker whose shooting has begun but whose video has not yet been successfully obtained;

after the video of the third speaker is successfully obtained: outputting the third speaker in the first picture and the second speaker in the second picture.

6. The method according to claim 1, characterized in that said outputting the video of the current speaker comprises: outputting the video of the current speaker and the video of the previous speaker simultaneously in dual-picture form;

wherein the dual picture comprises two sub-pictures, neither of which contains the other, one sub-picture outputting the current speaker and the other sub-picture outputting the previous speaker.

7. The method according to claim 6, the method further comprising:

said outputting the video of the current speaker and the video of the previous speaker simultaneously in dual-picture form comprises:

before the video of the third speaker is successfully obtained: outputting a freeze frame of the first speaker in one sub-picture and the second speaker in the other sub-picture; or outputting in one sub-picture the third speaker whose shooting has begun but whose video has not yet been successfully obtained, and the second speaker in the other sub-picture;

after the video of the third speaker is successfully obtained: outputting the third speaker in one sub-picture and the second speaker in the other sub-picture.

8. The method according to claim 1, characterized in that, before said controlling the first camera device to shoot video of the first speaker, the method further comprises:

in an initial state, controlling the first camera device and the second camera device to shoot video of the entire conference site, and outputting the captured video.

9. The method according to any one of claims 1 to 8, characterized in that, before said controlling the first camera device to shoot video of the first speaker, the method further comprises:

setting a tracking flag for each of the first camera device and the second camera device, wherein the tracking flag of the first camera device is initially a first tracking flag and the tracking flag of the second camera device is initially a second tracking flag;

said controlling the first camera device to shoot video of the first speaker when the first speaker speaks comprises: when the first speaker speaks, controlling the first camera device, which has the first tracking flag, to shoot video of the first speaker; and, after the video of the first speaker is successfully obtained, changing the tracking flag of the first camera device from the first tracking flag to the second tracking flag while changing the tracking flag of the second camera device from the second tracking flag to the first tracking flag;

said controlling the second camera device to shoot video of the second speaker when the current speaker changes from the first speaker to the second speaker comprises: when the current speaker changes from the first speaker to the second speaker, controlling the second camera device, which has the first tracking flag, to shoot video of the second speaker; and, after the video of the second speaker is successfully obtained, changing the tracking flag of the second camera device from the first tracking flag to the second tracking flag while changing the tracking flag of the first camera device from the second tracking flag to the first tracking flag.

10. The method according to claim 9, characterized in that

said controlling the first camera device and the second camera device to take turns shooting video of the current speaker when the speaker subsequently changes again comprises: each time the speaker subsequently changes, controlling the camera device that has the first tracking flag to shoot video of the current speaker; and, after the video of the current speaker is successfully obtained, exchanging the tracking flags of the first camera device and the second camera device.

11. The method according to claim 10, characterized in that controlling a camera device to shoot video of a speaker comprises:

12. The method according to claim 11, characterized in that said controlling a camera device to shoot video of a speaker using sound source localization comprises:

using sound source localization in combination with a preset position or image recognition, controlling the camera device to shoot video of the speaker.

A kind of 13. devices of control video capture, it is characterised in that including：

Described control unit, is additionally operable to when current speakers are changed to the second talker from first talker, control the Two camera heads shoot the video of the second talker, wherein, second talker is different from first speaker location Next talker；

the control unit is further configured to, when a talker change subsequently occurs again, control the first camera head and the second camera head in turn to alternately shoot the video of the current talker;

a processing unit, connected to the control unit and configured to output the video of the current talker after the video of the current talker is successfully obtained;

wherein the control unit is specifically configured to:

judge whether the position of the second talker is within the output picture of the first talker;

14. The device according to claim 13, characterized in that the processing unit is specifically configured to:

set the video of the current talker to be displayed in full screen;

output the video of the current talker in full screen.

15. The device according to claim 14, characterized in that the processing unit is specifically configured to:

16. The device according to claim 13, characterized in that the processing unit is specifically configured to:

set the video of the current talker and the video of the talker preceding the current talker to be displayed in picture-in-picture form;

wherein the picture-in-picture comprises a first picture and a second picture that is contained within the first picture and is smaller than the first picture, the current talker being displayed in the first picture and the talker preceding the current talker being displayed in the second picture;

output the video of the current talker and the video of the talker preceding the current talker simultaneously in picture-in-picture form.
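The picture-in-picture arrangement of claim 16 can be illustrated by a small layout calculation. This is a sketch under stated assumptions, not part of the claims: the function name `pip_layout`, the `scale` and `margin` parameters, and the placement of the second picture in the bottom-right corner are all hypothetical choices. The claim itself only requires that the second picture lie within the first picture and be smaller than it.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def pip_layout(width: int, height: int,
               scale: float = 0.25, margin: int = 16) -> Tuple[Rect, Rect]:
    """Return (first picture, second picture) for a width x height output.

    The first picture shows the current talker and fills the output.
    The second picture shows the preceding talker; it is placed in the
    bottom-right corner (an assumed position) and scaled by `scale`.
    """
    if not 0 < scale < 1:
        raise ValueError("scale must be between 0 and 1 so the inset is smaller")
    first = Rect(0, 0, width, height)
    w, h = int(width * scale), int(height * scale)
    second = Rect(width - w - margin, height - h - margin, w, h)
    if second.x < 0 or second.y < 0:
        raise ValueError("inset does not fit inside the first picture")
    return first, second


first, second = pip_layout(1920, 1080)
```

For a 1920 x 1080 output with the default values, the second picture is 480 x 270 pixels and sits 16 pixels from the right and bottom edges of the first picture.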

17. The device according to claim 14, characterized in that the control unit is further configured to:

the processing unit is specifically configured to:

18. The device according to claim 13, characterized in that the processing unit is specifically configured to:

set the video of the current talker and the video of the talker preceding the current talker to be displayed in dual-picture form;

output the video of the current talker and the video of the talker preceding the current talker simultaneously in dual-picture form.

19. The device according to claim 18, characterized in that the control unit is further configured to:

the processing unit is specifically configured to:

20. The device according to claim 13, characterized in that, before the first camera head is controlled to shoot the video of the first talker, the control unit is further configured to:

in an initial state, control the first camera head and the second camera head to shoot video of the whole meeting place;

the processing unit is further configured to output the video of the whole meeting place captured under the control of the control unit.

21. The device according to any one of claims 13 to 20, characterized in that the control unit is further configured to:

the control unit is specifically configured to: when the first talker speaks, control the first camera head, which has a first tracking mark, to shoot the video of the first talker; and, after the video of the first talker is successfully obtained, set the tracking mark of the first camera head from the first tracking mark to a second tracking mark, while setting the tracking mark of the second camera head from the second tracking mark to the first tracking mark;

the control unit is specifically configured to: when the current talker changes from the first talker to the second talker, control the second camera head, which has the first tracking mark, to shoot the video of the second talker; and, after the video of the second talker is successfully obtained, set the tracking mark of the second camera head from the first tracking mark to the second tracking mark, while setting the tracking mark of the first camera head from the second tracking mark to the first tracking mark.

22. The device according to claim 21, characterized in that the control unit is specifically configured to: each time a subsequent talker change occurs, control the camera head that has the first tracking mark to shoot the video of the current talker; and, after the video of the current talker is successfully obtained, exchange the tracking marks of the first camera head and the second camera head.

23. The device according to claim 22, characterized in that the control unit is specifically configured to:

24. The device according to claim 23, characterized in that the control unit is specifically configured to: