CN103595953B - Method and apparatus for controlling video capture
- Publication number: CN103595953B (application CN201310566974.1A)
- Authority: CN (China)
- Prior art keywords: talker, video, picture, camera head, current speakers
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/142—Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Studio Devices (AREA)
Abstract
The present invention provides a method and an apparatus for controlling video capture, and relates to the field of video imaging. The invention reduces the number of video switches while retaining the facial picture of the speaker, so that pictures join tightly and the output video is smoother. The method includes: when a first speaker speaks, controlling a first camera device to shoot video of the first speaker; when the current speaker changes from the first speaker to a second speaker, controlling a second camera device to shoot video of the second speaker, where the second speaker is the next speaker, located at a different position from the first speaker; when the speaker subsequently changes again, controlling the first camera device and the second camera device in turn to alternately shoot video of the current speaker; and after the video of the current speaker is successfully acquired, outputting the video of the current speaker. The present invention is used in video conferencing.
Description
Technical field
The present invention relates to the field of video imaging, and more particularly to a method and an apparatus for controlling video capture.
Background
Generally, in a video conference, a camera shoots a panoramic picture of all participants at a fixed size and a fixed angle. When the conference site is relatively large, the camera may be far from the speaker, so the captured picture does not show who is speaking and the speaker's facial expression cannot be seen clearly. Valuable conference information is lost as a result.
To avoid losing valuable conference information by shooting only a panoramic picture, two cameras can shoot the conference site simultaneously. One camera always shoots the panorama of the site, and the other tracks and shoots the picture of the speaker.
When people at the conference site speak in turn, the camera that tracks the speaker must rotate/zoom its lens before it successfully acquires the picture of the current speaker. The video shot during this process is unstable and uncomfortable to watch, so the output must first be switched to the panorama of the site during this process. However, this switching breaks the continuity between pictures, and the video sent to the remote site is not smooth, which is very uncomfortable for viewers.
Summary of the invention
Embodiments of the present invention provide a method and an apparatus for controlling video capture that reduce the number of video switches while retaining the facial picture of the speaker, so that pictures join tightly and the output video is smoother.
According to a first aspect, a method for controlling video capture is provided, including:
when a first speaker speaks, controlling a first camera device to shoot video of the first speaker;
when the current speaker changes from the first speaker to a second speaker, controlling a second camera device to shoot video of the second speaker, where the second speaker is the next speaker, located at a different position from the first speaker;
when the speaker subsequently changes again, controlling the first camera device and the second camera device in turn to alternately shoot video of the current speaker; and
after the video of the current speaker is successfully acquired, outputting the video of the current speaker.
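The alternation between the two camera devices can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `assign_cameras`, the labels `camera_1` and `camera_2`, and the representation of speakers as strings are all hypothetical, and the sketch assumes every change of speaker is also a change of position.

```python
from typing import List, Optional, Tuple


def assign_cameras(speaker_changes: List[str]) -> List[Tuple[str, str]]:
    """Return (speaker, camera) pairs: which camera shoots each new speaker."""
    cameras = ["camera_1", "camera_2"]   # hypothetical labels for the two devices
    assignments: List[Tuple[str, str]] = []
    next_index = 0                       # the first camera shoots the first speaker
    previous: Optional[str] = None
    for speaker in speaker_changes:
        if speaker == previous:          # same person still speaking: no change
            continue
        assignments.append((speaker, cameras[next_index]))
        next_index = 1 - next_index      # the other camera takes the next change
        previous = speaker
    return assignments


print(assign_cameras(["A", "B", "C", "A"]))
```

Because the idle camera is always the one sent to the new speaker, the camera that is currently on air never has to move while its picture is being output.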
With reference to the first aspect, in a first possible implementation of the first aspect, outputting the video of the current speaker includes: outputting the video of the current speaker in full screen.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, outputting the video of the current speaker in full screen includes:
before the video of the current speaker is successfully acquired, outputting the video of the previous speaker in full screen; and
after the video of the current speaker is successfully acquired, outputting the video of the current speaker in full screen.
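The full-screen rule can be sketched as a small event timeline. This is an illustration only: the event names `"change"` and `"acquired"` and the function `full_screen_timeline` are hypothetical, and the initial `"panorama"` output is an assumption based on the initial state described elsewhere in this document.

```python
from typing import List, Tuple


def full_screen_timeline(events: List[Tuple[str, str]]) -> List[str]:
    """Return what is shown in full screen after each event.

    events holds ("change", speaker) when a new speaker starts talking and
    ("acquired", speaker) when a camera has successfully acquired that speaker.
    """
    on_screen = "panorama"        # assumed initial output before anyone speaks
    shown: List[str] = []
    for kind, speaker in events:
        if kind == "acquired":    # only a successful acquisition changes the output
            on_screen = speaker
        shown.append(on_screen)
    return shown


events = [("change", "A"), ("acquired", "A"), ("change", "B"), ("acquired", "B")]
print(full_screen_timeline(events))  # ['panorama', 'A', 'A', 'B']
```

A `"change"` event alone never alters the output, which is why the previous speaker stays on screen while the other camera is still moving.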
With reference to the first aspect, in a third possible implementation of the first aspect, outputting the video of the current speaker includes: outputting the video of the current speaker and the video of the previous speaker simultaneously in picture-in-picture form;
where the picture-in-picture includes a first picture and a second picture that is contained in the first picture and is smaller than the first picture; the current speaker is output in the first picture, and the previous speaker is output in the second picture.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the method further includes:
when the current speaker changes from the second speaker to a third speaker, controlling the first camera device to shoot video of the third speaker, where the third speaker is the next speaker, located at a different position from the second speaker.
Outputting the video of the current speaker and the video of the previous speaker simultaneously in picture-in-picture form includes:
before the video of the third speaker is successfully acquired: outputting the second speaker in the first picture and a freeze frame of the first speaker in the second picture; or outputting the second speaker in the first picture and, in the second picture, the third speaker, whose shooting has begun but whose video has not yet been successfully acquired; and
after the video of the third speaker is successfully acquired: outputting the third speaker in the first picture and the second speaker in the second picture.
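The picture-in-picture cases around the change to the third speaker can be written out as a short sketch. The function `pip_layout`, its parameters, and the text labels it returns are hypothetical names chosen for illustration; the `show_acquisition` switch stands for the choice between the two "before acquisition" options.

```python
from typing import Tuple


def pip_layout(first: str, second: str, third: str,
               third_acquired: bool,
               show_acquisition: bool = False) -> Tuple[str, str]:
    """Return (first picture, second picture) for the picture-in-picture output."""
    if third_acquired:
        # After acquisition: new speaker in the large picture, previous one inset.
        return (third, second)
    if show_acquisition:
        # Option 2 before acquisition: inset shows the camera still acquiring.
        return (second, f"{third} (acquiring)")
    # Option 1 before acquisition: inset shows a freeze frame of the first speaker.
    return (second, f"{first} (freeze frame)")


print(pip_layout("A", "B", "C", third_acquired=False))
print(pip_layout("A", "B", "C", third_acquired=True))
```

In both "before" options the large picture keeps showing the second speaker, so the main view changes only once per speaker change.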
With reference to the first aspect, in a fifth possible implementation of the first aspect, outputting the video of the current speaker includes: outputting the video of the current speaker and the video of the previous speaker simultaneously in dual-picture form;
where the output picture includes two partial pictures, neither of which contains the other; one partial picture outputs the current speaker, and the other partial picture outputs the previous speaker.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the method further includes:
when the current speaker changes from the second speaker to a third speaker, controlling the first camera device to shoot video of the third speaker, where the third speaker is the next speaker, located at a different position from the second speaker.
Outputting the video of the current speaker and the video of the previous speaker simultaneously in dual-picture form includes:
before the video of the third speaker is successfully acquired: outputting a freeze frame of the first speaker in one partial picture and the second speaker in the other partial picture; or outputting, in one partial picture, the third speaker, whose shooting has begun but whose video has not yet been successfully acquired, and the second speaker in the other partial picture; and
after the video of the third speaker is successfully acquired: outputting the third speaker in one partial picture and the second speaker in the other partial picture.
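A comparable sketch for the dual-picture form follows. The names `dual_picture_layout`, `part_1`, and `part_2` are hypothetical. The sketch reads the text as keeping the second speaker in the same partial picture throughout, while the other partial picture changes from the first speaker to the third.

```python
from typing import Dict


def dual_picture_layout(first: str, second: str, third: str,
                        third_acquired: bool,
                        show_acquisition: bool = False) -> Dict[str, str]:
    """Return the content of the two side-by-side partial pictures."""
    if third_acquired:
        changing = third                       # new speaker replaces the old pane
    elif show_acquisition:
        changing = f"{third} (acquiring)"      # camera still rotating/zooming
    else:
        changing = f"{first} (freeze frame)"   # frozen last frame of first speaker
    # The second speaker's pane stays where it is in every case.
    return {"part_1": changing, "part_2": second}


print(dual_picture_layout("A", "B", "C", third_acquired=False))
print(dual_picture_layout("A", "B", "C", third_acquired=True))
```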
With reference to the first aspect, in a seventh possible implementation of the first aspect, before controlling the first camera device to shoot video of the first speaker, the method further includes:
in an initial state, controlling the first camera device and the second camera device to shoot video of the entire conference site, and outputting the captured video.
With reference to the first aspect or any one of the first to seventh possible implementations of the first aspect, in an eighth possible implementation of the first aspect, before controlling the first camera device to shoot video of the first speaker, the method further includes:
setting a tracking flag for each of the first camera device and the second camera device, where the tracking flag of the first camera device is initially a first tracking flag and the tracking flag of the second camera device is initially a second tracking flag.
Controlling the first camera device to shoot video of the first speaker when the first speaker speaks includes: when the first speaker speaks, controlling the first camera device, which has the first tracking flag, to shoot video of the first speaker; and after the video of the first speaker is successfully acquired, changing the tracking flag of the first camera device from the first tracking flag to the second tracking flag, while changing the tracking flag of the second camera device from the second tracking flag to the first tracking flag.
Controlling the second camera device to shoot video of the second speaker when the current speaker changes from the first speaker to the second speaker includes: when the current speaker changes from the first speaker to the second speaker, controlling the second camera device, which has the first tracking flag, to shoot video of the second speaker; and after the video of the second speaker is successfully acquired, changing the tracking flag of the second camera device from the first tracking flag to the second tracking flag, while changing the tracking flag of the first camera device from the second tracking flag to the first tracking flag.
With reference to the eighth possible implementation of the first aspect, in a ninth possible implementation of the first aspect, controlling the first camera device and the second camera device in turn to alternately shoot video of the current speaker when the speaker subsequently changes again includes: each time a subsequent speaker change occurs, controlling the camera device that has the first tracking flag to shoot video of the current speaker; and after the video of the current speaker is successfully acquired, exchanging the tracking flags of the first camera device and the second camera device.
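The tracking-flag bookkeeping amounts to swapping two values after each successful acquisition. The sketch below is a hypothetical illustration: the class `TrackingFlags`, the constants `FIRST_FLAG` and `SECOND_FLAG`, and the camera labels are assumptions, not names from the patent.

```python
FIRST_FLAG, SECOND_FLAG = 1, 2   # hypothetical encodings of the two tracking flags


class TrackingFlags:
    """Tracks which of two camera devices shoots the next new speaker."""

    def __init__(self) -> None:
        # Initial assignment: first camera holds the first tracking flag.
        self.flags = {"camera_1": FIRST_FLAG, "camera_2": SECOND_FLAG}

    def tracking_camera(self) -> str:
        """Return the camera device that currently holds the first tracking flag."""
        return next(name for name, flag in self.flags.items() if flag == FIRST_FLAG)

    def on_acquired(self) -> None:
        """Exchange the two flags once the current speaker's video is acquired."""
        self.flags["camera_1"], self.flags["camera_2"] = (
            self.flags["camera_2"], self.flags["camera_1"])


flags = TrackingFlags()
print(flags.tracking_camera())   # camera_1 shoots the first speaker
flags.on_acquired()
print(flags.tracking_camera())   # camera_2 shoots the next speaker
```

With the flags in place, the controller never needs to remember which camera was used last; it only asks which camera holds the first tracking flag.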
With reference to the ninth possible implementation of the first aspect, in a tenth possible implementation of the first aspect, controlling a camera device to shoot video of a speaker includes:
controlling the camera device to shoot video of the speaker by using sound source localization.
With reference to the tenth possible implementation of the first aspect, in an eleventh possible implementation of the first aspect, controlling the camera device to shoot video of the speaker by using sound source localization includes:
controlling the camera device to shoot video of the speaker by using sound source localization in combination with preset positions or image recognition.
With reference to the first aspect or any one of the first to eleventh possible implementations of the first aspect, in a twelfth possible implementation of the first aspect, controlling the second camera device to shoot video of the second speaker when the current speaker changes from the first speaker to the second speaker includes:
determining whether the position of the second speaker is within the output picture of the first speaker;
if the position of the second speaker is not within the output picture of the first speaker, controlling the second camera device to shoot video of the second speaker;
if the position of the second speaker is within the output picture of the first speaker, further determining whether the position of the second speaker is within a set region of the output picture of the first speaker;
if the position of the second speaker is within the set region, controlling the first camera device to shoot video of the second speaker; and
if the position of the second speaker is not within the set region, controlling the first camera device to track and shoot the second speaker so that the position of the second speaker falls within the set region.
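The three-way decision in the twelfth implementation can be sketched as follows. The patent does not say how the output picture or the set region is represented; modelling both as axis-aligned rectangles in picture coordinates is an assumption, and `Rect`, `choose_action`, and the returned action strings are hypothetical.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in picture coordinates (an assumed representation)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, point: Tuple[float, float]) -> bool:
        x, y = point
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def choose_action(position: Tuple[float, float],
                  output_picture: Rect, set_region: Rect) -> str:
    """Decide which camera handles the second speaker, following the steps above."""
    if not output_picture.contains(position):
        return "second_camera_shoots"   # outside the current picture entirely
    if set_region.contains(position):
        return "first_camera_shoots"    # already well framed: no camera switch
    return "first_camera_tracks"        # in picture but off-region: adjust framing


picture = Rect(0, 0, 100, 100)
region = Rect(25, 25, 75, 75)
print(choose_action((150, 50), picture, region))
print(choose_action((50, 50), picture, region))
print(choose_action((10, 50), picture, region))
```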
According to a second aspect, an apparatus for controlling video capture is provided, including:
a control unit, configured to control a first camera device to shoot video of a first speaker when the first speaker speaks;
the control unit being further configured to control a second camera device to shoot video of a second speaker when the current speaker changes from the first speaker to the second speaker, where the second speaker is the next speaker, located at a different position from the first speaker;
the control unit being further configured to control the first camera device and the second camera device in turn to alternately shoot video of the current speaker when the speaker subsequently changes again; and
a processing unit, connected to the control unit and configured to output the video of the current speaker after the video of the current speaker is successfully acquired.
With reference to the second aspect, in a first possible implementation of the second aspect, the processing unit is specifically configured to:
set the video of the current speaker to be displayed in full screen; and
output the video of the current speaker in full screen.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the processing unit is specifically configured to:
before the video of the current speaker is successfully acquired, output the video of the previous speaker in full screen; and after the video of the current speaker is successfully acquired, output the video of the current speaker in full screen.
With reference to the second aspect, in a third possible implementation of the second aspect, the processing unit is further specifically configured to:
set the video of the current speaker and the video of the previous speaker to be displayed in picture-in-picture form,
where the picture-in-picture includes a first picture and a second picture that is contained in the first picture and is smaller than the first picture, the current speaker is displayed in the first picture, and the previous speaker is displayed in the second picture; and
output the video of the current speaker and the video of the previous speaker simultaneously in picture-in-picture form.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the control unit is further configured to:
when the current speaker changes from the second speaker to a third speaker, control the first camera device to shoot video of the third speaker, where the third speaker is the next speaker, located at a different position from the second speaker.
The processing unit is specifically configured to:
before the video of the third speaker is successfully acquired: output the second speaker in the first picture and a freeze frame of the first speaker in the second picture; or output the second speaker in the first picture and, in the second picture, the third speaker, whose shooting has begun but whose video has not yet been successfully acquired; and
after the video of the third speaker is successfully acquired: output the third speaker in the first picture and the second speaker in the second picture.
With reference to the second aspect, in a fifth possible implementation of the second aspect, the processing unit is further specifically configured to:
set the video of the current speaker and the video of the previous speaker to be displayed in dual-picture form,
where the dual picture includes two partial pictures, neither of which contains the other, one partial picture displays the current speaker, and the other partial picture displays the previous speaker; and
output the video of the current speaker and the video of the previous speaker simultaneously in dual-picture form.
With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the control unit is further configured to:
when the current speaker changes from the second speaker to a third speaker, control the first camera device to shoot video of the third speaker, where the third speaker is the next speaker, located at a different position from the second speaker.
The processing unit is specifically configured to:
before the video of the third speaker is successfully acquired: output a freeze frame of the first speaker in one partial picture and the second speaker in the other partial picture; or output, in one partial picture, the third speaker, whose shooting has begun but whose video has not yet been successfully acquired, and the second speaker in the other partial picture; and
after the video of the third speaker is successfully acquired: output the third speaker in one partial picture and the second speaker in the other partial picture.
With reference to the second aspect, in a seventh possible implementation of the second aspect, the control unit is further configured to:
before controlling the first camera device to shoot video of the first speaker, in an initial state, control the first camera device and the second camera device to shoot video of the entire conference site.
The processing unit is further configured to output the captured video.
With reference to the second aspect or any one of the first to seventh possible implementations of the second aspect, in an eighth possible implementation of the second aspect, the control unit is further configured to:
set a tracking flag for each of the first camera device and the second camera device, where the tracking flag of the first camera device is initially a first tracking flag and the tracking flag of the second camera device is initially a second tracking flag.
The control unit is specifically configured to: when the first speaker speaks, control the first camera device, which has the first tracking flag, to shoot video of the first speaker; and after the video of the first speaker is successfully acquired, change the tracking flag of the first camera device from the first tracking flag to the second tracking flag, while changing the tracking flag of the second camera device from the second tracking flag to the first tracking flag.
The control unit is specifically configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera device, which has the first tracking flag, to shoot video of the second speaker; and after the video of the second speaker is successfully acquired, change the tracking flag of the second camera device from the first tracking flag to the second tracking flag, while changing the tracking flag of the first camera device from the second tracking flag to the first tracking flag.
With reference to the eighth possible implementation of the second aspect, in a ninth possible implementation of the second aspect, the control unit is specifically configured to: each time a subsequent speaker change occurs, control the camera device that has the first tracking flag to shoot video of the current speaker; and after the video of the current speaker is successfully acquired, exchange the tracking flags of the first camera device and the second camera device.
With reference to the ninth possible implementation of the second aspect, in a tenth possible implementation of the second aspect, the control unit is specifically configured to:
control a camera device to shoot video of a speaker by using sound source localization.
With reference to the tenth possible implementation of the second aspect, in an eleventh possible implementation of the second aspect, the control unit is specifically configured to:
control the camera device to shoot video of the speaker by using sound source localization in combination with preset positions or image recognition.
With reference to the second aspect or any one of the first to eleventh possible implementations of the second aspect, in a twelfth possible implementation of the second aspect, the control unit is specifically configured to:
determine whether the position of the second speaker is within the output picture of the first speaker;
if the position of the second speaker is not within the output picture of the first speaker, control the second camera device to shoot video of the second speaker;
if the position of the second speaker is within the output picture of the first speaker, further determine whether the position of the second speaker is within a set region of the output picture of the first speaker;
if the position of the second speaker is within the set region, control the first camera device to shoot video of the second speaker; and
if the position of the second speaker is not within the set region, control the first camera device to track and shoot the second speaker so that the position of the second speaker falls within the set region.
With the above technical solution, when people at the conference site speak in turn, the method and apparatus for controlling video capture provided by the present invention control the first camera device and the second camera device in turn to alternately shoot video of the current speaker, and output that video. In this way, even if several people at the site speak in rapid alternation, the two camera devices can still capture the facial pictures of the speakers. Moreover, in the technical solution provided by the present invention, the video of the current speaker is output only after a camera device has successfully acquired it. Compared with the prior art, in which the output must first switch to the panorama of the site before the camera device successfully acquires the video of the next speaker, the present invention reduces the number of video switches, so that pictures join tightly and the output video is smoother.
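The reduction in switches can be made concrete with a simplified count. The model below is an assumption for illustration, not a measurement from the patent: it treats the output as a sequence of sources, starts both schemes from the panorama, assumes consecutive speakers differ, and uses the hypothetical names `output_sequence` and `count_switches`.

```python
from typing import List


def output_sequence(speakers: List[str], via_panorama: bool) -> List[str]:
    """Build the sequence of output sources for a run of speaker changes."""
    sequence = ["panorama"]                  # assumed initial output
    for speaker in speakers:
        # Prior-art scheme: fall back to the panorama while the camera moves.
        if via_panorama and sequence[-1] != "panorama":
            sequence.append("panorama")
        sequence.append(speaker)
    return sequence


def count_switches(sequence: List[str]) -> int:
    """Count how many times the output source changes."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


speakers = ["A", "B", "C"]
print(count_switches(output_sequence(speakers, via_panorama=True)))   # 5
print(count_switches(output_sequence(speakers, via_panorama=False)))  # 3
```

Under this model the panorama fallback adds one extra switch for every speaker change after the first.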
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the method for controlling video capture according to the present invention;
Fig. 2A is a schematic diagram of shooting the speaker after a change when the position of the speaker after the change is within the set region of the output picture of the speaker before the change;
Fig. 2B is a schematic diagram of shooting the speaker after a change when the position of the speaker after the change is within the output picture of the speaker before the change but not within the set region of that picture;
Fig. 2C is a schematic diagram of shooting the speaker after a change when the position of the speaker after the change is not within the output picture of the speaker before the change;
Fig. 3A is a flowchart of a specific embodiment of the method for controlling video capture according to the present invention;
Fig. 3B is another flowchart of the specific embodiment of the method for controlling video capture according to the present invention;
Fig. 4 is a schematic diagram of the specific embodiment of the method for controlling video capture according to the present invention;
Fig. 5A is a diagram of the effect of outputting the camera rotation/zoom process in full-screen display;
Fig. 5B is a diagram of the effect of not outputting the camera rotation/zoom process in full-screen display;
Fig. 6 is a flowchart of another specific embodiment of the method for controlling video capture according to the present invention;
Fig. 7 is a schematic diagram of that other specific embodiment of the method for controlling video capture according to the present invention;
Fig. 8A is a diagram of the effect of outputting the camera rotation/zoom process in picture-in-picture display;
Fig. 8B is a diagram of the effect of not outputting the camera rotation/zoom process in picture-in-picture display;
Fig. 9 is a flowchart of yet another specific embodiment of the method for controlling video capture according to the present invention;
Fig. 10 is a schematic diagram of that further specific embodiment of the method for controlling video capture according to the present invention;
Fig. 11A is a diagram of the effect of outputting the camera rotation/zoom process in dual-picture display;
Fig. 11B is a diagram of the effect of not outputting the camera rotation/zoom process in dual-picture display;
Fig. 12 is a structural block diagram of an embodiment of the apparatus for controlling video capture according to the present invention;
Fig. 13A is a schematic structural diagram of another embodiment of the apparatus for controlling video capture according to the present invention;
Fig. 13B is a schematic structural diagram of yet another embodiment of the apparatus for controlling video capture according to the present invention;
Fig. 13C is a schematic structural diagram of still another embodiment of the apparatus for controlling video capture according to the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an embodiment of the method for controlling video capture according to the present invention. The method provided in this embodiment can be implemented by a type of apparatus that has control and processing functions, for example a camera, a video controller, or a video terminal. As shown in Fig. 1, the method for controlling video capture provided in this embodiment includes:
S11: When a first speaker speaks, control a first camera device to shoot video of the first speaker.
In this embodiment, two groups of camera devices, a first camera device and a second camera device, are set up to shoot video of speakers. The first camera device may be a single camera module, and so may the second camera device. Within the scope of the present invention, the first camera device and the second camera device may each also consist of multiple camera modules; how multiple camera modules are used follows by analogy from the use of a single camera module. The first camera device and the second camera device may be joined and fixed together by a connecting device, or they may be separate.
The camera device referred to in the embodiments of the present invention may be a video camera or another terminal device that has a camera function.
The method for controlling video capture provided in this embodiment can be applied to video conferencing to shoot and output video of speakers at the local site. It can also be used to send the picture of the local site to a remote site so that participants at the remote site can watch what is happening at the local site.
After the camera devices are turned on and the video conference starts, if nobody is speaking at the local site, the first camera device and the second camera device can both be controlled to shoot the panorama of the local site. If the first camera device is predetermined to shoot the first speaker who appears at the site, the video shot by the second camera device is preferably output to the remote site first. At this point, since no speaker has appeared, participants at the remote site only need to watch the panorama of the local site. When someone at the local site starts to speak, that is, when the first speaker appears, the first camera device can immediately be controlled to shoot video of the first speaker, while the second camera device can still be controlled to shoot the panorama of the local site.
In embodiments of the present invention, it is possible to use auditory localization technology determines the position of talker.Only utilize auditory localization
Technology cannot may accurately obtain the position of talker due to reasons such as noise jammings, therefore, it is further possible in advance
Setting talker possible position residing when being talked in local meeting-place, in the position by auditory localization technical limit spacing talker
When, with reference to possible positions set in advance (i.e., preset positions), the accuracy of the determination is higher. To obtain the position of the talker more accurately, sound source localization technology and image recognition technology can be combined. Specifically, when a camera device (including the first camera device and the second camera device) is controlled to shoot the video of a talker, a pickup microphone array may be formed from multiple pickup microphones. When the first talker speaks, the pickup microphone array picks up the sound in the local conference site, and after audio pre-processing the sound is sent to a sound source localization device. The sound source localization device is a module with a sound source localization function located in a type of device that has control and processing functions, and the pickup microphone array consists of two or more pickup microphones distributed at different positions in the local conference site. After receiving the sound picked up by the pickup microphone array, the sound source localization device performs localization processing on it and obtains the position information of the first talker. A controller can generate a corresponding camera device control instruction according to the position information and send it to a pan-tilt head, and the pan-tilt head turns the first camera device to a suitable shooting angle so as to roughly capture the video of the first talker; the pan-tilt head is used to receive and execute the camera device control instructions sent by the controller. Then, by combining the position information obtained by sound source localization with preset position information or image recognition technology (which may specifically be face recognition, face detection, lip movement detection, etc.), more accurate position information of the first talker is obtained, a new control instruction is generated and sent to the pan-tilt head, and the first camera device is controlled to rotate or zoom its lens so as to obtain a suitably sized picture of the first talker as required; for example, the face of the first talker may occupy 1/2, 1/3, or 1/4 of the whole picture.
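The framing target mentioned above (a face occupying 1/2, 1/3, or 1/4 of the picture) can be turned into a zoom adjustment with simple arithmetic. The following is only a minimal sketch: the function name `zoom_factor` is hypothetical, the fraction is assumed to be measured along the picture height, and the apparent face size is assumed to scale linearly with zoom. None of these details is specified in the text.

```python
def zoom_factor(face_height_px: float, frame_height_px: float,
                target_fraction: float) -> float:
    """Return the relative zoom change that makes the face span
    target_fraction of the frame height (assumed linear measure)."""
    if face_height_px <= 0 or frame_height_px <= 0:
        raise ValueError("heights must be positive")
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    current_fraction = face_height_px / frame_height_px
    # A result above 1 means zoom in; below 1 means zoom out.
    return target_fraction / current_fraction


# A face 90 px tall in a 1080 px frame currently spans 1/12 of the height.
for target in (1 / 2, 1 / 3, 1 / 4):
    print(f"target {target:.2f}: zoom x{zoom_factor(90, 1080, target):.1f}")
```

In practice the detected face height would come from the image recognition step, and the resulting factor would be translated into whatever zoom command the pan-tilt head accepts.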
Because sound source localization has limited precision or is easily disturbed by noise, positioning based on it alone may be inaccurate. The embodiment of the present invention therefore combines sound source localization with preset positions or image recognition technology, which makes it possible to determine the position of the talker accurately and then control the camera device to shoot. It should be noted that, depending on actual conditions, the present invention may use sound source localization alone, sound source localization combined with preset positions, sound source localization combined with image recognition technology, or sound source localization combined with both preset positions and image recognition technology.
S12, when the current talker changes from the first talker to a second talker, control a second camera device to shoot the video of the second talker, wherein the second talker is the next talker whose position differs from that of the first talker.
The current talker is the person currently speaking in the local conference site; in steps S11 and S12, the current talker is the first talker and the second talker, respectively. It should be noted that, in the interval after the talker position changes and before the camera device successfully obtains the video of the talker after the change, the current talker is already the talker after the change, even though the camera device has not yet successfully obtained that talker's video.
Similarly to controlling the first camera device to shoot the video of the first talker, sound source localization can first be used to recognize that the talker position has changed, i.e., that the talker has changed from the first talker to a second talker at a different position, and the second camera device is then controlled to rotate or zoom to a suitable shooting angle and shooting size. Then, as in step S11, preset positions or image recognition technology are used to further control the second camera device to rotate or zoom its lens as required and shoot a suitably sized video of the second talker.
It should be noted that if the talker only moves slightly, for example by only one or two body widths, the position of the talker can be regarded as unchanged and the camera device need not be switched; moreover, as long as the talker remains within a set region of the shooting picture, for example the central region occupying 80% of the whole picture, the camera device need not rotate or zoom its lens to track. If the talker walks around, then as long as the talker remains within the set region of the shooting picture, the position of the talker can likewise be regarded as unchanged: the camera device need not be switched and need not rotate or zoom its lens to track. If the talker changes to another talker, but the two talkers speak in turn from the same position, or the two talkers are close together and both lie within the set region of the picture shot by one camera device, the position of the talker can also be regarded as unchanged: the camera device need not be switched and need not rotate or zoom its lens to track (see Fig. 2A, where the solid line represents the shooting picture and the dashed line represents the set region). Whether the talker is the same person or a different one, if the talker position does not require switching camera devices, but the talker is inside the output picture and outside the set region, the lens can be rotated or zoomed slightly so that the talker after the change is in the middle of the picture (see Fig. 2B). In the discussion below, unless otherwise stated, a change of talker or a change of talker position means that the position of the talker has changed and that the distance between the new position and the center of the shooting picture has reached the degree at which the camera device needs to be switched; this degree can be set according to the actual scene (see Fig. 2C).
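The three outcomes described above (leave the camera alone, adjust it slightly, or switch to the other camera) can be sketched as a small decision function. This is an illustrative sketch only: the names `Frame`, `Action`, and `decide` are hypothetical, the talker is reduced to a single point in picture coordinates, and the 80% central region is assumed to apply to each dimension, which the text does not specify.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP = "keep"      # no switching, no rotate/zoom
    ADJUST = "adjust"  # same camera, slight rotate/zoom to re-center
    SWITCH = "switch"  # hand over to the other camera


@dataclass
class Frame:
    width: float
    height: float
    central_fraction: float = 0.8  # assumed share of each dimension


def decide(frame: Frame, x: float, y: float) -> Action:
    """Classify a talker position (origin at the top-left of the picture)."""
    in_picture = 0 <= x <= frame.width and 0 <= y <= frame.height
    if not in_picture:
        return Action.SWITCH
    margin_x = frame.width * (1 - frame.central_fraction) / 2
    margin_y = frame.height * (1 - frame.central_fraction) / 2
    in_region = (margin_x <= x <= frame.width - margin_x
                 and margin_y <= y <= frame.height - margin_y)
    return Action.KEEP if in_region else Action.ADJUST


frame = Frame(1920, 1080)
print(decide(frame, 960, 540))   # centered talker
print(decide(frame, 100, 540))   # near the left edge
print(decide(frame, 2500, 540))  # outside the picture
```

The switching threshold here is simply the picture boundary; as the text notes, the actual degree can be set according to the scene.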
S13, when the talker subsequently changes again, control the first camera device and the second camera device in turn so that they alternately shoot the video of the current talker.
Specifically, when the talker subsequently changes from the second talker to the next talker after the second talker, i.e., a third talker, the first camera device is controlled to shoot the video of the third talker. If the talker changes again afterwards, i.e., from the third talker to the next talker after the third talker, a fourth talker, the second camera device is controlled to shoot the video of the fourth talker. This is repeated, ensuring that the first camera device and the second camera device alternately shoot the video of the current talker.
For example, suppose there are four talkers A, B, C, and D in the local conference site. When A starts speaking first, the first camera device is controlled to shoot A. When the talker changes from A to B, the second camera device is controlled to shoot B. When the talker then changes from B to C, the first camera device is again controlled to shoot C, and when the talker changes from C to D, the second camera device is again controlled to shoot D, and so on.
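The alternation in this example can be traced with a few lines of code. The sketch below is illustrative only; the function name `assign_cameras` is hypothetical, the four talkers are labelled A to D, and every change of talker is assumed to be a position change large enough to require switching.

```python
def assign_cameras(talkers):
    """Pair each talker change with camera 1 or 2 in strict alternation."""
    schedule = []
    camera = 1
    previous = None
    for talker in talkers:
        if talker == previous:
            continue  # same talker: no change event, camera stays put
        if previous is not None:
            camera = 2 if camera == 1 else 1  # hand over to the other camera
        schedule.append((talker, camera))
        previous = talker
    return schedule


print(assign_cameras(["A", "B", "C", "D"]))
print(assign_cameras(["A", "B", "A", "B"]))
```

With only two talkers who alternate, each talker always lands on the same camera, so neither camera has to be re-aimed between turns.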
When several people speak in rapid alternation in the conference site, the prior-art camera device used to shoot the talker's video frames all of those talkers in one picture; if they are far apart, their expressions cannot be observed in the captured picture, and valuable information from the meeting is lost. The present invention is different: both the first camera device and the second camera device can track talkers, and while one camera device tracks the current talker, the other tracks the talker after the change. This ensures that the first camera device and the second camera device cooperate and hand over seamlessly: when the first camera device shoots the current talker, the second camera device is used to shoot the next talker after the current talker, and when the second camera device shoots the current talker, the first camera device is used to shoot the next talker. In particular, when there are only two talkers, A and B, in the local conference site, the first camera device can keep tracking A and the second camera device can keep tracking B; if the talkers speak alternately, the process of rotating or zooming the lens is eliminated, because both camera devices have already adjusted their focal lengths. Thus, even if talkers speak in rapid alternation in the conference site, the two camera devices can still alternately shoot the talkers' facial pictures, so more of the valuable information of the meeting is retained and the efficiency of video tracking is also improved.
S14, after the video of the current talker is successfully obtained, output the video of the current talker.
Specifically, after the camera device shooting the current talker has successfully obtained the video of the current talker, that video is output. Outputting the video of the current talker includes displaying it in different modes (full screen, picture-in-picture, dual picture, etc.) on the display screen of the camera device or the display screen of the local conference site, and also includes outputting it in different modes to a remote site. It should be noted that the present invention does not limit the manner (for example, encoding and decoding) in which video shot in the local conference site is sent to the remote site. For example, the video of the current talker may be sent to a video processor, which, after receiving it, performs processing such as compression encoding and then transmits the resulting bitstream to the remote site over a network. After receiving the bitstream, the remote site performs processing such as decoding to obtain the video of the current talker, which can then be shown in different modes on the display screen of the remote site. In this way, participants at the remote site can watch the picture of the local conference site on their display screen.
When the talker changes, the camera device needs a certain amount of time to obtain the video of the talker after the change. During this time, the prior art first switches the picture to the panorama of the conference site and only switches the picture to the talker after the change once the camera device has successfully obtained that talker's video, which makes the video less smooth. In the embodiment of the present invention, before the video of the current talker is successfully obtained in step S14, the method for controlling video capture provided by the embodiment may further include: outputting the video of the talker preceding the current talker. That is, before the video of the current talker is successfully obtained, the video of the preceding talker is output; after the video of the current talker is successfully obtained, the video of the current talker is output. In this way, in full-screen output, not only is the output picture continuous, but its quality is also higher, because the blurring and shaking caused by the camera device rotating or zooming its lens while it acquires the video of the current talker are kept out of the output.
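The hold-then-switch behaviour described above can be sketched as a tiny state holder: the output stays on the preceding talker's camera until the other camera reports that it has acquired the new talker. This is a minimal sketch; the class `OutputSelector` and its method names are hypothetical and not taken from the text.

```python
class OutputSelector:
    """Keeps the previous talker on air until the new video is acquired."""

    def __init__(self, initial_source: str):
        self.on_air = initial_source  # source currently being output
        self.pending = None           # camera still moving to the new talker

    def talker_changed(self, new_source: str) -> str:
        # The new camera starts rotating/zooming; output is left untouched,
        # so no blurred or shaking frames reach the viewer.
        self.pending = new_source
        return self.on_air

    def video_acquired(self) -> str:
        # Only now is the output switched to the current talker.
        if self.pending is not None:
            self.on_air = self.pending
            self.pending = None
        return self.on_air


selector = OutputSelector("camera1")
print(selector.talker_changed("camera2"))  # still the previous talker
print(selector.video_acquired())           # now the current talker
```

Compared with cutting to a panorama in between, this produces one switch per talker change instead of two.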
Of course, in the embodiment of the present invention, the picture of the local conference site can be output not only in full screen but also in forms such as picture-in-picture or dual picture. When output is in picture-in-picture form, after the video of the current talker is successfully obtained, the current talker can be output in the large picture (first picture) and the talker preceding the current talker in the small picture (second picture). When output is in dual-picture form, after the video of the current talker is successfully obtained, the current talker can be output in one of two pictures, neither of which contains the other, and the preceding talker in the other picture. Specific implementations of these output forms are introduced in the specific embodiments below.
Further, in the embodiment of the present invention, to make it easier to control the two camera devices to shoot the current talker in turn and to output the video of the current talker, a tracking flag can be set for each of the two camera devices before shooting starts. For example, the initial tracking flags of the first camera device and the second camera device can be set to a first tracking flag and a second tracking flag, respectively, and the tracking flags can be represented by numbers such as 0 or 1. The camera device whose tracking flag is the first tracking flag can be dedicated to shooting the video of the current talker, and the camera device whose tracking flag is the second tracking flag can be dedicated to shooting the video of the next talker (or the preceding talker) relative to the current talker. Moreover, after the video of the current talker is successfully obtained, the tracking flags of the first camera device and the second camera device need to be exchanged.
Where tracking flags are set for the first camera device and the second camera device, step S11 (when the first talker speaks, controlling the first camera device to shoot the video of the first talker) may include: when the first talker speaks, controlling the first camera device, which has the first tracking flag, to shoot the video of the first talker; and, after the video of the first talker is successfully obtained, changing the tracking flag of the first camera device from the first tracking flag to the second tracking flag while changing the tracking flag of the second camera device from the second tracking flag to the first tracking flag.
Step S12 (when the current talker changes from the first talker to the second talker, controlling the second camera device to shoot the video of the second talker) may include: when the current talker changes from the first talker to the second talker, controlling the second camera device, which now has the first tracking flag, to shoot the video of the second talker; and, after the video of the second talker is successfully obtained, changing the tracking flag of the second camera device from the first tracking flag to the second tracking flag while changing the tracking flag of the first camera device from the second tracking flag to the first tracking flag.
Step S13 (when the talker subsequently changes again, controlling the first camera device and the second camera device in turn to alternately shoot the video of the current talker) may include: each time the talker subsequently changes, controlling the camera device that has the first tracking flag to shoot the video of the current talker; and, after the video of the current talker is successfully obtained, exchanging the tracking flags of the first camera device and the second camera device. This ensures that the two camera devices cooperate, hand over seamlessly, and alternately shoot the video of the current talker.
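The flag exchange in steps S11 to S13 amounts to a two-state toggle. The sketch below is an illustration under stated assumptions: the class `TwoCameraTracker`, the camera names, and the use of 0 and 1 for the first and second tracking flags are choices made here, and the actual steering and acquisition of the camera are reduced to a comment.

```python
FIRST_FLAG, SECOND_FLAG = 0, 1  # assumed numeric encoding of the two flags


class TwoCameraTracker:
    def __init__(self):
        # Initial state: first camera device holds the first tracking flag.
        self.flags = {"camera1": FIRST_FLAG, "camera2": SECOND_FLAG}

    def camera_with(self, flag: int) -> str:
        return next(name for name, f in self.flags.items() if f == flag)

    def on_talker_change(self, talker: str) -> str:
        """Dispatch the camera holding the first flag, then swap flags."""
        camera = self.camera_with(FIRST_FLAG)
        # A real system would steer `camera` toward `talker` here and wait
        # until the video has been successfully obtained.
        for name in self.flags:
            self.flags[name] = (SECOND_FLAG if self.flags[name] == FIRST_FLAG
                                else FIRST_FLAG)
        return camera


tracker = TwoCameraTracker()
for talker in ["first", "second", "third", "fourth"]:
    print(talker, "->", tracker.on_talker_change(talker))
```

After each exchange, the camera holding the second flag is the one that has just acquired the current talker, so an output stage can always read from that camera.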
In the embodiment of the present invention, both the first camera device and the second camera device can track talkers. When the first talker speaks, the first camera device is controlled to shoot the first talker, while the second camera device stands by, ready to track and shoot the next talker after the first talker. When the current talker changes from the first talker to the second talker (i.e., the next talker whose position differs from that of the first talker), the second camera device is controlled to shoot the second talker, while the first camera device keeps shooting the first talker and enters the state of being ready to track and shoot the next talker whose position differs from that of the second talker. This ensures that the first camera device and the second camera device cooperate and hand over seamlessly. When the talker changes, the camera device needs a certain amount of time to successfully obtain the video of the talker after the change. In the prior art, one camera device is dedicated to shooting the panorama of the local conference site and another is dedicated to tracking and shooting the talker; therefore, before the camera device dedicated to tracking successfully obtains the video of the current talker, the picture must first be switched to the panorama of the conference site, and it is switched to the talker after the change only once that talker's video has been successfully obtained, which makes the video less smooth. In the technical solution provided by the present invention, the video of the current talker is output only after the camera device has successfully obtained it; until then, the video of the talker preceding the current talker continues to be output. Compared with the prior art, which must first switch to the panorama of the local conference site before the camera device successfully obtains the video of the next talker, the present invention therefore reduces the number of video switches, so that the pictures join up closely and the output video is smoother. Moreover, when several people speak in rapid alternation in the local conference site, the picture shot according to the prior art frames all of those talkers together, and if they are far apart their expressions cannot be observed in the captured picture. In the present invention, because the first camera device and the second camera device cooperate, the two camera devices can alternately shoot the talkers' facial pictures even when talkers speak in rapid alternation in the local conference site.
For a better understanding of the present invention, it is further described below with reference to Figs. 3A to 10, taking several specific embodiments as examples. It should also be noted that the embodiments set out below are only some of the embodiments of the present invention; from the content of the present invention, those skilled in the art can readily conceive of other embodiments, which also fall within the scope of the present invention.
In the following specific embodiments, tracking flags can be used to mark the camera devices, and the video shot by the camera device with a specified tracking flag is output. For example, the initial tracking flag of the first camera device can be set to 0 (i.e., the first tracking flag) and the initial tracking flag of the second camera device to 1 (i.e., the second tracking flag), where the camera device whose tracking flag is 0 is used to shoot the video of the current talker and the camera device whose tracking flag is 1 is used to shoot the video of the next talker after the current talker; for simplicity, the description below uses this as the example. Of course, setting the tracking flag of the first camera device to 1 and that of the second camera device to 0, or setting the tracking flags in some other way, is also possible, and the present invention does not limit this.
Fig. 3A is a flowchart of a specific embodiment of the method for controlling video capture of the present invention. Fig. 3B is another flowchart of that specific embodiment.
As shown in Fig. 3A, taking the case where the camera devices are video cameras as an example, the method for controlling video capture provided by this specific embodiment of the present invention includes:
S31, when the meeting starts, control both video cameras to shoot the panorama of the local conference site.
After the two video cameras (the first video camera and the second video camera) are turned on, i.e., when the meeting starts, nobody in the local conference site is speaking yet. To send the layout of the local conference site to the remote site, both video cameras can be controlled to shoot the panorama of the local conference site. The shooting angle and size can be set by the user, preferably so that all participants and the main conference scene are included. When the picture shot by a video camera is sent from the local conference site to the remote site, the picture shot by either video camera can be transmitted, since both are shooting the panorama of the local conference site at this point; preferably, the picture shot by the video camera whose tracking flag is 1 (i.e., the second video camera) is transmitted first.
S32, using sound source localization technology, control the first video camera to shoot the video of the first talker.
After the two video cameras have been controlled to shoot the panorama of the conference site, when someone in the conference site starts speaking, i.e., when the first talker appears, the pickup microphone array picks up the sound in the local conference site and sends it to the sound source localization device, which produces talker position information using sound source localization technology. The controller then, according to this position information, controls the video camera whose tracking flag is 0 to shoot a suitably sized video of the first talker. After that video camera (i.e., the first video camera) has captured a suitably sized video of the first talker, its tracking flag is set to 1 and the tracking flag of the other video camera (i.e., the second video camera) is changed from 1 to 0.
S33, when the current talker changes from the first talker to a second talker, control the second video camera to shoot the video of the second talker, wherein the second talker is the next talker whose position differs from that of the first talker.
After the first video camera has captured a suitably sized video of the first talker, the tracking flag of the first video camera becomes 1 and that of the second video camera becomes 0. If the talker position then changes, i.e., the talker changes from the first talker to a second talker at a different position, the controller can control the video camera whose tracking flag is 0 (i.e., the second video camera) to shoot the video of the second talker, using the same control method as in S32. After the video camera whose tracking flag is 0 has captured a suitably sized video of the second talker, its tracking flag is set to 1 and the tracking flag of the other video camera is changed from 1 to 0.
S34, when the talker subsequently changes again, control the first video camera and the second video camera in turn so that they alternately shoot the video of the current talker.
After the second video camera has captured a suitably sized video of the second talker, the tracking flag of the second video camera becomes 1 and that of the first video camera becomes 0. If the talker then changes from the second talker to a third talker (i.e., the next talker after the second talker), the video camera whose tracking flag is 0 (i.e., the first video camera) is controlled to shoot the third talker; after it successfully obtains the video of the third talker, its tracking flag is changed from 0 to 1 and the tracking flag of the other video camera is changed from 1 to 0. Similarly, when the talker changes from the third talker to a fourth talker (the next talker after the third talker), the video camera whose tracking flag is 0 (i.e., the second video camera) is controlled to shoot the fourth talker; after it successfully obtains the video of the fourth talker, its tracking flag is changed from 0 to 1 and the tracking flag of the other video camera is changed from 1 to 0. Thus, each time the talker changes, the video camera whose tracking flag is 0 (which may be either the first or the second video camera) is controlled to track and shoot the talker after the change, and once it has successfully obtained that talker's video, its tracking flag is changed from 0 to 1 and the tracking flag of the other video camera is changed from 1 to 0.
S35, after the video camera shooting the current talker's video successfully obtains the video of the current talker, output the video of the current talker in full screen.
After the video camera whose tracking flag is 0 successfully obtains the video of the current talker, its tracking flag is changed from 0 to 1 and the tracking flag of the other video camera is changed from 1 to 0. Thus, after the change, the video shot by the video camera whose tracking flag is 1 is the video of the current talker. Here, outputting the video of the current talker in full screen means that the output video comes from a single video camera. The full-screen picture may show only one talker or several talkers; in the latter case the talkers are close enough together that the body language or facial information of each can be observed in the captured video. With reference to step S12, if several talkers are so far apart that they cannot all be observed in the video shot by one video camera, the talker position can be regarded as having changed, and the other video camera can be used to shoot the talker's video. After the video of the current talker is sent to the remote site in full-screen form, participants at the remote site can clearly see a close-up of the current talker. Since the close-up may contain important conference information, as much of that information as possible is retained.
As shown in Fig. 4, of the three figures from left to right, the first shows the panorama of the local conference site displayed in full screen when the meeting starts; the second shows the video of the first talker displayed in full screen after the first talker appears; and the third shows the second talker displayed in full screen after the talker changes from the first talker to the second talker.
S36, before the video camera shooting the current talker's video successfully obtains the video of the current talker, output the video of the talker preceding the current talker.
It should be noted that step S36 is performed before step S35.
From the moment the talker changes until the video camera successfully obtains the video of the current talker, the video camera rotates or zooms its lens and may therefore produce blurred or unstable pictures. During this interval, however, outputting the video of the talker preceding the current talker avoids outputting those blurred or unstable pictures.
For ease of understanding, an explanation is given below with reference to Figs. 5A and 5B. As shown in Fig. 5A, the three figures arranged from left to right are the first, second, and third figures. The talker in the third figure is the next talker after the talker in the first figure. In the interval from the moment the talker changes until the video camera successfully obtains a suitably sized video of the talker in the third figure, if the picture shot by the video camera while rotating or zooming its lens were output directly, the blurred or unstable picture in the second figure would appear. Accordingly, in this specific embodiment of the present invention, what is output during that interval is the video of the talker in the first figure, and the video of the talker in the third figure is output only after a suitably sized video of that talker has been successfully obtained, which avoids outputting the blurred or unstable picture (see Fig. 5B).
In addition, depending on the situation in the local conference site, the following cases may arise when this specific embodiment is implemented; they are handled as follows:
(1) Nobody in the local conference site is speaking
The output picture is not switched; the panorama of the local conference site continues to be output.
(2) A single person speaks in the local conference site and nobody interjects
The output picture is the full-screen picture of the current talker.
(3) A single person speaks in the local conference site and someone interjects, but only very briefly
The output picture is not switched; the full-screen picture of the main talker continues to be output.
(4) A single person speaks in the local conference site and moves from time to time
If the talker's walking, or the shift of the talker's head or body, stays within the current output picture and within the set central region of that picture, the video cameras are not switched and no tracking is performed; the output picture is the full-screen picture with the current talker in the central region. If the movement leaves the talker within the current output picture but about to leave, or already outside, the set central region of that picture, the video cameras are not switched, but appropriate tracking can be performed to keep the talker within the central region. If the movement takes the talker beyond the current output picture, the video cameras are switched and the talker is tracked.
(5) The talker in the local conference site changes once, to a neighboring person or to someone else
If the position of the talker after the change stays within the output picture from before the change and within the set central region of that picture, the video cameras are not switched and no tracking is performed; the output picture is the full-screen picture with the talker after the change in the central region. If the position of the talker after the change is still within the output picture from before the change but about to leave, or already outside, the set central region of that picture, the video cameras are not switched, but appropriate tracking can be performed to keep the talker after the change within the central region; the output picture is the full-screen picture with the talker after the change in the central region. If the position of the talker after the change is beyond the output picture from before the change, the video cameras are switched and the talker after the change is tracked.
(6) Several people in the local conference site speak at the same time, i.e., compete for the floor
The competition for the floor is usually very brief in this case, and the output picture is not switched.
(7) Several people in the local conference site hold a discussion and speak alternately, i.e., the talker position changes repeatedly
The video cameras alternately track the talker after each position change, and the output picture is the full-screen picture of the talker after the change.
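Cases (3) and (6) above both come down to ignoring very short stretches of speech. One way to sketch that is a minimum hold time. The following is an assumption-laden illustration: the function `confirmed_talkers`, the segment representation, and the 2-second threshold are all hypothetical, since the text only says that such interjections are "very brief", and a live system would have to decide causally instead of looking at completed segments.

```python
def confirmed_talkers(segments, min_hold_s: float = 2.0):
    """Return the talkers the output should switch to, in order.

    segments: list of (talker, duration_s) in chronological order.
    Segments shorter than min_hold_s are treated as interjections and
    do not trigger a switch.
    """
    confirmed = []
    for talker, duration_s in segments:
        if duration_s < min_hold_s:
            continue  # brief interjection: keep the current picture
        if not confirmed or confirmed[-1] != talker:
            confirmed.append(talker)
    return confirmed


speech = [("A", 30.0), ("B", 0.8), ("A", 20.0), ("C", 12.0)]
print(confirmed_talkers(speech))
```

Here B's short remark never reaches the output, while the later change to C does.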
In this specific embodiment, each time the talker position changes, the video camera whose tracking flag is 0 is controlled to track and shoot the talker after the position change, and after that video camera successfully obtains a suitable video of the talker, its tracking flag is changed from 0 to 1 and the tracking flag of the other video camera is changed from 1 to 0. This ensures that at any given moment one video camera is shooting the current talker while the other is available to shoot the next talker after the current talker. That is, the two video cameras cooperate and hand over seamlessly. When the talker position changes, the video camera needs a certain amount of time to successfully obtain the video of the talker after the change. During this time, the video of the talker preceding the current talker continues to be output, and the video of the current talker is output only after the video camera has successfully obtained it. Compared with the prior art, which must first switch the picture to the panorama of the conference site and only switch it to the talker after the change once the video camera has successfully obtained that talker's video, the present invention reduces the number of video switches, so that the pictures join up closely and the output video is smoother. Moreover, when several people speak in rapid alternation in the conference site, the picture shot by the prior-art video camera dedicated to shooting the talker's video frames all of those talkers together, and if they are far apart their expressions cannot be observed in the captured picture. In the present invention, because the first video camera and the second video camera cooperate, the two video cameras can alternately shoot the talkers' facial pictures even when talkers speak in rapid alternation in the conference site. In addition, because the video of the current talker is output in full screen, participants at the remote site can clearly observe the facial features of the current talker; these may carry important conference information, so more of the valuable information of the meeting is retained.
Fig. 6 is a flowchart of another specific embodiment of the method for controlling video capture of the present invention.
As shown in Fig. 6, taking the case where the camera devices are video cameras as an example, the method for controlling video capture provided by this specific embodiment of the present invention includes:
S61, when the meeting starts, control both video cameras to shoot the panorama of the local conference site.
After the two video cameras are turned on, i.e., when the meeting starts, nobody in the local conference site is speaking yet. To send the layout of the local conference site to the remote site, both video cameras can be controlled to shoot the panorama of the local conference site. The shooting angle and size can be set by the user, preferably so that all participants and the main conference scene are included. When the panoramic video of the local conference site is output, the video shot by the video camera whose tracking flag is 1 is preferably output first.
S62, combining sound source localization with preset positions, control the first camera to shoot the video of the first talker.
After the two cameras have been controlled to shoot the panorama of the site, when someone at the site starts to speak, i.e. when the first talker appears, position information of the first talker is obtained using sound source localization. This is then combined with the preset positions, i.e. the positions configured in advance at which a talker may be located when speaking at the local site, to determine the accurate position of the first talker. Specifically, the preset position closest to the position obtained by sound source localization can be selected from the multiple preset positions as the accurate position. The controller then, according to the accurate position of the first talker, controls the camera whose tracking flag is 0 to shoot the video of the first talker. After that camera has captured a suitable video of the first talker, its tracking flag is set to 1, and the tracking flag of the other camera is changed from 1 to 0.
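The preset-matching step of S62 can be illustrated with a minimal Python sketch. The preset table, the seat names and the representation of a position as a (pan, tilt) pair in degrees are assumptions made for illustration only; the description does not specify how positions are encoded.

```python
import math

# Hypothetical preset positions, one per seat, as (pan_deg, tilt_deg).
# These names and values are illustrative assumptions.
PRESETS = {
    "seat_1": (-30.0, 0.0),
    "seat_2": (0.0, 0.0),
    "seat_3": (30.0, 0.0),
}


def nearest_preset(estimate, presets):
    """Return the name of the preset closest to the sound-source estimate.

    `estimate` is the (pan_deg, tilt_deg) pair produced by sound source
    localization; the closest preset is taken as the accurate position.
    """
    return min(presets, key=lambda name: math.dist(presets[name], estimate))


# A rough estimate of (24, 3) degrees is snapped to the preset at (30, 0).
chosen = nearest_preset((24.0, 3.0), PRESETS)
```

Snapping to a preset means the camera can be driven to a framing that was tuned in advance, rather than to a noisy acoustic estimate.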
S63, when the current talker changes from the first talker to a second talker, control the second camera to shoot the video of the second talker, where the second talker is the next talker, located at a position different from that of the first talker.
After the first camera successfully captures the video of the first talker, the tracking flag of the first camera becomes 1 and the tracking flag of the second camera becomes 0. If the talker then changes, i.e. from the first talker to a second talker at a different position, the controller, as in step S62, controls the camera whose tracking flag is 0 (i.e. the second camera) to shoot the video of the second talker. After that camera successfully captures the video of the second talker, its tracking flag is set to 1, and the tracking flag of the other camera is changed from 1 to 0.
S64, when the talker subsequently changes again, control the first camera and the second camera to shoot the video of the current talker in turn.
After the second camera successfully captures the video of the second talker, the tracking flag of the second camera becomes 1 and the tracking flag of the first camera becomes 0. If the talker changes again, from the second talker to a third talker, the camera whose tracking flag is 0 (i.e. the first camera) is controlled to shoot the third talker. After that camera successfully obtains a suitable video of the third talker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera (i.e. the second camera) is changed from 1 to 0. Similarly, when the talker changes from the third talker to a fourth talker (i.e. the next talker after the third talker), the camera whose tracking flag is 0 (i.e. the second camera) is controlled to shoot the fourth talker. After that camera successfully obtains a suitable video of the fourth talker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera (i.e. the first camera) is changed from 1 to 0. Whenever the talker changes again later, the cameras alternate in the same way.
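The tracking-flag bookkeeping of steps S62 to S64 can be sketched as follows. The class and method names are hypothetical, and the moment at which a camera "successfully obtains" a talker is reduced to a single method call; a real controller would wait for the camera to finish moving and framing.

```python
class CameraPair:
    """Two cameras that take turns; flag 0 = free, flag 1 = holding the current talker."""

    def __init__(self):
        # Initial flags as in the description: the first camera is free (0),
        # the second carries flag 1 and supplies the panorama output first.
        self.flags = {"camera_1": 0, "camera_2": 1}

    def free_camera(self):
        """Return the camera whose tracking flag is 0."""
        return next(name for name, flag in self.flags.items() if flag == 0)

    def on_talker_acquired(self, camera):
        """Swap the flags once `camera` has a suitable video of the new talker."""
        for name in self.flags:
            self.flags[name] = 1 if name == camera else 0


def simulate(talkers):
    """Return (talker, camera) assignments for a sequence of talker changes."""
    pair = CameraPair()
    assignments = []
    for talker in talkers:
        camera = pair.free_camera()      # the flag-0 camera is sent to the new talker
        pair.on_talker_acquired(camera)  # assumed to succeed immediately in this sketch
        assignments.append((talker, camera))
    return assignments


schedule = simulate(["first", "second", "third", "fourth"])
```

Because exactly one camera holds flag 0 at any time, the controller never has to decide which camera to move; it only has to look up the free one.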
S65, after the camera used for shooting the current talker's video successfully obtains the video of the current talker, output the video of the current talker and of the current talker's previous talker simultaneously in picture-in-picture form, where the picture-in-picture comprises a first picture and a second picture that is contained in the first picture and is smaller than the first picture; the current talker is output in the first picture, and the previous talker of the current talker is output in the second picture.
After the camera whose tracking flag is 0 successfully obtains the video of the current talker, its tracking flag is changed from 0 to 1. At this point the camera whose tracking flag is 1 is shooting the video of the current talker, and the camera whose tracking flag is 0 is shooting the video of the current talker's previous talker. Outputting the two videos simultaneously in picture-in-picture form means that the current talker is output in the first picture, and the previous talker is output in the second picture, which is contained in the first picture and is smaller than it. In this way, participants at the remote site can observe not only the facial expression of the current talker but also how one party reacts to what the other party says. These expressions may carry important conference information, so as much important conference information as possible is retained.
As shown in Fig. 7, of the three images from left to right, the first image shows that when the meeting starts, the panorama of the local site is output in picture-in-picture form; the second image shows that after the first talker appears, the first talker is output in the large picture (i.e. the first picture) and the panorama of the local site is output in the lower-right corner of the screen (i.e. the second picture); the third image shows that after the talker changes from the first talker to the second talker, the second talker is output in the large picture and the first talker is output in the lower-right corner of the screen.
S66, before the camera used for shooting the current talker's video successfully obtains the video of the current talker, output the two talkers preceding the current talker in the first picture and the second picture respectively.
It should be noted that step S66 is performed before step S65.
During the period from the talker change until the camera successfully obtains the video of the current talker, the camera may rotate or zoom its lens, producing a blurred or unstable picture. The two talkers preceding the current talker can therefore be output in the first picture and the second picture respectively, which avoids outputting the blurred or unstable picture.
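Steps S65 and S66 together amount to a rule for choosing what the first and second pictures show. The sketch below is one way to express that rule, assuming the talker history is kept as a list whose last entry is the current talker and that the panorama stands in whenever fewer talkers have spoken; the function name and the `camera_locked` parameter are hypothetical.

```python
PANORAMA = "panorama"


def pip_sources(talker_history, camera_locked):
    """Return (first_picture, second_picture) for the picture-in-picture output.

    `talker_history` lists talkers in speaking order, current talker last.
    `camera_locked` is True once the camera assigned to the current talker
    has successfully obtained that talker's video.
    """
    # Pad with the panorama so early in the meeting there is always a source.
    padded = [PANORAMA, PANORAMA, PANORAMA] + list(talker_history)
    if camera_locked:
        # S65: current talker in the first picture, previous talker in the second.
        return padded[-1], padded[-2]
    # S66: keep showing the two talkers that preceded the current one.
    return padded[-2], padded[-3]


during_move = pip_sources(["A", "B", "C"], camera_locked=False)
after_lock = pip_sources(["A", "B", "C"], camera_locked=True)
```

The output changes only once per talker change, at the moment the camera locks, which is the reduction in switching that the description emphasizes.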
For ease of understanding, this is explained below with reference to Figs. 8A and 8B. In Fig. 8A, the three images are, from left to right, the first image, the second image and the third image. The talker in the lower-right corner (i.e. the second picture) of the first image is the previous talker of the talker in the large picture (i.e. the first picture) of the first image, and the talker in the large picture of the first image is the previous talker of the talker in the large picture of the third image. Now the talker changes from the large-picture talker of the first image to the large-picture talker of the third image. During the period from the talker change until the camera successfully obtains the video of the large-picture talker of the third image, if the picture shot by the camera while it rotates or zooms its lens is output directly, a blurred or unstable picture appears in the lower-right picture of the second image. As shown in Fig. 8B, in this specific embodiment what is output during that period is the moving picture of the first-image talker (the large picture of the second image) and a freeze frame of that talker's previous talker (the lower-right picture of the second image), which avoids outputting the blurred or unstable picture.
Of course, depending on actual needs, the output mode shown in the second image of Fig. 8A may also be used during the period from the talker change until the camera successfully obtains the video of the current talker.
In addition, depending on the situation at the local site, the following cases may arise when this specific embodiment is implemented; the corresponding handling is as follows:
(1) Nobody is speaking at the local site
The combination of output pictures is unchanged; the panorama of the local site continues to be output.
(2) A single person is speaking at the local site and nobody interjects
The current talker is output in the first picture and the previous talker of the current talker is output in the second picture; the picture composition is unchanged.
(3) A single person is speaking at the local site and someone interjects, but only very briefly
The talker is output in the first picture. The second picture is either not switched or outputs the person who interjected; preferably the second picture is not switched.
(4) A single person is speaking at the local site and moves from time to time
If the talker walks about, or the talker's head or body shifts, without leaving the currently output first picture and while remaining within the set central region of the first picture, the camera is neither switched nor made to track. The first picture outputs the current talker in motion, the second picture is unchanged, and the output picture composition is unchanged. If the movement leaves the talker within the currently output first picture but the talker is about to leave, or has left, the set central region of the first picture, the camera is not switched but may track appropriately to keep the talker within the set central region of the first picture. The second picture is unchanged and the output picture composition is unchanged. If the movement takes the talker beyond the currently output first picture, the camera is switched and the talker is tracked. Once tracking succeeds, the talker is output in the first picture, and the first picture from before the camera switch is moved to the second picture.
(5) The talker at the local site changes once, to an adjacent person or to someone else
If the position of the new talker does not leave the first picture from before the change and lies within the set central region of the first picture, the camera is neither switched nor made to track. The first picture outputs the new talker within the central region, and the second picture is unchanged. If the position of the new talker is still within the first picture from before the change but is about to leave, or has left, the set central region of the first picture, the camera is not switched but may track appropriately to keep the new talker within the central region of the first picture. The second picture is unchanged. If the position of the new talker lies beyond the first picture from before the change, the camera is switched and the new talker is tracked. The first picture outputs the new talker, and the second picture outputs the talker from before the change.
(6) Several people at the local site speak at the same time, i.e. they compete for the floor
Such competition is usually very brief, so the combination of output pictures is unchanged.
(7) Several people at the local site discuss and talk in turn, i.e. the talker position changes repeatedly
The cameras alternately track the talker after each position change, and the combination of output pictures changes accordingly: after each change, the current talker is output in the first picture and the previous talker of the current talker is output in the second picture.
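Cases (4) and (5) above apply the same three-way decision: do nothing, track with the same camera, or switch cameras. A minimal sketch of that decision follows, assuming the talker's position and both regions are expressed in one shared 2-D coordinate system; the `Rect` type, the coordinate values and the action names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned region, in whatever coordinate system the controller uses."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def camera_action(x, y, first_picture, central_region):
    """Decide how to react to a talker located at (x, y).

    "hold"   : inside the set central region, neither switch nor track
    "track"  : still in the first picture but outside the central region
    "switch" : outside the first picture, send the other camera
    """
    if not first_picture.contains(x, y):
        return "switch"
    if central_region.contains(x, y):
        return "hold"
    return "track"


frame = Rect(0, 0, 100, 100)
center = Rect(25, 25, 75, 75)
actions = [camera_action(px, 50, frame, center) for px in (50, 90, 120)]
```

The central region acts as a dead band: small head or body movements inside it cause no camera motion at all, which keeps the output picture steady.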
In this specific embodiment, each time the talker's position changes, the camera whose tracking flag is 0 is controlled to track and shoot the talker at the new position, and after that camera successfully obtains a suitably sized video of the talker, its tracking flag is changed from 0 to 1 while the tracking flag of the other camera is changed from 1 to 0. This ensures that at any given moment one camera is shooting the current talker while the other camera is idle and can be used to shoot the next talker after the current talker. In other words, the two cameras cooperate and hand over seamlessly. When the talker's position changes, a camera needs a certain amount of time to successfully obtain the video of the new talker. During this period the video of the previous talker of the current talker continues to be output, and the video of the current talker is output only after the camera has successfully obtained it. The prior art, by contrast, first switches the picture to the panorama of the site and switches to the new talker only once the camera has obtained that talker's video. The present invention therefore reduces the number of video switches, so that the pictures join tightly and the output video is smoother. Moreover, when several people at the site talk in rapid alternation, the prior art has the camera dedicated to shooting talker video frame all of those talkers in one picture; if they are far apart, their expressions cannot be made out in the captured picture. In the present invention, because the first camera and the second camera cooperate, the two cameras can alternately shoot the talkers' faces even when talkers alternate rapidly. In addition, because the current talker and the previous talker of the current talker are output simultaneously in picture-in-picture form, participants at the remote site can clearly observe the facial features of the current talker and can also see the talker changes at the local site and how one party reacts to what the other party says, so more valuable conference information is retained.
Fig. 9 is a flow chart of yet another specific embodiment of the method for controlling video capture of the present invention.
As shown in Fig. 9, taking a camera as an example of the camera device, the method for controlling video capture provided by this specific embodiment of the invention comprises:
S91, when the meeting starts, control the two cameras to shoot the panorama of the site.
After the two cameras are switched on, i.e. when the meeting starts, nobody at the local site is speaking yet. To convey the layout of the local site to the remote site, the two cameras can be controlled to shoot the panorama of the local site. The shooting angle and size can be set by the user, preferably so that the picture contains all participants and the main conference scene. When the panoramic video of the local site is output, the video shot by the camera whose tracking flag is 1 is preferably output first.
S92, using sound source localization and image recognition, control the first camera to shoot the video of the first talker.
After the two cameras have been controlled to shoot the panorama of the site, when someone at the site starts to speak, i.e. when the first talker appears, the position of the first talker is obtained using sound source localization, and the camera whose tracking flag is 0 is controlled to turn to a suitable angle. Image recognition is then used to further determine the accurate position of the first talker. The controller then, according to the accurate position of the first talker, controls the camera whose tracking flag is 0 to shoot the video of the first talker. After that camera has captured a suitable video of the first talker, its tracking flag is set to 1, and the tracking flag of the other camera is changed from 1 to 0.
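The coarse-then-fine positioning of S92 can be illustrated with a small sketch. It assumes that sound source localization yields a coarse pan angle, that an image recognition stage (not shown) returns the horizontal pixel positions of detected faces, and that a linear pixel-to-angle mapping over the horizontal field of view is accurate enough. All names and parameters are hypothetical; the description does not specify how the two techniques are combined numerically.

```python
def refine_pan(coarse_pan_deg, face_centers_px, image_width_px, hfov_deg):
    """Refine a coarse pan angle using detected face positions.

    The camera is assumed to be pointing at `coarse_pan_deg` already. The face
    nearest the image center is taken to be the talker, and the pan angle is
    corrected by that face's horizontal offset (linear approximation).
    """
    if not face_centers_px:
        # No face found: fall back to the acoustic estimate.
        return coarse_pan_deg
    center = image_width_px / 2
    face_x = min(face_centers_px, key=lambda x: abs(x - center))
    offset_deg = (face_x - center) / image_width_px * hfov_deg
    return coarse_pan_deg + offset_deg


# Face detected 200 px right of center in a 1920 px frame with a 60 degree field of view.
refined = refine_pan(20.0, [1160, 300], image_width_px=1920, hfov_deg=60.0)
```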
S93, when the current talker changes from the first talker to a second talker, control the second camera to shoot the video of the second talker, where the second talker is the next talker, located at a position different from that of the first talker.
After the first camera successfully captures the video of the first talker, the tracking flag of the first camera becomes 1 and the tracking flag of the second camera becomes 0. If the talker then changes, i.e. from the first talker to a second talker at a different position, the controller, as in step S92, controls the camera whose tracking flag is 0 (i.e. the second camera) to shoot the video of the second talker. After that camera has captured a suitable video of the second talker, its tracking flag is set to 1, and the tracking flag of the other camera is changed from 1 to 0.
S94, when the talker subsequently changes again, control the first camera and the second camera to shoot the video of the current talker in turn.
After the second camera successfully captures the video of the second talker, the tracking flag of the second camera becomes 1 and the tracking flag of the first camera becomes 0. If the talker changes again, from the second talker to a third talker, the camera whose tracking flag is 0 (i.e. the first camera) is controlled to shoot the third talker. After that camera successfully obtains a suitable video of the third talker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera (i.e. the second camera) is changed from 1 to 0. Similarly, when the talker changes from the third talker to a fourth talker (i.e. the next talker after the third talker), the camera whose tracking flag is 0 (i.e. the second camera) is controlled to shoot the fourth talker. After that camera successfully obtains a suitable video of the fourth talker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera (i.e. the first camera) is changed from 1 to 0. Whenever the talker changes again later, the cameras alternate in the same way.
S95, after the camera used for shooting the current talker's video successfully obtains the video of the current talker, output the video of the current talker and of the current talker's previous talker simultaneously in dual-picture form, where the dual picture comprises two partial pictures, neither of which contains the other; one partial picture outputs the current talker and the other outputs the previous talker of the current talker.
After the camera whose tracking flag is 0 successfully obtains the video of the current talker, its tracking flag is changed from 0 to 1. At this point the camera whose tracking flag is 1 is shooting the video of the current talker, and the camera whose tracking flag is 0 is shooting the video of the current talker's previous talker. Outputting the two videos simultaneously in dual-picture form means that the current talker is output in one picture and the previous talker of the current talker is output in another picture, with neither picture containing the other. In this way, participants at the remote site can observe not only the facial expression of the current talker but also how one party reacts to what the other party says. These expressions may carry important conference information, so as much important conference information as possible is retained.
As shown in Fig. 10, of the three images from left to right, the first image shows that when the meeting starts, the panorama of the local site is output in dual-picture form; the second image shows that after the first talker appears, the first talker is output in the left picture and the panorama of the local site is output in the right picture; the third image shows that after the talker changes from the first talker to the second talker, the second talker is output in the right picture and the first talker is output in the left picture.
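The dual-picture sequence of Fig. 10 can be reproduced with a short sketch. It assumes that each new talker replaces whichever half currently holds the older content, starting with the left half; this matches the three images described above, but the behaviour for a third and later talkers is an extrapolation rather than something the description states.

```python
def dual_picture_layout(talkers):
    """Return the (left, right) contents after each successive talker change.

    Assumption: talkers alternate halves, left first, so the previous talker
    stays where it was and only the older half is replaced.
    """
    left = right = "panorama"
    layouts = []
    for index, talker in enumerate(talkers):
        if index % 2 == 0:
            left = talker
        else:
            right = talker
        layouts.append((left, right))
    return layouts


layouts = dual_picture_layout(["first", "second", "third"])
```

Under this assumption the previous talker never jumps from one half to the other, so only one half of the screen changes per talker change.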
S96, before the camera used for shooting the current talker's video successfully obtains the video of the current talker, output the two talkers preceding the current talker in the two parts of the dual picture respectively.
It should be noted that step S96 is performed before step S95.
During the period from the talker change until the camera successfully obtains the video of the current talker, the camera may rotate or zoom its lens, producing a blurred or unstable picture. The two talkers preceding the current talker are therefore output in the two parts of the dual picture respectively, which avoids outputting the blurred or unstable picture.
This is explained below with reference to Figs. 11A and 11B. In Fig. 11A, the three images are, from left to right, the first image, the second image and the third image. The talker in the right picture of the first image is the previous talker of the talker in the left picture of the first image, and the talker in the left picture of the first image is the previous talker of the talker in the right picture of the third image. Now the talker changes from the left-picture talker of the first image to the right-picture talker of the third image. During the period from the talker change until the camera successfully obtains a suitable video of the right-picture talker of the third image, if the picture shot by the camera while it rotates or zooms its lens is output directly, a blurred or unstable picture appears in the right picture of the second image. As shown in Fig. 11B, in this specific embodiment what is output during that period is the moving picture of the first-image talker (the right picture of the second image) and a freeze frame of that talker's previous talker (the left picture of the second image), which avoids outputting the blurred or unstable picture.
Of course, depending on actual needs, the output mode shown in the second image of Fig. 8A may also be used during the period from the talker change until the camera successfully obtains the video of the current talker.
In addition, depending on the situation at the local site, the following cases may arise when this specific embodiment is implemented; the corresponding handling is as follows:
(1) Nobody is speaking at the local site
The combination of output pictures is unchanged; the panorama of the local site continues to be output.
(2) A single person is speaking at the local site and nobody interjects
The current talker is output in one partial picture and the previous talker of the current talker is output in the other partial picture; the picture composition is unchanged.
(3) A single person is speaking at the local site and someone interjects, but only very briefly
The talker is output in one partial picture. The other partial picture is either not switched or outputs the person who interjected; preferably the other partial picture is not switched.
(4) A single person is speaking at the local site and moves from time to time
If the talker walks about, or the talker's head or body shifts, without leaving the current output picture and while remaining within the set central region of that picture, the camera is neither switched nor made to track, and the output picture composition is unchanged. If the movement leaves the talker within the current output picture but the talker is about to leave, or has left, the set central region of the current output picture, the camera is not switched but may track appropriately to keep the talker within the central region, and the output picture composition is unchanged. If the movement takes the talker beyond the current output picture, the camera is switched and the talker is tracked.
(5) The talker at the local site changes once, to an adjacent person or to someone else
If the position of the new talker does not leave the output picture of the previous talker and lies within the set central region of that picture, the camera is neither switched nor made to track, and the output picture shows the new talker within the central region. If the position of the new talker is still within the output picture of the previous talker but is about to leave, or has left, the set central region of that picture, the camera is not switched but may track appropriately to keep the new talker within the central region, and the output picture shows the new talker within the central region. If the position of the new talker lies beyond the output picture of the previous talker, the camera is switched and the new talker is tracked.
(6) Several people at the local site speak at the same time, i.e. they compete for the floor
Such competition is usually very brief, so the combination of output pictures is unchanged.
(7) Several people at the local site discuss and talk in turn, i.e. the talker position changes repeatedly
The cameras alternately track the talker after each position change, and the combination of output pictures changes accordingly: after each change, the current talker is output in one partial picture and the previous talker of the current talker is output in the other partial picture.
In this specific embodiment, each time the talker's position changes, the camera whose tracking flag is 0 is controlled to track and shoot the talker at the new position, and after that camera successfully obtains a suitably sized video of the talker, its tracking flag is changed from 0 to 1 while the tracking flag of the other camera is changed from 1 to 0. This ensures that at any given moment one camera is shooting the current talker while the other camera can be used to shoot the next talker after the current talker. In other words, the two cameras cooperate and hand over seamlessly. When the talker's position changes, a camera needs a certain amount of time to successfully obtain the video of the new talker. During this period the video of the previous talker of the current talker continues to be output, and the video of the current talker is output only after the camera has successfully obtained it. The prior art, by contrast, first switches the picture to the panorama of the site and switches to the new talker only once the camera has obtained that talker's video. The present invention therefore reduces the number of video switches, so that the pictures join tightly and the output video is smoother. Moreover, when several people at the site talk in rapid alternation, the prior art has the camera dedicated to shooting talker video frame all of those talkers in one picture; if they are far apart, their expressions cannot be made out in the captured picture. In the present invention, because the first camera and the second camera cooperate, the two cameras can alternately shoot the talkers' faces even when talkers alternate rapidly. In addition, because the video of the current talker and of the previous talker of the current talker is output in dual-picture form, participants at the remote site can clearly observe the facial features of the current talker and can also observe how one party at the local site reacts to what the other party says (this suits conversations among several people, and especially between two people), so more valuable conference information is retained.
Corresponding to the method for controlling video capture provided by the embodiments of the present invention, an embodiment of the present invention also provides an apparatus for controlling video capture. The apparatus can be implemented by a class of devices having control and processing functions, for example a camera, a video controller or a video terminal. As shown in Fig. 12, an apparatus 12 for controlling video capture provided by an embodiment of the present invention comprises:
A control unit 121, configured to control a first camera device to shoot the video of a first talker when the first talker speaks; to control a second camera device to shoot the video of a second talker when the current talker changes from the first talker to the second talker, where the second talker is the next talker, located at a position different from that of the first talker; and further to control the first camera device and the second camera device to shoot the video of the current talker in turn when the talker subsequently changes again.
A processing unit 122, connected to the control unit 121 and configured to output the video of the current talker after that video has been successfully obtained.
Optionally, in one embodiment, the control unit 121 may further be configured to: before controlling the first camera device to shoot the video of the first talker, in the initial state, control the first camera device and the second camera device to shoot the video of the whole site;
The processing unit 122 is further configured to output the video so captured.
Alternatively, in another embodiment, described control unit 121 is additionally operable to:It is first camera head and institute
State the second camera head and be respectively provided with tracking mark, wherein, the tracking mark of first camera head is initially the first tracking
Mark, the tracking mark of second camera head is initially the second tracking mark.
Described control unit 121 specifically for:In the first speaker, control with the first tracking mark first
Camera head goes to shoot the video of the first talker, after the video for successfully obtaining first talker, described first is taken the photograph
As the tracking mark of device is set to second tracking mark from first tracking mark, while the described second shooting is filled
The tracking mark put is set to first tracking mark from second tracking mark.
The control unit 121 is specifically configured to: when the current talker changes from the first talker to the second talker, control the second camera device, which now has the first tracking mark, to shoot the video of the second talker; and after the video of the second talker is successfully obtained, change the tracking mark of the second camera device from the first tracking mark to the second tracking mark, and at the same time change the tracking mark of the first camera device from the second tracking mark to the first tracking mark.
The control unit 121 is specifically configured to: each time a subsequent talker change occurs, control the camera device that has the first tracking mark to shoot the video of the current talker; and after the video of the current talker is successfully obtained, swap the tracking marks of the first camera device and the second camera device.
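The mark-swapping behaviour described above can be sketched as follows. This is only an illustrative sketch: the class names, the constants FIRST_MARK and SECOND_MARK, and the on_talker_change method are hypothetical, and steering the camera and confirming that the video was successfully obtained are reduced to a single assignment.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the two tracking marks (not named in the text).
FIRST_MARK = 1   # the camera holding this mark shoots the next talker
SECOND_MARK = 2  # the camera holding this mark keeps its current shot


@dataclass
class Camera:
    name: str
    mark: int
    target: Optional[str] = None  # talker currently framed by this camera


class ControlUnit:
    def __init__(self) -> None:
        # Initial state: first camera has the first mark, second the second.
        self.cameras = [Camera("first", FIRST_MARK), Camera("second", SECOND_MARK)]

    def on_talker_change(self, talker: str) -> Camera:
        """Point the camera holding FIRST_MARK at the new talker, then swap marks."""
        active = next(c for c in self.cameras if c.mark == FIRST_MARK)
        # Assumption: this assignment stands in for steering the camera and
        # waiting until the talker's video has been successfully obtained.
        active.target = talker
        # Only after successful acquisition are the two marks exchanged.
        for cam in self.cameras:
            cam.mark = SECOND_MARK if cam.mark == FIRST_MARK else FIRST_MARK
        return active
```

Because the marks are exchanged after every successful acquisition, successive calls select the first and the second camera in turn, while the other camera keeps framing the previous talker.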
Optionally, the control unit 121 is specifically configured to: determine whether the position of the second talker is within the output picture of the first talker; if the position of the second talker is not within the output picture of the first talker, control the second camera device to shoot the video of the second talker;
If the position of the second talker is within the output picture of the first talker, further determine whether the position of the second talker is within a set region of the output picture of the first talker; if the position of the second talker is within the set region, control the first camera device to shoot the video of the second talker; if the position of the second talker is not within the set region, control the first camera device to track and shoot the second talker so that the position of the second talker falls within the set region.
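The three-way decision above can be sketched as a small function. The sketch assumes that the output picture and the set region are axis-aligned rectangles in pixel coordinates and that the talker's position is a single point; the names Rect and choose_action and the returned strings are hypothetical.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def choose_action(pos: Tuple[float, float], output_picture: Rect, set_region: Rect) -> str:
    """Decide which camera handles the second talker, following the text above."""
    x, y = pos
    if not output_picture.contains(x, y):
        # Outside the first talker's output picture: use the second camera.
        return "second_camera_shoot"
    if set_region.contains(x, y):
        # Already well framed: the first camera simply keeps shooting.
        return "first_camera_shoot"
    # In the picture but outside the set region: the first camera tracks
    # the talker until the position falls inside the set region.
    return "first_camera_track"
```

For example, with a 1920x1080 output picture and a centred set region, a position near the picture edge yields "first_camera_track", which avoids switching cameras for a talker who is already in frame.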
Optionally, the control unit 121 may be specifically configured to: control a camera device to shoot the video of the talker using sound source localization technology.
Further, the control unit 121 may be specifically configured to: control a camera device to shoot the video of the talker using sound source localization technology in combination with preset positions or image recognition technology.
It should be noted that the first camera device and the second camera device may be joined together by a connecting device, or may be separate.
In this embodiment, when someone starts to speak, the control unit 121 controls one of the camera devices to shoot the video of the current talker, and the processing unit 122 outputs the video once it has been successfully obtained. At this point the other camera device is on standby, ready to track and shoot the talker who follows the current talker. When the talker subsequently changes, the control unit 121 can immediately control the camera device on standby to shoot the video of that next talker. It takes time from the moment the talker's position changes until a suitable video of the new talker is obtained. During this interval, the picture that this embodiment outputs to the remote site does not first switch to a panorama of the conference site; instead, the video of the talker before the change continues to be output. This reduces the number of video switches, so that picture transitions are tighter and the output video is smoother. Moreover, because the control unit 121 controls the two camera devices to shoot the current talker alternately, the two camera devices can capture the talkers' facial pictures in turn even when talkers at the site alternate rapidly, so more valuable conference information is retained.
Optionally, in another embodiment of the present invention, the processing unit 122 may output the video of the current talker in full screen. The processing unit 122 is specifically configured to: after the video of the current talker is successfully obtained, set the video of the current talker to full-screen display and, once the setting is complete, output the video of the current talker in full screen; and before the video of the current talker is successfully obtained, output the video of the talker preceding the current talker in full screen.
When the video of the current talker is output in full screen, participants at the remote site can clearly observe the facial features of the current talker. These facial features may carry important conference information, so more valuable conference information is retained.
Optionally, in another embodiment of the present invention, the processing unit 122 may output the videos of the current talker and of the talker preceding the current talker simultaneously in picture-in-picture form.
The processing unit 122 is specifically configured to: after the video of the current talker is successfully obtained, set the video of the current talker and the video of the preceding talker to be displayed in picture-in-picture form, where the picture-in-picture includes a first picture and a second picture that is contained in, and smaller than, the first picture, the current talker being shown in the first picture and the preceding talker being shown in the second picture; and once the setting is complete, output the videos of the current talker and the preceding talker simultaneously in picture-in-picture form.
The control unit 121 is further configured to: when the current talker changes from the second talker to a third talker, control the first camera device to shoot the video of the third talker, where the third talker is the next talker, located at a position different from that of the second talker.
The processing unit 122 is specifically configured to: before the video of the third talker is successfully obtained, either output the second talker in the first picture and a freeze frame of the first talker in the second picture, or output the second talker in the first picture and, in the second picture, the third talker whose shooting has begun but whose video has not yet been successfully obtained; and after the video of the third talker is successfully obtained, output the third talker in the first picture and the second talker in the second picture.
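The picture-in-picture assignment during a talker change can be sketched as a pure function. This is an illustration under stated assumptions: the function name pip_layout, its parameters, and the string labels are hypothetical, and the show_pending switch merely selects between the two alternatives the text allows before acquisition succeeds.

```python
from typing import Dict


def pip_layout(new_talker: str, prev_talker: str, older_talker: str,
               acquired: bool, show_pending: bool = False) -> Dict[str, str]:
    """Return what the first (large) and second (small) pictures show.

    new_talker   -- talker being acquired (the third talker in the text)
    prev_talker  -- talker on air before the change (the second talker)
    older_talker -- talker before that (the first talker)
    """
    if acquired:
        # After successful acquisition the new talker takes the large picture.
        return {"first_picture": new_talker, "second_picture": prev_talker}
    # Before acquisition the previous talker stays in the large picture.
    if show_pending:
        small = f"{new_talker} (acquiring)"   # shooting started, not yet obtained
    else:
        small = f"{older_talker} (frozen)"    # freeze frame of the older talker
    return {"first_picture": prev_talker, "second_picture": small}
```

Either alternative keeps the large picture stable until the new talker's video is usable, which is the property the embodiment relies on to limit visible switching.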
Outputting the videos of the current talker and the preceding talker simultaneously in picture-in-picture form lets participants at the remote site clearly observe the facial features of the current talker while also seeing the change of talker at the local site and one party's reaction to the other party's speech, so more valuable conference information is retained.
Optionally, in yet another embodiment of the present invention, the processing unit 122 may output the videos of the current talker and of the talker preceding the current talker simultaneously in dual-picture form.
The processing unit 122 is specifically configured to: after the video of the current talker is successfully obtained, set the video of the current talker and the video of the preceding talker to be displayed in dual-picture form, where the dual picture includes two partial pictures, neither of which contains the other, one partial picture showing the current talker and the other partial picture showing the preceding talker; and once the setting is complete, output the videos of the current talker and the preceding talker simultaneously in dual-picture form.
The control unit 121 is further configured to: when the current talker changes from the second talker to a third talker, control the first camera device to shoot the video of the third talker, where the third talker is the next talker, located at a position different from that of the second talker.
The processing unit 122 is specifically configured to: before the video of the third talker is successfully obtained, either output a freeze frame of the first talker in the one partial picture and the second talker in the other partial picture, or output, in the one partial picture, the third talker whose shooting has begun but whose video has not yet been successfully obtained, and the second talker in the other partial picture; and after the video of the third talker is successfully obtained, output the third talker in the one partial picture and the second talker in the other partial picture.
When the videos of the current talker and the preceding talker are output in dual-picture form, participants at the remote site can not only clearly observe the facial features of the current talker but also observe one party's reaction to the other party's speech at the local site (this suits conversations among several people, especially between two people), so more valuable conference information is retained.
It is worth noting that, in the above embodiments of the device for controlling video capture, the units are divided according to functional logic, but the division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
Other embodiments of the device for controlling video capture according to the present invention are described below with reference to Figures 13A to 13C. As shown in Figure 13A, the device 13 for controlling video capture provided in an embodiment of the present invention includes:
Controller 131, configured to: when a first talker appears, control the first camera module 132 to shoot the video of the first talker; when the current talker changes from the first talker to a second talker, control the second camera module 133 to shoot the video of the second talker, where the second talker is the next talker, located at a position different from that of the first talker; and, when the talker subsequently changes again, control the first camera module 132 and the second camera module 133 to alternately shoot the video of the current talker.
Output processor 134, connected to the first camera module 132 and the second camera module 133 and configured to output the video of the current talker after that video has been successfully obtained.
The output processor 134 may be integrated in the first camera module 132 or the second camera module 133, or may be separate from the first camera module 132 and the second camera module 133.
Optionally, the controller 131 may be further configured to: before controlling the first camera module 132 to shoot the video of the first talker, control, in an initial state, the first camera module 132 and the second camera module 133 to shoot video of the entire conference site;
The output processor 134 is further configured to output the captured video of the entire conference site.
The first camera module 132 and the second camera module 133 may be separate from each other, or may be joined and fixed together by a connecting device to form a dual camera module. The first camera module 132 and the second camera module 133 may be integrated in the device 13 for controlling video capture, or may be separate from the device 13.
Optionally, in one embodiment, the controller 131 may be further configured to: set a tracking mark for each of the first camera module 132 and the second camera module 133, where the tracking mark of the first camera module 132 is initially a first tracking mark and the tracking mark of the second camera module 133 is initially a second tracking mark.
The controller 131 is specifically configured to: when the first talker appears, control the first camera module 132, which has the first tracking mark, to shoot the video of the first talker; and after the video of the first talker is successfully obtained, change the tracking mark of the first camera module 132 from the first tracking mark to the second tracking mark, and at the same time change the tracking mark of the second camera module 133 from the second tracking mark to the first tracking mark.
The controller 131 is specifically configured to: when the current talker changes from the first talker to the second talker, control the second camera module 133, which now has the first tracking mark, to shoot the video of the second talker; and after the video of the second talker is successfully obtained, change the tracking mark of the second camera module 133 from the first tracking mark to the second tracking mark, and at the same time change the tracking mark of the first camera module 132 from the second tracking mark to the first tracking mark.
The controller 131 is specifically configured to: each time a subsequent talker change occurs, control the camera module that has the first tracking mark to shoot the video of the current talker; and after the video of the current talker is successfully obtained, swap the tracking marks of the first camera module 132 and the second camera module 133.
As shown in Figure 13B, optionally, the device 13 for controlling video capture provided in an embodiment of the present invention further includes:
Pickup microphone array 135 and sound source locator 136, configured to obtain the position of the talker using sound source localization technology, where the sound source locator 136 determines the position by sound source localization from the sound picked up by the pickup microphone array 135. The controller 131 controls a camera module to shoot the video of the talker according to the position obtained by sound source localization.
As shown in Figure 13B, the device 13 for controlling video capture provided in an embodiment of the present invention further includes: image locator 137, configured to determine the position of the talker using image recognition technology such as face detection, skin color detection, or lip movement detection. The controller 131 may be configured to control a camera module to shoot the video of the talker according to the position information obtained by image recognition.
Optionally, the controller 131 controls a camera module to shoot the video of the talker according to the position obtained by sound source localization together with preset position information.
Optionally, the image locator 137 is specifically configured to determine whether the position of the second talker is within the output picture of the first talker; if the position of the second talker is not within the output picture of the first talker, the controller 131 controls the second camera module 133 to shoot the video of the second talker;
If the position of the second talker is within the output picture of the first talker, the image locator 137 further determines whether the position of the second talker is within a set region of the output picture of the first talker; if the position of the second talker is within the set region, the controller 131 controls the first camera module 132 to shoot the video of the second talker; if the position of the second talker is not within the set region, the controller 131 controls the first camera module 132 to track and shoot the second talker so that the position of the second talker falls within the set region.
In this embodiment, when someone starts to speak, the controller 131 controls the first camera module 132 to shoot the video of the current talker, and the output processor 134 obtains the video of the current talker and outputs it. At this point the second camera module 133 is on standby, ready to track and shoot the talker who follows the current talker. When the talker subsequently changes, the controller 131 can immediately control the second camera module 133 on standby to shoot the video of that next talker. It takes time from the moment the talker's position changes until a suitable video of the new talker is obtained. During this interval, the picture that this embodiment outputs to the remote site does not first switch to a panorama of the conference site; instead, the video of the talker before the change continues to be output. This reduces the number of video switches, so that picture transitions are tighter and the output video is smoother. Moreover, because the controller 131 controls the two camera modules to shoot the current talker alternately, the two camera modules can capture the talkers' facial pictures in turn even when talkers at the site alternate rapidly, so more valuable conference information is retained.
Optionally, in another embodiment of the present invention, the output processor 134 may output the video of the current talker in full screen. The output processor 134 is specifically configured to: after the video of the current talker is successfully obtained, set the video of the current talker to full-screen display and, once the setting is complete, output the video of the current talker in full screen; and before the video of the current talker is successfully obtained, output the video of the talker preceding the current talker in full screen.
When the video of the current talker is output in full screen, participants at the remote site can clearly observe the facial features of the current talker. These facial features may carry important conference information, so more valuable conference information is retained.
Optionally, in another embodiment of the present invention, the output processor 134 may output the videos of the current talker and of the talker preceding the current talker simultaneously in picture-in-picture form.
The output processor 134 is specifically configured to: after the video of the current talker is successfully obtained, set the video of the current talker and the video of the preceding talker to be displayed in picture-in-picture form, where the picture-in-picture includes a first picture and a second picture that is contained in, and smaller than, the first picture, the current talker being shown in the first picture and the preceding talker being shown in the second picture; and once the setting is complete, output the videos of the current talker and the preceding talker simultaneously in picture-in-picture form.
The controller 131 is further configured to: when the current talker changes from the second talker to a third talker, control the first camera module 132 to shoot the video of the third talker, where the third talker is the next talker, located at a position different from that of the second talker.
The output processor 134 is specifically configured to: before the video of the third talker is successfully obtained, either output the second talker in the first picture and a freeze frame of the first talker in the second picture, or output the second talker in the first picture and, in the second picture, the third talker whose shooting has begun but whose video has not yet been successfully obtained; and after the video of the third talker is successfully obtained, output the third talker in the first picture and the second talker in the second picture.
Outputting the videos of the current talker and the preceding talker simultaneously in picture-in-picture form lets participants at the remote site clearly observe the facial features of the current talker while also seeing the change of talker at the local site and one party's reaction to the other party's speech, so more valuable conference information is retained.
Optionally, in yet another embodiment of the present invention, the output processor 134 may output the videos of the current talker and of the talker preceding the current talker simultaneously in dual-picture form.
The output processor 134 is specifically configured to: after the video of the current talker is successfully obtained, set the video of the current talker and the video of the preceding talker to be displayed in dual-picture form, where the dual picture includes two partial pictures, neither of which contains the other, one partial picture showing the current talker and the other partial picture showing the preceding talker; and once the setting is complete, output the videos of the current talker and the preceding talker simultaneously in dual-picture form.
The controller 131 is further configured to: when the current talker changes from the second talker to a third talker, control the first camera module 132 to shoot the video of the third talker, where the third talker is the next talker, located at a position different from that of the second talker.
The output processor 134 is specifically configured to: before the video of the third talker is successfully obtained, either output a freeze frame of the first talker in the one partial picture and the second talker in the other partial picture, or output, in the one partial picture, the third talker whose shooting has begun but whose video has not yet been successfully obtained, and the second talker in the other partial picture; and after the video of the third talker is successfully obtained, output the third talker in the one partial picture and the second talker in the other partial picture.
When the videos of the current talker and the preceding talker are output in dual-picture form, participants at the remote site can not only clearly observe the facial features of the current talker but also observe one party's reaction to the other party's speech at the local site, so more valuable conference information is retained.
The device 13 for controlling video capture provided in an embodiment of the present invention is described below through a specific, complete embodiment with reference to the accompanying drawings. As shown in Figure 13C, the device 13 for controlling video capture provided in an embodiment of the present invention includes:
Controller 131; first camera module 132, whose initial tracking mark is set to 0; second camera module 133, whose initial tracking mark is set to 1; output processor 134; pickup microphone array 135; sound source locator 136; image locator 137; main control module 138; video module 139; video signal processor 140; audio module 141; audio signal processor 142; pickup microphone 143; loudspeaker 144; display 145. These parts may be integrated into one complete device or may be separate from one another, and they work in coordination under the control of the controller 131 and the main control module 138.
After the device 13 for controlling video capture is switched on, that is, when the meeting starts, no one at the local site is speaking yet. To send the layout of the local site to the remote site, the controller 131 may control the two camera modules to shoot a panorama of the site. After a camera module captures the video of the local site, the video signal processor 140 in the video module 139 preferably performs processing such as encoding and decoding on the video shot by the second camera module 133, and the video is sent to the remote site over the network under the control of the main control module 138.
When one person at the local site starts to speak, that is, when the first talker appears, the pickup microphone array 135 picks up the sound of the local site and sends it to the sound source locator 136. On its way to the sound source locator 136, the sound may first pass through an internal module of the audio module 141 (for example, a module with a preprocessing function) for processing such as denoising. The sound source locator 136 produces position information by sound source localization. The controller 131 obtains this position information and controls the first camera module 132 (the camera module whose tracking mark is 0) to turn to a suitable angle and roughly capture the video of the first talker. The image locator 137 then uses image recognition technology on the video of the first talker obtained by the first camera module 132 to determine the accurate position of the first talker (including the position of the face). Under the control of the controller 131, the first camera module 132 (the camera module whose tracking mark is 0) pans and zooms its camera to shoot a suitable video of the first talker. After the first camera module 132 successfully shoots the video of the first talker, its tracking mark is set from 0 to 1, and the tracking mark of the second camera module 133 is set from 1 to 0.
After the first camera module 132 has successfully shot the video of the first talker, if the talker changes, that is, changes from the first talker to the second talker, the controller 131 controls the camera module whose tracking mark is 0 (that is, the second camera module 133) to shoot the video of the second talker; the shooting is controlled in the same way as above. After the second camera module 133 shoots a suitable video of the second talker, its tracking mark is set from 0 to 1, and the tracking mark of the first camera module 132 is set from 1 to 0.
As described above, each time the talker changes, the controller 131 controls the camera module whose tracking mark is 0 (which may be the first camera module 132 or the second camera module 133) to track and shoot the talker after the change; after that camera module successfully shoots a suitable video of the talker, its tracking mark is set from 0 to 1, and the tracking mark of the other camera module is set from 1 to 0.
After a camera module successfully shoots the video of the talker, the output processor 134 obtains the video of the talker from the camera module. After obtaining it, the output processor 134 may set the output mode of the video; the obtained video of the talker may be output in full-screen, picture-in-picture, or dual-picture mode, among others.
After the output mode has been set, the output processor 134 sends the video of the talker to the video signal processor 140, which performs processing such as encoding on it. Then, under the control of the main control module 138, the video of the talker is sent from the video signal processor 140 to the remote site over the network.
Further, before a camera module successfully obtains the video of the current talker, the main control module 138 may control the output processor 134 to output the video of the talker preceding the current talker.
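The hold-then-switch behaviour of the output path can be sketched as a small state holder. This is a sketch under assumptions: the class OutputSelector and its method names are hypothetical, and talkers are represented by plain strings rather than video streams.

```python
from typing import Optional


class OutputSelector:
    """Keeps the previous talker on air until the new talker's video is obtained."""

    def __init__(self, initial: str = "panorama") -> None:
        self.on_air: str = initial          # what the remote site currently sees
        self.pending: Optional[str] = None  # talker whose video is being acquired

    def talker_changed(self, new_talker: str) -> str:
        # A camera module starts acquiring the new talker; the output is unchanged.
        self.pending = new_talker
        return self.on_air

    def acquisition_succeeded(self) -> str:
        # Switch only once a suitable video of the pending talker exists.
        if self.pending is not None:
            self.on_air, self.pending = self.pending, None
        return self.on_air
```

Note that if the talker changes again before acquisition succeeds, this sketch simply replaces the pending talker, so the output still switches only once.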
In addition, the audio signal processor 142 performs processing such as encoding on the sound of the talker at the local site picked up by the pickup microphone 143. It should be noted that the sound picked up by the pickup microphone 143 and the sound picked up by the pickup microphone array 135 serve different purposes: the former is sent to the remote site together with the video shot by the camera modules, while the latter is used for sound source localization. The loudspeaker 144 and the display 145 are both basic components of the device 13 for controlling video capture and are used to output audio and video, respectively, at the local site.
The embodiments in this specification are described progressively. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device embodiments are described briefly because they are substantially similar to the method embodiments; for related details, refer to the description of the method embodiments.
It should be noted that the device embodiments described above are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. In addition, in the drawings of the device embodiments provided by the present invention, a connection between modules indicates that a communication connection exists between them, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
Those of ordinary skill in the art will recognize that the aspects of the present invention, or possible implementations of those aspects, may be embodied as a system, a method, or a computer program product. Accordingly, the aspects of the present invention, or possible implementations of those aspects, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and the like), or an embodiment combining software and hardware, all of which are referred to herein collectively as a "circuit", "module", or "system". Furthermore, the aspects of the present invention, or possible implementations of those aspects, may take the form of a computer program product, which refers to computer-readable program code stored in a computer-readable medium.
The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing, such as random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, or portable compact disc read-only memory (CD-ROM).
A processor in a computer reads the computer-readable program code stored in the computer-readable medium, so that the processor can perform the functions and actions specified in each step, or combination of steps, of the flowcharts, and produces means for implementing the functions and actions specified in each block, or combination of blocks, of the block diagrams.
The computer-readable program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. It should also be noted that, in some alternative implementations, the functions indicated in the steps of the flowcharts or the blocks of the block diagrams may occur out of the order indicated in the figures. For example, depending on the functions involved, two steps or two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited to them. Any change or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the scope of the claims.
Claims (24)
1. A method for controlling video capture, characterized by comprising:
when a first speaker speaks, controlling a first camera device to shoot video of the first speaker;
when the current speaker changes from the first speaker to a second speaker, controlling a second camera device to shoot video of the second speaker, wherein the second speaker is the next speaker located at a position different from that of the first speaker;
when the speaker subsequently changes again, controlling the first camera device and the second camera device to alternately shoot video of the current speaker; and
after the video of the current speaker is successfully obtained, outputting the video of the current speaker;
wherein the controlling, when the current speaker changes from the first speaker to the second speaker, a second camera device to shoot video of the second speaker comprises:
determining whether the position of the second speaker is within the output picture of the first speaker;
if the position of the second speaker is not within the output picture of the first speaker, controlling the second camera device to shoot the video of the second speaker;
if the position of the second speaker is within the output picture of the first speaker, further determining whether the position of the second speaker is within a set region of the output picture of the first speaker;
if the position of the second speaker is within the set region, controlling the first camera device to shoot the video of the second speaker; and
if the position of the second speaker is not within the set region, controlling the first camera device to track and shoot the second speaker, so that the position of the second speaker falls within the set region.
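The three-way decision in claim 1 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the claim does not say how positions or regions are represented, so the axis-aligned rectangles, the `Rect` and `Action` types, and the `choose_action` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the three outcomes described in claim 1."""
    USE_SECOND_CAMERA = "second camera device shoots the new speaker"
    KEEP_FIRST_CAMERA = "first camera device shoots the new speaker"
    TRACK_WITH_FIRST_CAMERA = "first camera device tracks until the speaker is in the set region"


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; an assumed representation of a picture or region."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def choose_action(speaker_xy, output_picture: Rect, set_region: Rect) -> Action:
    """Decide which camera device handles the second speaker."""
    x, y = speaker_xy
    # Outside the first speaker's output picture: hand over to the second camera device.
    if not output_picture.contains(x, y):
        return Action.USE_SECOND_CAMERA
    # Inside the picture and inside the set region: the first camera device already frames the speaker.
    if set_region.contains(x, y):
        return Action.KEEP_FIRST_CAMERA
    # Inside the picture but outside the set region: the first camera device tracks the speaker.
    return Action.TRACK_WITH_FIRST_CAMERA
```

Keeping the first camera device whenever the new speaker is already in its picture is what avoids an unnecessary switch between video sources.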
2. The method according to claim 1, characterized in that the outputting the video of the current speaker comprises: outputting the video of the current speaker in full screen.
3. The method according to claim 2, characterized in that the outputting the video of the current speaker in full screen comprises:
before the video of the current speaker is successfully obtained, outputting in full screen the video of the speaker preceding the current speaker; and
after the video of the current speaker is successfully obtained, outputting the video of the current speaker in full screen.
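The full-screen behaviour of claim 3 amounts to holding the previous speaker's video on screen until the new speaker's video has been obtained. A minimal sketch, assuming a hypothetical `FullScreenOutput` class with two event methods that are not named in the patent:

```python
class FullScreenOutput:
    """Tracks which speaker's video is output in full screen (illustrative only)."""

    def __init__(self, initial_speaker):
        self.on_screen = initial_speaker  # speaker whose video is currently output
        self.pending = None               # new speaker whose video is not yet obtained

    def speaker_changed(self, new_speaker):
        # The output is deliberately left unchanged: the previous speaker stays on screen.
        self.pending = new_speaker

    def acquired(self):
        # Only a successful acquisition switches the full-screen output.
        if self.pending is not None:
            self.on_screen = self.pending
            self.pending = None
```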
4. The method according to claim 1, characterized in that the outputting the video of the current speaker comprises: simultaneously outputting, in picture-in-picture form, the video of the current speaker and the video of the speaker preceding the current speaker;
wherein the picture-in-picture comprises a first picture and a second picture that is contained in, and smaller than, the first picture; the current speaker is output in the first picture, and the speaker preceding the current speaker is output in the second picture.
5. The method according to claim 4, characterized in that the method further comprises:
when the current speaker changes from the second speaker to a third speaker, controlling the first camera device to shoot video of the third speaker, wherein the third speaker is the next speaker located at a position different from that of the second speaker;
and the simultaneously outputting, in picture-in-picture form, the video of the current speaker and the video of the speaker preceding the current speaker comprises:
before the video of the third speaker is successfully obtained: outputting the second speaker in the first picture and a freeze frame of the first speaker in the second picture; or outputting the second speaker in the first picture and, in the second picture, the third speaker, whose shooting has started but whose video has not yet been successfully obtained; and
after the video of the third speaker is successfully obtained: outputting the third speaker in the first picture and the second speaker in the second picture.
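The picture-in-picture states of claim 5 can be sketched as a small selection function. The names `PipLayout` and `pip_layout`, the `show_in_progress` switch between the two "before" alternatives, and the string labels are assumptions made for illustration; the claim only fixes which speaker appears in which picture.

```python
from typing import NamedTuple


class PipLayout(NamedTuple):
    first_picture: str   # the larger picture
    second_picture: str  # the smaller picture contained in the first


def pip_layout(first, second, third, third_acquired, show_in_progress=False):
    """Return what each picture shows while switching from the second to the third speaker."""
    if third_acquired:
        # After acquisition: the third speaker takes the large picture.
        return PipLayout(first_picture=third, second_picture=second)
    if show_in_progress:
        # Alternative "before" state: the small picture shows the third speaker still being acquired.
        return PipLayout(first_picture=second, second_picture=f"{third} (acquiring)")
    # Default "before" state: the small picture holds a freeze frame of the first speaker.
    return PipLayout(first_picture=second, second_picture=f"{first} (freeze frame)")
```

In both "before" states the large picture keeps showing the second speaker, so the main view changes only once per speaker change.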
6. The method according to claim 1, characterized in that the outputting the video of the current speaker comprises: simultaneously outputting, in dual-picture form, the video of the current speaker and the video of the speaker preceding the current speaker;
wherein the dual picture comprises two partial pictures, neither of which contains the other; one partial picture outputs the current speaker, and the other partial picture outputs the speaker preceding the current speaker.
7. The method according to claim 6, the method further comprising:
when the current speaker changes from the second speaker to a third speaker, controlling the first camera device to shoot video of the third speaker, wherein the third speaker is the next speaker located at a position different from that of the second speaker;
and the simultaneously outputting, in dual-picture form, the video of the current speaker and the video of the speaker preceding the current speaker comprises:
before the video of the third speaker is successfully obtained: outputting a freeze frame of the first speaker in one partial picture and the second speaker in the other partial picture; or outputting, in one partial picture, the third speaker, whose shooting has started but whose video has not yet been successfully obtained, and the second speaker in the other partial picture; and
after the video of the third speaker is successfully obtained: outputting the third speaker in one partial picture and the second speaker in the other partial picture.
8. The method according to claim 1, characterized in that, before the controlling a first camera device to shoot video of the first speaker, the method further comprises:
in an initial state, controlling the first camera device and the second camera device to shoot video of the entire conference site, and outputting the captured video.
9. The method according to any one of claims 1 to 8, characterized in that, before the controlling a first camera device to shoot video of the first speaker, the method further comprises:
setting a tracking mark for each of the first camera device and the second camera device, wherein the tracking mark of the first camera device is initially a first tracking mark and the tracking mark of the second camera device is initially a second tracking mark;
the controlling, when a first speaker speaks, a first camera device to shoot video of the first speaker comprises: when the first speaker speaks, controlling the first camera device, which has the first tracking mark, to shoot the video of the first speaker; and after the video of the first speaker is successfully obtained, changing the tracking mark of the first camera device from the first tracking mark to the second tracking mark while changing the tracking mark of the second camera device from the second tracking mark to the first tracking mark; and
the controlling, when the current speaker changes from the first speaker to the second speaker, a second camera device to shoot video of the second speaker comprises: when the current speaker changes from the first speaker to the second speaker, controlling the second camera device, which has the first tracking mark, to shoot the video of the second speaker; and after the video of the second speaker is successfully obtained, changing the tracking mark of the second camera device from the first tracking mark to the second tracking mark while changing the tracking mark of the first camera device from the second tracking mark to the first tracking mark.
10. The method according to claim 9, characterized in that
the controlling, when the speaker subsequently changes again, the first camera device and the second camera device to alternately shoot video of the current speaker comprises: each time a subsequent speaker change occurs, controlling the camera device that has the first tracking mark to shoot the video of the current speaker, and after the video of the current speaker is successfully obtained, exchanging the tracking marks of the first camera device and the second camera device.
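The tracking-mark exchange of claims 9 and 10 can be sketched as follows. The class `CameraPair`, the integer mark values, and the camera names are hypothetical; the claims only require that the camera device holding the first tracking mark shoots the new speaker and that the marks are exchanged after a successful acquisition.

```python
FIRST_MARK, SECOND_MARK = 1, 2  # assumed encodings of the two tracking marks


class CameraPair:
    """Two camera devices that alternate via tracking marks (illustrative only)."""

    def __init__(self):
        # Initial assignment as in claim 9: first camera holds the first tracking mark.
        self.marks = {"camera_1": FIRST_MARK, "camera_2": SECOND_MARK}

    def tracking_camera(self):
        # The camera device with the first tracking mark shoots the next new speaker.
        return next(name for name, mark in self.marks.items() if mark == FIRST_MARK)

    def on_acquired(self):
        # After a successful acquisition, exchange the marks of the two camera devices.
        self.marks = {
            name: SECOND_MARK if mark == FIRST_MARK else FIRST_MARK
            for name, mark in self.marks.items()
        }
```

Because the exchange happens only after a successful acquisition, the camera device whose video is currently being output is never the one sent to find the next speaker.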
11. The method according to claim 10, characterized in that controlling a camera device to shoot video of a speaker comprises:
controlling the camera device to shoot the video of the speaker by using sound source localization.
12. The method according to claim 11, characterized in that the controlling the camera device to shoot the video of the speaker by using sound source localization comprises:
controlling the camera device to shoot the video of the speaker by using sound source localization in combination with preset positions or image recognition.
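Claim 12 does not specify how sound source localization and preset positions are combined. One plausible reading, sketched here purely as an assumption, is to map the estimated direction of the sound source to the nearest stored preset position. The function `nearest_preset`, the use of azimuth angles in degrees, and the seat names are all hypothetical.

```python
from typing import Dict


def nearest_preset(source_azimuth_deg: float, presets: Dict[str, float]) -> str:
    """Return the name of the preset whose azimuth is closest to the sound source."""

    def angular_distance(a: float, b: float) -> float:
        # Shortest distance between two angles on a circle, in degrees.
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return min(presets, key=lambda name: angular_distance(source_azimuth_deg, presets[name]))
```

For example, with presets at 0, 90, and 350 degrees, a source localized at 80 degrees would select the 90-degree preset, and the camera device could then be driven to that stored position.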
13. A device for controlling video capture, characterized by comprising:
a control unit, configured to control, when a first speaker speaks, a first camera device to shoot video of the first speaker;
the control unit being further configured to control, when the current speaker changes from the first speaker to a second speaker, a second camera device to shoot video of the second speaker, wherein the second speaker is the next speaker located at a position different from that of the first speaker;
the control unit being further configured to control, when the speaker subsequently changes again, the first camera device and the second camera device to alternately shoot video of the current speaker; and
a processing unit, connected to the control unit and configured to output the video of the current speaker after the video of the current speaker is successfully obtained;
wherein the control unit is specifically configured to:
determine whether the position of the second speaker is within the output picture of the first speaker;
if the position of the second speaker is not within the output picture of the first speaker, control the second camera device to shoot the video of the second speaker;
if the position of the second speaker is within the output picture of the first speaker, further determine whether the position of the second speaker is within a set region of the output picture of the first speaker;
if the position of the second speaker is within the set region, control the first camera device to shoot the video of the second speaker; and
if the position of the second speaker is not within the set region, control the first camera device to track and shoot the second speaker, so that the position of the second speaker falls within the set region.
14. The device according to claim 13, characterized in that the processing unit is specifically configured to:
set the video of the current speaker to be displayed in full screen; and
output the video of the current speaker in full screen.
15. The device according to claim 14, characterized in that the processing unit is specifically configured to:
before the video of the current speaker is successfully obtained, output in full screen the video of the speaker preceding the current speaker; and
after the video of the current speaker is successfully obtained, output the video of the current speaker in full screen.
16. The device according to claim 13, characterized in that the processing unit is specifically configured to:
set the video of the current speaker and the video of the speaker preceding the current speaker to be displayed in picture-in-picture form;
wherein the picture-in-picture comprises a first picture and a second picture that is contained in, and smaller than, the first picture; the current speaker is displayed in the first picture, and the speaker preceding the current speaker is displayed in the second picture; and
simultaneously output, in picture-in-picture form, the video of the current speaker and the video of the speaker preceding the current speaker.
17. The device according to claim 14, characterized in that the control unit is further configured to:
when the current speaker changes from the second speaker to a third speaker, control the first camera device to shoot video of the third speaker, wherein the third speaker is the next speaker located at a position different from that of the second speaker;
and the processing unit is specifically configured to:
before the video of the third speaker is successfully obtained: output the second speaker in the first picture and a freeze frame of the first speaker in the second picture; or output the second speaker in the first picture and, in the second picture, the third speaker, whose shooting has started but whose video has not yet been successfully obtained; and
after the video of the third speaker is successfully obtained: output the third speaker in the first picture and the second speaker in the second picture.
18. The device according to claim 13, characterized in that the processing unit is specifically configured to:
set the video of the current speaker and the video of the speaker preceding the current speaker to be displayed in dual-picture form;
wherein the dual picture comprises two partial pictures, neither of which contains the other; one partial picture displays the current speaker, and the other partial picture displays the speaker preceding the current speaker; and
simultaneously output, in dual-picture form, the video of the current speaker and the video of the speaker preceding the current speaker.
19. The device according to claim 18, characterized in that the control unit is further configured to:
when the current speaker changes from the second speaker to a third speaker, control the first camera device to shoot video of the third speaker, wherein the third speaker is the next speaker located at a position different from that of the second speaker;
and the processing unit is specifically configured to:
before the video of the third speaker is successfully obtained: output a freeze frame of the first speaker in one partial picture and the second speaker in the other partial picture; or output, in one partial picture, the third speaker, whose shooting has started but whose video has not yet been successfully obtained, and the second speaker in the other partial picture; and
after the video of the third speaker is successfully obtained: output the third speaker in one partial picture and the second speaker in the other partial picture.
20. The device according to claim 13, characterized in that, before controlling the first camera device to shoot the video of the first speaker, the control unit is further configured to:
in an initial state, control the first camera device and the second camera device to shoot video of the entire conference site;
and the processing unit is further configured to output the video of the entire conference site shot under the control of the control unit.
21. The device according to any one of claims 13 to 20, characterized in that the control unit is further configured to:
set a tracking mark for each of the first camera device and the second camera device, wherein the tracking mark of the first camera device is initially a first tracking mark and the tracking mark of the second camera device is initially a second tracking mark;
the control unit is specifically configured to: when the first speaker speaks, control the first camera device, which has the first tracking mark, to shoot the video of the first speaker; and after the video of the first speaker is successfully obtained, change the tracking mark of the first camera device from the first tracking mark to the second tracking mark while changing the tracking mark of the second camera device from the second tracking mark to the first tracking mark; and
the control unit is specifically configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera device, which has the first tracking mark, to shoot the video of the second speaker; and after the video of the second speaker is successfully obtained, change the tracking mark of the second camera device from the first tracking mark to the second tracking mark while changing the tracking mark of the first camera device from the second tracking mark to the first tracking mark.
22. The device according to claim 21, characterized in that the control unit is specifically configured to: each time a subsequent speaker change occurs, control the camera device that has the first tracking mark to shoot the video of the current speaker, and after the video of the current speaker is successfully obtained, exchange the tracking marks of the first camera device and the second camera device.
23. The device according to claim 22, characterized in that the control unit is specifically configured to:
control a camera device to shoot video of a speaker by using sound source localization.
24. The device according to claim 23, characterized in that the control unit is specifically configured to:
control the camera device to shoot the video of the speaker by using sound source localization in combination with preset positions or image recognition.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310566974.1A CN103595953B (en) | 2013-11-14 | 2013-11-14 | A kind of method and apparatus for controlling video capture |
PCT/CN2014/074831 WO2015070558A1 (en) | 2013-11-14 | 2014-04-04 | Video shooting control method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310566974.1A CN103595953B (en) | 2013-11-14 | 2013-11-14 | A kind of method and apparatus for controlling video capture |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103595953A (en) | 2014-02-19 |
CN103595953B (en) | 2017-06-20 |
Family ID: 50085919
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310566974.1A Active CN103595953B (en) | 2013-11-14 | 2013-11-14 | A kind of method and apparatus for controlling video capture |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN103595953B (en) |
WO (1) | WO2015070558A1 (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103595953B (en) * | 2013-11-14 | 2017-06-20 | 华为技术有限公司 | A kind of method and apparatus for controlling video capture |
US9686467B2 (en) * | 2014-08-15 | 2017-06-20 | Sony Corporation | Panoramic video |
CN104486549B (en) * | 2014-12-29 | 2017-07-25 | 中国科学院长春光学精密机械与物理研究所 | A kind of high flux image pickup method for being used to be imaged flow cytometer |
CN105049807B (en) * | 2015-07-31 | 2018-05-18 | 小米科技有限责任公司 | Monitored picture sound collection method and device |
CN105245820B (en) * | 2015-10-12 | 2019-04-02 | 苏州科达科技股份有限公司 | A kind of multiple video strems switching display methods, device and Videoconference Management System |
CN106231234B (en) * | 2016-08-05 | 2019-07-05 | 广州小百合信息技术有限公司 | The image pickup method and system of video conference |
CN107786834A (en) * | 2016-08-31 | 2018-03-09 | 宝利通公司 | For the camera base and its method in video conferencing system |
JP6766086B2 (en) | 2017-09-28 | 2020-10-07 | キヤノン株式会社 | Imaging device and its control method |
CN107820006A (en) * | 2017-11-07 | 2018-03-20 | 北京小米移动软件有限公司 | Control the method and device of camera shooting |
JP2019117375A (en) * | 2017-12-26 | 2019-07-18 | キヤノン株式会社 | Imaging apparatus, control method of the same, and program |
JP7292853B2 (en) | 2017-12-26 | 2023-06-19 | キヤノン株式会社 | IMAGING DEVICE, CONTROL METHOD AND PROGRAM THEREOF |
CN109009170A (en) * | 2018-07-20 | 2018-12-18 | 深圳市沃特沃德股份有限公司 | Detect the method and apparatus of mood |
CN108924469B (en) * | 2018-08-01 | 2020-11-10 | 广州视源电子科技股份有限公司 | Display picture switching transmission system, intelligent interactive panel and method |
CN112333416B (en) * | 2018-09-21 | 2023-10-10 | 上海赛连信息科技有限公司 | Intelligent video system and intelligent control terminal |
TWI678660B (en) * | 2018-10-18 | 2019-12-01 | 宏碁股份有限公司 | Electronic system and image processing method |
CN111212218A (en) * | 2018-11-22 | 2020-05-29 | 阿里巴巴集团控股有限公司 | Shooting control method and device and shooting system |
CN109816722A (en) * | 2019-01-18 | 2019-05-28 | 深圳市沃特沃德股份有限公司 | Position method, apparatus, storage medium and the computer equipment of spokesman position |
CN110072055A (en) * | 2019-05-07 | 2019-07-30 | 中国联合网络通信集团有限公司 | Video creating method and system based on artificial intelligence |
CN110536101A (en) * | 2019-09-29 | 2019-12-03 | 广州视源电子科技股份有限公司 | Electronic platform, video conferencing system and method |
CN112911256A (en) * | 2020-12-29 | 2021-06-04 | 慧投科技(深圳)有限公司 | Projector system with camera for automatically capturing sound source |
CN113596349B (en) * | 2021-07-26 | 2024-06-04 | 世邦通信股份有限公司 | Conference method, system, device and storage medium for automatic linkage video of speaking position |
CN115550559B (en) * | 2022-04-13 | 2023-07-25 | 荣耀终端有限公司 | Video picture display method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1386371A (en) * | 2000-08-01 | 2002-12-18 | 皇家菲利浦电子有限公司 | Aiming a device at a sound source |
CN102148965A (en) * | 2011-05-09 | 2011-08-10 | 上海芯启电子科技有限公司 | Video monitoring system for multi-target tracking close-up shooting |
CN102256098A (en) * | 2010-05-18 | 2011-11-23 | 宝利通公司 | Videoconferencing endpoint having multiple voice-tracking cameras |
CN102843540A (en) * | 2011-06-20 | 2012-12-26 | 宝利通公司 | Automatic camera selection for videoconference |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8289363B2 (en) * | 2006-12-28 | 2012-10-16 | Mark Buckler | Video conferencing |
CN103281492A (en) * | 2013-05-23 | 2013-09-04 | 深圳锐取信息技术股份有限公司 | Video picture switching method, video picture switching system, recording and broadcasting server and video recording and broadcasting system |
CN103595953B (en) * | 2013-11-14 | 2017-06-20 | 华为技术有限公司 | A kind of method and apparatus for controlling video capture |
- 2013-11-14: CN application CN201310566974.1A, patent CN103595953B (en), status: Active
- 2014-04-04: WO application PCT/CN2014/074831, patent WO2015070558A1 (en), status: Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1386371A (en) * | 2000-08-01 | 2002-12-18 | 皇家菲利浦电子有限公司 | Aiming a device at a sound source |
CN102256098A (en) * | 2010-05-18 | 2011-11-23 | 宝利通公司 | Videoconferencing endpoint having multiple voice-tracking cameras |
CN102148965A (en) * | 2011-05-09 | 2011-08-10 | 上海芯启电子科技有限公司 | Video monitoring system for multi-target tracking close-up shooting |
CN102843540A (en) * | 2011-06-20 | 2012-12-26 | 宝利通公司 | Automatic camera selection for videoconference |
Also Published As
Publication number | Publication date |
---|---|
WO2015070558A1 (en) | 2015-05-21 |
CN103595953A (en) | 2014-02-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103595953B (en) | A kind of method and apparatus for controlling video capture | |
EP3855731A1 (en) | Context based target framing in a teleconferencing environment | |
CN107911644B (en) | Method and device for carrying out video call based on virtual face expression | |
DE102015100911B4 (en) | Improved communication between remote participants using advanced and virtual reality | |
JP6759406B2 (en) | Camera shooting control methods, devices, intelligent devices and computer storage media | |
CN105657329B (en) | Video conferencing system, processing unit and video-meeting method | |
US7907165B2 (en) | Speaker predicting apparatus, speaker predicting method, and program product for predicting speaker | |
JP4669041B2 (en) | Wearable terminal | |
CN109754811B (en) | Sound source tracking method, device, equipment and storage medium based on biological characteristics | |
CN108076307B (en) | AR-based video conference system and AR-based video conference method | |
US20090315974A1 (en) | Video conferencing device for a communications device and method of manufacturing and using the same | |
WO2019033968A1 (en) | Camera tracking method and apparatus, and device | |
WO2018214746A1 (en) | Video conference realization method, device and system, and computer storage medium | |
JP2004118314A (en) | Utterer detection system and video conference system using same | |
CN113138669A (en) | Image acquisition method, device and system of electronic equipment and electronic equipment | |
KR20230039555A (en) | Portrait positioning type remote meeting method | |
US7986336B2 (en) | Image capture apparatus with indicator | |
WO2015089944A1 (en) | Method and device for processing picture of video conference, and conference terminal | |
CN114466283A (en) | Audio acquisition method and device, electronic equipment and peripheral component method | |
EP1976291B1 (en) | Method and video communication system for gesture-based real-time control of an avatar | |
CN108320331B (en) | Method and equipment for generating augmented reality video information of user scene | |
CN104780341B (en) | A kind of information processing method and information processing unit | |
CN112261347A (en) | Method and device for adjusting participation right, storage medium and electronic device | |
JP2004248125A (en) | Device and method for switching video, program for the method, and recording medium with the program recorded thereon | |
CN113676693A (en) | Picture presentation method, video conference system and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||