CN110505399A

CN110505399A - Image acquisition control method, device, and acquisition terminal

Info

Publication number: CN110505399A
Application number: CN201910746092.0A
Authority: CN
Inventors: 王光强; 林宏伟; 薛新丽; 王之奎; 贾其燕
Original assignee: Poly Polytron Technologies Inc
Current assignee: Poly Polytron Technologies Inc; Juhaokan Technology Co Ltd
Priority date: 2019-08-13
Filing date: 2019-08-13
Publication date: 2019-11-26
Also published as: WO2021027424A1

Abstract

The disclosure provides a control method for image acquisition, applied to an acquisition terminal, comprising: performing voiceprint recognition on collected audio and determining, through the voiceprint recognition, whether the speaker has changed; if the speaker has changed, locating the position in space of the speaker corresponding to the audio according to the collected audio; adjusting a camera in the acquisition terminal according to the located position so that, after adjustment, the speaker corresponding to the audio is at the center of the camera's shooting picture, the adjustment including adjusting the shooting angle of the camera and/or adjusting the focal length of the camera; and performing image acquisition with the adjusted camera to obtain an image of the speaker corresponding to the audio. The method tracks and locates the speaker according to audio and adjusts the camera to capture the speaker's image, solving the prior-art problem that the speaker's image cannot be captured when the speaker is in a shooting blind area.

Description

Image acquisition control method, device, and acquisition terminal

Technical field

The present disclosure relates to the field of multimedia technology, and in particular to a control method and device for image acquisition and an acquisition terminal.

Background art

In the prior art, with the development of Internet and communication technology, multi-party video conferencing is used more and more widely at work.

In a multi-party video conference, display devices show images in real time to present the state of each party in the conference. The image shown by a display device is the image captured by a camera.

The image captured by a camera is limited by where the camera is deployed, and the camera is not adjustable, so participants located in the camera's shooting blind area do not appear in the captured image. Consequently, if the speaker is in the shooting blind area of the camera, no image of the blind area can be captured, the picture shown by the display device does not include the speaker's portrait, and the other participants cannot see the speaker.

It follows that how to perform image acquisition so that the speaker's image is reliably captured is a problem in urgent need of a solution.

Summary of the invention

To solve the problem in the related art that the speaker's image cannot be captured when the speaker is in a shooting blind area, the present disclosure provides a control method and device for image acquisition.

In a first aspect, a control method for image acquisition is applied to an acquisition terminal, the method comprising:

performing voiceprint recognition on collected audio, and determining through the voiceprint recognition whether the speaker has changed;

if the speaker has changed, locating, according to the collected audio, the position in space of the speaker corresponding to the audio;

adjusting the camera in the acquisition terminal according to the located position, so that after adjustment the speaker corresponding to the audio is at the center of the camera's shooting picture, the adjustment including adjusting the shooting angle of the camera and/or adjusting the focal length of the camera;

performing image acquisition with the adjusted camera to obtain an image of the speaker corresponding to the audio.

In a second aspect, a control device for image acquisition is applied to an acquisition terminal, the device comprising:

a voiceprint recognition module, configured to perform voiceprint recognition on collected audio and determine through the voiceprint recognition whether the speaker has changed;

a locating module, configured to locate, according to the collected audio, the position in space of the speaker corresponding to the audio if the voiceprint recognition module determines that the speaker has changed;

a control module, configured to adjust the camera in the acquisition terminal according to the located position, so that after adjustment the speaker corresponding to the audio is at the center of the camera's shooting picture, the adjustment including adjusting the shooting angle of the camera and/or adjusting the focal length of the camera;

an image capture module, configured to perform image acquisition with the adjusted camera to obtain an image of the speaker corresponding to the audio.

In a third aspect, an acquisition terminal comprises:

a processor; and

a memory storing computer-readable instructions which, when executed by the processor, implement the method described above.

The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:

When voiceprint recognition determines that the speaker has changed, the speaker's position is determined according to the collected audio, and the camera is then adjusted according to that position so that the speaker is at the center of the camera's shooting picture, which ensures that the speaker's image is effectively captured. The speaker is tracked and located according to audio and the camera is adjusted to capture the speaker's image, which effectively solves the prior-art problem that the speaker's image cannot be captured when the speaker is in a shooting blind area.

It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the present disclosure.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the specification, serve to explain the principles of the invention.

Fig. 1 is a block diagram of a terminal according to an exemplary embodiment;

Fig. 2 is a flowchart of a control method for image acquisition according to an exemplary embodiment;

Fig. 3 is a flowchart of step 310 of the embodiment corresponding to Fig. 2, in one embodiment;

Fig. 4 is a flowchart of step 330 of the embodiment corresponding to Fig. 2, in one embodiment;

Fig. 5 is a flowchart of step 350 of the embodiment corresponding to Fig. 2, in one embodiment;

Fig. 6 is a flowchart of step 370 of the embodiment corresponding to Fig. 2, in one embodiment;

Fig. 7 is a flowchart of step 371 of the embodiment corresponding to Fig. 6, in one embodiment;

Fig. 8 is a flowchart of a control method for image acquisition according to a specific embodiment;

Fig. 9 is a block diagram of a control device for image acquisition according to an exemplary embodiment.

The above drawings show specific embodiments of the present invention, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concept in any way, but rather to explain the concept of the invention to those skilled in the art by reference to specific embodiments.

Detailed description

Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings indicate the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.

Fig. 1 is a block diagram of a terminal 200 according to an exemplary embodiment. The terminal 200 can serve as a fixed terminal for performing image acquisition according to the method of the present disclosure; the terminal 200 is, for example, a television or a desktop computer that integrates a camera and a sound acquisition module.

Referring to Fig. 1, the terminal 200 may include one or more of the following components: a processing component 202, a memory 204, a power supply component 206, a multimedia component 208, a sound collection component 210, a camera 214, and a communication component 216.

The processing component 202 generally controls the overall operation of the terminal 200, such as operations associated with display, image acquisition, data communication, camera rotation, and recording. The processing component 202 may include one or more processors 218 to execute instructions so as to complete all or part of the steps of the methods described below. In addition, the processing component 202 may include one or more modules that facilitate interaction between the processing component 202 and other components. For example, the processing component 202 may include a multimedia module to facilitate interaction between the multimedia component 208 and the processing component 202.

The memory 204 is configured to store various types of data to support operation of the terminal 200. Examples of such data include instructions for any application or method operating on the terminal 200. The memory 204 can be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. One or more modules are also stored in the memory 204 and are configured to be executed by the one or more processors 218 to complete all or part of the steps of any of the method embodiments described below.

The power supply component 206 provides power for the various components of the terminal 200. It may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the terminal 200.

The multimedia component 208 includes a screen that provides an output interface between the terminal 200 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel. If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action but also the duration and pressure associated with it. The screen may also include an organic light-emitting display (OLED). Images captured by the camera can be displayed on the screen.

The sound collection component 210 is configured for audio collection. It may include several sound acquisition modules, such as microphones (MIC), through which audio is collected.

The camera 214 is used to perform image acquisition and obtain images. In the solution of the present disclosure, the terminal 200 includes at least one camera whose rotation can be controlled, so that after a speaker change is determined, the camera is rotated according to the speaker's position to capture the speaker's image.

The communication component 216 is configured to facilitate wired or wireless communication between the terminal 200 and other devices. The terminal 200 can access a wireless network based on a communication standard, such as WiFi (Wireless Fidelity). In an exemplary embodiment, the communication component 216 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 216 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth technology, and other technologies.

In an exemplary embodiment, the terminal 200 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors, digital signal processing devices, programmable logic devices, field-programmable gate arrays, controllers, microcontrollers, microprocessors, or other electronic components, for performing the methods described below.

Fig. 2 is a flowchart of a control method for image acquisition according to an exemplary embodiment. The control method is applied to an acquisition terminal, for example the terminal 200 shown in Fig. 1. As shown in Fig. 2, the method may include the following steps:

Step 310, perform voiceprint recognition on the collected audio, and determine through voiceprint recognition whether the speaker has changed.

The acquisition terminal includes a sound acquisition module, such as a microphone, through which audio is collected. In specific embodiments, the sound acquisition module may be integrated inside the acquisition terminal, or deployed outside the acquisition terminal and connected to it, for example through an external interface.

The sound acquisition module of the acquisition terminal collects signals continuously. Since people do not talk without interruption, the signal collected by the sound acquisition module includes voiced signals and silent signals. The audio referred to in this disclosure comes from the voiced signals collected by the sound acquisition module, for example a segment of a voiced signal, or the whole voiced signal between two adjacent silent signals.

In specific embodiments, endpoint detection is used to determine the voiced signals and silent signals in the signal collected by the sound acquisition module.

To capture the speaker's image according to the method of the present disclosure, the collected signal is segmented before step 310, and image acquisition control is performed on each audio segment obtained. For example, once endpoint detection has determined the voiced and silent signals, the voiced signal between two adjacent silent signals is taken as one segment of audio.
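
The segmentation described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify how endpoint detection is performed, so a simple per-frame energy threshold is used here, and the names `segment_voiced`, `frame_len`, and `energy_threshold` are hypothetical.

```python
import numpy as np

def segment_voiced(signal, frame_len=400, energy_threshold=0.01):
    """Return (start, end) sample indices of voiced runs bounded by silence.

    frame_len and energy_threshold are illustrative values, not from the source.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    segments, start = [], None
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        voiced = np.mean(frame ** 2) > energy_threshold  # mean energy of the frame
        if voiced and start is None:
            start = i * frame_len                    # a voiced run begins
        elif not voiced and start is not None:
            segments.append((start, i * frame_len))  # run closed by a silent frame
            start = None
    if start is not None:                            # signal ended while voiced
        segments.append((start, n_frames * frame_len))
    return segments
```

Each returned pair marks one audio segment to which the later steps would be applied.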

In another embodiment, the collected signal may instead be segmented according to a set collection period, and each voiced signal segment obtained is taken as one segment of audio.

In one embodiment, to reduce computation, voiceprint recognition is performed only on the voiced signal segment that immediately follows a silent signal. In other words, if the signal segment immediately preceding the audio is also a voiced signal, step 310 is not executed, and the speaker corresponding to the audio is assumed by default to be the speaker corresponding to that preceding voiced segment.

Each person's vocal organs, such as the vocal cords, oral cavity, and nasal cavity, differ from everyone else's, as do speaking volume and speaking frequency. The sound each person produces therefore has its own characteristics, which form that person's unique voiceprint.

A person's voiceprint is characterized by voiceprint features, which are obtained by feature extraction from the collected audio. Examples of voiceprint features are Mel-frequency cepstral coefficients (MFCC), short-time energy, short-time average magnitude, short-time average zero-crossing rate, formants, and linear prediction cepstral coefficients (LPCC).

In specific embodiments, one or more voiceprint features may be extracted from the audio for voiceprint recognition; no specific limitation is imposed here.
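
As a minimal sketch of three of the simpler features named above (short-time energy, short-time average magnitude, and short-time average zero-crossing rate), the following computes them for one frame of samples. The function name `short_time_features` and the choice to stack the values into one vector are assumptions made for illustration; MFCC, formants, and LPCC are not covered.

```python
import numpy as np

def short_time_features(frame):
    """Return [energy, average magnitude, zero-crossing rate] for one frame."""
    frame = np.asarray(frame, dtype=float)
    energy = float(np.sum(frame ** 2))             # short-time energy
    avg_magnitude = float(np.mean(np.abs(frame)))  # short-time average magnitude
    signs = np.sign(frame)
    signs[signs == 0] = 1                          # treat exact zeros as positive
    # fraction of adjacent sample pairs whose sign differs
    zcr = float(np.mean(signs[1:] != signs[:-1]))
    return np.array([energy, avg_magnitude, zcr])
```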

Voiceprint recognition checks whether the voiceprint features of the currently collected audio are consistent with those of the previously collected audio. If they are inconsistent, the speaker corresponding to the currently collected audio differs from the speaker corresponding to the previously collected audio, that is, the speaker has changed. If they are consistent, the two speakers are the same, that is, the speaker has not changed.

Step 330, if the speaker has changed, locate the position in space of the speaker corresponding to the audio according to the collected audio.

The positioning uses sound source localization: the position in space of the speaker corresponding to the audio is determined according to the times at which the audio was collected.

It should be understood that, because the speaker has a certain volume, the speaker's position in space is actually a region of space. For ease of calculation, a particular region within the space occupied by the speaker (such as the region occupied by the head) or a particular point is used to represent the speaker's position in space.

Sound source localization uses the delays with which multiple sound acquisition modules collect the audio to determine the position of the speaker corresponding to the audio.

It follows that the acquisition terminal includes at least two sound acquisition modules. The acquisition terminal stores the time at which each sound acquisition module collects the audio, so the delay between any two sound acquisition modules in collecting the audio can be calculated from those times, and the speaker's position can then be located.

Step 350, adjust the camera in the acquisition terminal according to the located position, so that after adjustment the speaker corresponding to the audio is at the center of the camera's shooting picture; the adjustment includes adjusting the shooting angle of the camera and/or adjusting the focal length of the camera.

From the located position, the bearing and distance of the speaker corresponding to the audio relative to the camera can be determined.
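
A minimal sketch of this calculation, assuming the speaker and camera positions are given in the same Cartesian coordinate system, that the x axis is the zero-pan direction, and that z points up. These conventions and the function name `distance_and_bearing` are illustrative assumptions, not taken from the source.

```python
import math

def distance_and_bearing(speaker, camera):
    """Return (distance, pan_degrees, tilt_degrees) of the speaker seen from the camera."""
    dx, dy, dz = (s - c for s, c in zip(speaker, camera))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    pan = math.degrees(math.atan2(dy, dx))                   # horizontal angle from the x axis
    tilt = math.degrees(math.atan2(dz, math.hypot(dx, dy)))  # elevation above the x-y plane
    return distance, pan, tilt
```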

For image acquisition, and especially image acquisition that targets the speaker, the camera is adjusted with the aim of capturing an image of the speaker that is clear and easy to identify.

The adjustment may be to the shooting angle of the camera, so that the adjusted camera points at the speaker corresponding to the audio. It may be to the focal length of the camera, so that the speaker's portrait occupies a suitable proportion of the captured image and viewers can accurately identify the speaker from the image. It may also be to both the shooting angle and the focal length at once. Which adjustment is made depends on the actual situation, that is, whether the shooting angle and the focal length need adjusting is judged from the determined distance and bearing.

If the bearing of the speaker relative to the camera indicates that the speaker is not in the picture at the camera's current shooting angle, or that the speaker deviates considerably from the current shooting angle, the camera is rotated according to the determined bearing, that is, the shooting angle is adjusted so that the camera points at the speaker after adjustment. Conversely, if the determined bearing indicates that the speaker is at the center of the shooting picture at the current shooting angle, no shooting-angle adjustment is made.
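
The rotation decision can be sketched as a comparison between the speaker's bearing and the camera's current bearing. The tolerance value and the name `needs_rotation` are hypothetical; the source does not say how large a deviation counts as "considerable".

```python
def needs_rotation(speaker_bearing_deg, camera_bearing_deg, tolerance_deg=5.0):
    """Return (rotate?, signed offset in degrees) for a single rotation axis."""
    # wrap the difference into [-180, 180) so that 350 deg and 10 deg are 20 deg apart
    offset = (speaker_bearing_deg - camera_bearing_deg + 180.0) % 360.0 - 180.0
    return abs(offset) > tolerance_deg, offset
```

The signed offset could then be passed to whatever rotation control the camera exposes.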

If the distance of the speaker relative to the camera indicates that the speaker is far from the camera, so that at the current focal length the portrait occupies only a small proportion of the captured image, the focal length of the camera is adjusted so that the proportion of the image occupied by the speaker's portrait meets the set requirement. Conversely, if the proportion occupied by the portrait at the current focal length already meets the requirement, no focal-length adjustment is made.

Step 370, perform image acquisition with the adjusted camera to obtain an image of the speaker corresponding to the audio.

As described above, after the camera is adjusted, the speaker corresponding to the audio is at the center of the camera's shooting picture, so the image captured is the image of the speaker corresponding to the audio.

The image of the speaker may be a full-body image, an upper-body image, and so on; no specific limitation is imposed here.

In one embodiment, the captured image is one in which the speaker corresponding to the audio is the main subject.

The speaker image captured according to the present disclosure is displayed at the acquisition terminal, so that the speaker's image is shown while the speaker is speaking. The acquisition terminal may display it on its own display screen or on an external display device; no specific limitation is imposed here.

In one embodiment, after step 370, the method further includes:

replacing the image displayed by the acquisition terminal with the image of the speaker.

In the technical solution of the present disclosure, when a speaker change is determined from the audio, the speaker is located according to the audio and the camera is adjusted according to the located position to capture the speaker's image. The speaker is thus tracked and located according to audio, and the speaker's image is captured according to the speaker's position. This ensures that the picture displayed by the acquisition terminal is the captured image of the speaker, which effectively solves the prior-art problem that the displayed picture does not contain the speaker's portrait.

In one embodiment, before display, the speaker's image is scaled according to the proportions of the acquisition terminal's display screen, so that the image fits the display screen and the display effect is ensured.
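
One way such scaling could work is sketched below, assuming the goal is to fill as much of the screen as possible while keeping the image's aspect ratio. The source does not state the scaling rule, so this is an assumption, and `fit_to_screen` is a hypothetical name.

```python
def fit_to_screen(image_size, screen_size):
    """Return the (width, height) the image should be scaled to.

    Both arguments are (width, height) in pixels. The image is enlarged or
    reduced uniformly so that it fits entirely inside the screen.
    """
    iw, ih = image_size
    sw, sh = screen_size
    scale = min(sw / iw, sh / ih)  # the tighter dimension limits the scale
    return round(iw * scale), round(ih * scale)
```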

In one embodiment, after step 310, if it is determined that the speaker has not changed, the shooting angle of the camera is kept unchanged, so that the image of that speaker continues to be captured and displayed.

In another embodiment, after step 310, if it is determined that the speaker has not changed, the image displayed by the acquisition terminal is not replaced. In other words, if the speaker of the previously collected audio and the speaker of the currently collected audio are the same person, the displayed image is kept unchanged.

In another embodiment, after step 310, if it is determined that the speaker has not changed, whether the position of the speaker corresponding to the audio has changed is judged from the audio. If the speaker's position has changed, the camera is adjusted according to the speaker's position, where the adjustment includes adjusting the shooting angle of the camera and/or adjusting the focal length of the camera according to the distance between the speaker and the camera. This keeps the speaker at the center of the camera's shooting picture, so that a clear image of the speaker is captured and viewers can identify the speaker from it.

The method of the present disclosure can be applied to multi-party video conferencing. The speaker's image is captured according to the audio collected during the conference, displayed on the screen, and shown synchronously on the display screens of the other conference parties, so that participants in the multi-party video conference can tell who is speaking from the displayed image.

In one embodiment, as shown in Fig. 3, step 310 includes:

Step 311, extract voiceprint features from the audio.

As described above, the extracted voiceprint features may be one or more of Mel-frequency cepstral coefficients, short-time energy, short-time average magnitude, short-time average zero-crossing rate, formants, and linear prediction cepstral coefficients. Any features that ensure the accuracy of voiceprint recognition may be used; the extracted features are not specifically limited here.

Step 313, calculate the voiceprint similarity between the extracted voiceprint features and the voiceprint features corresponding to the previously collected audio.

The voiceprint similarity characterizes how similar the voiceprint features of the currently collected audio are to those of the previously collected audio.

In specific embodiments, to calculate the voiceprint similarity, a voiceprint vector is constructed for each audio from the voiceprint features extracted from it, and the similarity is calculated from the voiceprint vector of the current audio and that of the previously collected audio, for example by taking the Euclidean distance, cosine distance, or Mahalanobis distance between the two voiceprint vectors as the voiceprint similarity.
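
Two of the measures named above can be sketched with NumPy as follows. The function names are illustrative, and cosine distance is taken here as one minus the cosine of the angle between the vectors, which is a common convention rather than something the source specifies. Mahalanobis distance is omitted because it needs a covariance estimate that the text does not describe.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between two voiceprint vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    """1 - cos(angle); 0 for parallel vectors, 1 for orthogonal ones."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

With either measure, a smaller value means the two voiceprints are more alike.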

Step 315, determine whether the speaker has changed according to the voiceprint similarity.

If the calculated voiceprint similarity indicates that the two sets of voiceprint features are similar, it is determined that the speaker has not changed. Conversely, if it indicates that they are dissimilar, it is determined that the speaker has changed.

In specific embodiments, a similarity range can be preset for this purpose: if the voiceprint similarity falls within the similarity range, the two sets of voiceprint features corresponding to that similarity are considered similar.

Whether the speaker has changed can therefore be determined by checking whether the calculated voiceprint similarity falls within the set similarity range. If it does, it is determined that the speaker has not changed; if it falls outside the range, it is determined that the speaker has changed.
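
The range check can be sketched in a few lines. The default bounds below are hypothetical and assume the similarity is expressed as a distance, so that small values mean similar voiceprints; `speaker_changed` is an illustrative name.

```python
def speaker_changed(similarity, similar_range=(0.0, 0.3)):
    """Return True when the similarity value falls outside the preset range."""
    low, high = similar_range
    return not (low <= similarity <= high)
```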

In one embodiment, the acquisition terminal includes one reference sound acquisition module and at least three non-reference sound acquisition modules. As shown in Fig. 4, step 330 includes:

Step 331, calculate, from the times at which the reference sound acquisition module and the non-reference sound acquisition modules respectively collect the audio, the delay with which each non-reference sound acquisition module collects the audio relative to the reference sound acquisition module.

In this embodiment, each sound acquisition module stores the time at which it collects the audio, so the delay of each non-reference sound acquisition module relative to the reference sound acquisition module can be calculated from the respective collection times.

Step 333, calculate the position coordinates of the speaker corresponding to the audio from the positions of the reference sound acquisition module and the non-reference sound acquisition modules and from the delays.

The position of the reference sound acquisition module is taken as the reference origin and a coordinate system is constructed, so that the coordinates of each non-reference sound acquisition module in the constructed coordinate system can be obtained from the positions of the reference sound acquisition module and each non-reference sound acquisition module.

From the delay with which each non-reference sound acquisition module collects the audio relative to the reference sound acquisition module, the difference between the distance from the speaker to that non-reference module and the distance from the speaker to the reference module can be calculated.
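
A minimal sketch of the delay and range-difference calculation, assuming sound travels at a constant speed so that each range difference is the delay multiplied by that speed. The value 343 m/s (air at roughly 20 °C) and the name `range_differences` are assumptions, not taken from the source.

```python
SPEED_OF_SOUND = 343.0  # metres per second; an assumed constant

def range_differences(t_ref, t_others, speed=SPEED_OF_SOUND):
    """Convert collection times (seconds) into range differences (metres).

    t_ref is the time at which the reference module collected the audio;
    t_others holds the times for the non-reference modules. A positive result
    means the speaker is farther from that module than from the reference.
    """
    delays = [t - t_ref for t in t_others]  # delay relative to the reference
    return [speed * tau for tau in delays]
```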

The following matrix equation is constructed from the coordinates of each non-reference sound acquisition module and the calculated range differences:

AX = B

Here, matrix A is an n × 4 matrix, where n is the number of non-reference sound acquisition modules. The i-th row of matrix A is [x_i, y_i, z_i, d_i], where x_i, y_i, and z_i are the x-axis, y-axis, and z-axis coordinates of the i-th non-reference sound acquisition module, and d_i is the difference between the distance from the speaker corresponding to the audio to the i-th non-reference sound acquisition module and the distance from the speaker to the reference sound acquisition module. X = [x, y, z, R]^T. Matrix B is an n × 1 matrix; the expression for the i-th row element of matrix B is not reproduced in this text.
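
Since the expression for the rows of B does not appear in the text above, the following derivation shows what it would have to be for AX = B to hold with the stated A and X. It rests on two assumptions that the source does not state explicitly: R is the distance from the speaker to the reference module at the origin, and d_i = R_i - R, where R_i is the distance from the speaker to the i-th non-reference module.

```latex
\begin{aligned}
R_i^2 &= (x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2
       = R^2 - 2\,(x_i x + y_i y + z_i z) + x_i^2 + y_i^2 + z_i^2, \\
R_i^2 &= (R + d_i)^2 = R^2 + 2\,d_i R + d_i^2, \\
\Rightarrow\quad x_i x + y_i y + z_i z + d_i R
      &= \tfrac{1}{2}\left(x_i^2 + y_i^2 + z_i^2 - d_i^2\right).
\end{aligned}
```

Under these assumptions the left-hand side is the i-th row of A multiplied by X, so the i-th element of B would be the right-hand side of the last line.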

Solving the above matrix equation yields the position coordinates (x, y, z) of the speaker corresponding to the audio.
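
A sketch of solving the equation by least squares with NumPy is given below. It assumes that the i-th element of B is (x_i^2 + y_i^2 + z_i^2 - d_i^2) / 2, which follows if d_i is the distance to the i-th module minus the distance R to the reference module at the origin; the source text does not show this expression, so treat it as an assumption. The name `locate_speaker` is hypothetical.

```python
import numpy as np

def locate_speaker(mic_positions, range_diffs):
    """Estimate the speaker position (x, y, z) from A X = B.

    mic_positions: (n, 3) coordinates of the non-reference modules, with the
                   reference module at the origin.
    range_diffs:   n values d_i (distance to module i minus distance to reference).
    """
    m = np.asarray(mic_positions, dtype=float)
    d = np.asarray(range_diffs, dtype=float)
    A = np.column_stack([m, d])                  # rows [x_i, y_i, z_i, d_i]
    B = 0.5 * (np.sum(m ** 2, axis=1) - d ** 2)  # assumed right-hand side
    X, *_ = np.linalg.lstsq(A, B, rcond=None)    # X = [x, y, z, R]
    return X[:3]
```

Because X has four unknowns, four or more well-placed non-reference modules are needed for the least-squares solution to be unique; with fewer rows `np.linalg.lstsq` returns a minimum-norm solution instead.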

In one embodiment, as shown in Fig. 5, step 350 includes:

Step 351, determine, from the located position, the distance and bearing of the speaker corresponding to the audio relative to the camera.

Step 353, adjust the focal length of the camera according to the determined distance, and adjust the shooting angle of the camera according to the determined bearing.

To adjust the shooting angle, the camera is rotated according to the determined bearing so that, after rotation, the camera points at the speaker corresponding to the audio.

The focal length can be adjusted according to a configuration file that maps distances to focal lengths. After the distance between the speaker corresponding to the audio and the camera is determined, the focal length mapped to that distance is read from the configuration file, and the camera's focal length is adjusted to the value obtained.
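
The lookup can be sketched as a table of distance bands. The table contents, units, and the names `FOCAL_CONFIG` and `focal_length_for` are all hypothetical; the source does not describe the format of the configuration file.

```python
import bisect

# Hypothetical configuration: (maximum distance in metres, focal length in mm),
# sorted by distance. In practice this would be loaded from the configuration file.
FOCAL_CONFIG = [(2.0, 4.0), (4.0, 8.0), (8.0, 16.0)]

def focal_length_for(distance, config=FOCAL_CONFIG):
    """Return the focal length mapped to the band containing `distance`."""
    limits = [limit for limit, _ in config]
    idx = bisect.bisect_left(limits, distance)  # first band whose limit >= distance
    idx = min(idx, len(config) - 1)             # beyond the table: use the last band
    return config[idx][1]
```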

In one embodiment, as shown in Fig. 6, step 370 includes:

Step 371, perform speaker identification on the image captured by the adjusted camera, and locate the speaker's portrait in the image.

In one application scenario, if the camera is far from the speaker and the space where the acquisition terminal is located holds many people, then even if the speaker corresponding to the audio is at the center of the camera's shooting picture, the image captured at the camera's shooting angle after rotation may contain multiple people.

In this application scenario, to accurately obtain the image of the speaker corresponding to the audio, speaker identification is performed to determine the position of that speaker's portrait in the captured image.

A person's lips move while the person speaks, so speaker identification can be based on the lip motion of each person in the captured image. For example, the lip pixels of each person are extracted from continuously captured images, and the lip pixels extracted from consecutive images are compared to judge whether the person's lips are moving. If they are, the portrait containing those lip pixels is determined to be the speaker's portrait; conversely, if the lips are not moving, the portrait containing those lip pixels is determined not to be the speaker's portrait.
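The comparison of lip pixels across consecutive images might look like the sketch below. It assumes that a lip region of fixed size has already been extracted for each person as a grid of grey levels, and it uses a mean absolute difference with a hypothetical `threshold`; the disclosure does not specify the comparison measure, so both are assumptions.

```python
def lip_motion_score(previous_patch, current_patch):
    """Mean absolute grey-level difference between two equally sized lip patches."""
    total = 0
    count = 0
    for row_a, row_b in zip(previous_patch, current_patch):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def find_speaker(lip_patches_by_person, threshold=10.0):
    """Return the id of the person whose lips move most, or None if nobody exceeds threshold.

    lip_patches_by_person maps a person id to that person's lip patches
    taken from consecutive frames, oldest first.
    """
    best_id, best_score = None, threshold
    for person_id, patches in lip_patches_by_person.items():
        scores = [lip_motion_score(a, b) for a, b in zip(patches, patches[1:])]
        score = max(scores, default=0.0)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id
```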

In other embodiments, an action may be agreed on in advance for speaker identification, for example that the speaker raises a hand while speaking or stands up to speak. The agreed action, such as raising a hand or standing, is then recognized in the captured image, and the portrait showing that action is determined to be the speaker's portrait.

Step 373, crop the image according to the located portrait to obtain the speaker's image.

At this point, an image dominated by the speaker, that is, the speaker's image, has been cropped from the image containing multiple portraits. The speaker's image obtained includes at least the speaker's face.
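The cropping in step 373 can be illustrated with a short sketch. It assumes the located portrait is given as a bounding box (left, top, right, bottom) in pixel coordinates with exclusive right and bottom edges, and that the image is a list of pixel rows; the function name and the optional `margin` are hypothetical.

```python
def crop_portrait(image, box, margin=0):
    """Crop `image` to `box`, widened by `margin` pixels and clamped to the image bounds."""
    height = len(image)
    width = len(image[0]) if image else 0
    left, top, right, bottom = box
    left = max(0, left - margin)
    top = max(0, top - margin)
    right = min(width, right + margin)
    bottom = min(height, bottom + margin)
    return [row[left:right] for row in image[top:bottom]]
```

A margin lets the crop keep some context around the portrait, which is one way to make sure the speaker's face is fully included.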

In conference scenarios with many participants, the display device shows a panorama, so the displayed picture contains many portraits and the other parties to the conference cannot quickly locate the current speaker's portrait in it.

In the scheme of this embodiment, the speaker's portrait is located and cropped so that the resulting image is dominated by the speaker, which lets viewers identify the speaker from the speaker's image more quickly.

In one embodiment, as shown in Fig. 7, step 371 includes:

Step 410, from the image captured by the adjusted camera, extract the pixels of the designated organ for each portrait in the captured image.

As described above, speaker identification can be based on the lip motion or an agreed action of each person in the image. Either way, the action is performed by an organ, such as the lips or a hand.

The organ that performs the action used for speaker identification is the designated organ. For example, if speaker identification is based on lip motion, the lips are the designated organ; if it is based on a gesture, the hand is the designated organ.

To perform speaker identification in the captured image, the designated organ is first located in the image, and the pixels of the designated organ are then extracted accordingly.

Step 430, perform action recognition on the extracted pixels to determine the action that the extracted pixels characterize.

The shape of the designated organ can be reconstructed from the extracted pixels, and the action characterized by the pixels is then determined from the reconstructed shape.

Step 450, determine the portrait containing the pixels whose characterized action matches the predetermined action to be the speaker's portrait.

The predetermined action is the action agreed on for speaker identification, such as raising a hand, standing, or lip movement, and is not specifically limited here.

Thus, if the action characterized by the extracted pixels matches the predetermined action, the portrait containing those pixels is determined to be the speaker's portrait.

In one embodiment, the method further includes:

Detecting whether audio has still not been collected after a set time period has elapsed.

If so, controlling the camera to rotate to a default shooting angle.

If not, executing the step of performing voiceprint recognition on the collected audio.

If no audio has been collected after the set time period has elapsed, the camera is controlled to rotate to the default shooting angle. Further, the image captured at that shooting angle is displayed on the acquisition terminal.

Conversely, if audio is collected after the set time period has elapsed, the method proceeds to step 310.

Fig. 8 is a flowchart of an image acquisition control method according to a specific embodiment. In this embodiment, the acquisition terminal is a television set that includes a camera and sound acquisition modules. As shown in Fig. 8, the method includes the following steps:

Step 510, speaker identification: the speaker's portrait is identified in the image captured by the camera. The identification can be based on lip motion or on an agreed action.

Step 520, speaker image cropping: after the speaker's portrait is recognized in the image, the captured image is cropped to obtain the speaker's image, which is then displayed on the television set.

Step 530, whether audio continues to be collected: the audio collection state is checked in real time (for example, once per second). If audio continues to be collected, go to step 540; if not, go to step 560.

Step 540, whether the speaker has changed: voiceprint recognition is performed on the collected audio to determine whether the speaker has changed. If the speaker has changed, go to step 550; if not, no action is taken, that is, the television set continues to display the current image.

Step 550, adjust the camera according to the speaker's position: the speaker's position is determined from the times at which the audio is collected, and the camera is adjusted accordingly. The adjustment may, for example, set the shooting angle according to the speaker's angle relative to the camera, set the focal length according to the speaker's distance from the camera, or set both the shooting angle and the focal length. The adjusted camera then captures images, and the flow goes to step 510.

Step 560, whether the set time is exceeded: timing starts when audio is detected to have stopped. If no audio has been collected after the set time (for example, 30 s), go to step 570; if the time without audio has not exceeded the set time, timing continues.

Step 570, control the camera to rotate to the default shooting angle: images are captured at the default shooting angle and displayed on the television set. While the image is displayed, speaker identification is performed on the captured image, that is, the flow goes to step 510.
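The transitions between steps 510 to 570 can be summarized as a small state machine. The sketch below encodes only the control flow; the enum and function names are hypothetical. Three transitions are assumptions where the text does not name the next step: step 520 is taken to continue to step 530, step 540 with an unchanged speaker is taken to return to step 530, and step 560 before the set time is taken to return to step 530 while timing continues.

```python
from enum import Enum


class Step(Enum):
    IDENTIFY_SPEAKER = 510
    CROP_IMAGE = 520
    CHECK_AUDIO = 530
    CHECK_SPEAKER_CHANGE = 540
    ADJUST_CAMERA = 550
    CHECK_TIMEOUT = 560
    RESET_CAMERA = 570


def next_step(step, audio_present=False, speaker_changed=False,
              silent_seconds=0.0, timeout_seconds=30.0):
    """Return the step that follows `step` given the current observations."""
    if step is Step.IDENTIFY_SPEAKER:
        return Step.CROP_IMAGE
    if step is Step.CROP_IMAGE:
        return Step.CHECK_AUDIO  # assumed continuation
    if step is Step.CHECK_AUDIO:
        return Step.CHECK_SPEAKER_CHANGE if audio_present else Step.CHECK_TIMEOUT
    if step is Step.CHECK_SPEAKER_CHANGE:
        # Unchanged speaker: keep the current picture and keep monitoring (assumed).
        return Step.ADJUST_CAMERA if speaker_changed else Step.CHECK_AUDIO
    if step is Step.ADJUST_CAMERA:
        return Step.IDENTIFY_SPEAKER
    if step is Step.CHECK_TIMEOUT:
        # Before the set time, timing continues and audio is checked again (assumed).
        return Step.RESET_CAMERA if silent_seconds > timeout_seconds else Step.CHECK_AUDIO
    return Step.IDENTIFY_SPEAKER  # after RESET_CAMERA, go back to step 510
```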

The following are apparatus embodiments of the present disclosure, which can be used to execute the image acquisition control method embodiments executed by the terminal 200 described above. For details not disclosed in the apparatus embodiments, refer to the image acquisition control method embodiments of the present disclosure.

Fig. 9 is a block diagram of an image acquisition control device according to an exemplary embodiment. The device can be used in the terminal 200 shown in Fig. 1 to execute all or some of the steps of any of the method embodiments. As shown in Fig. 9, the device includes but is not limited to a voiceprint recognition module 610, a locating module 630, an adjustment module 650 and an image acquisition module 670, wherein:

The voiceprint recognition module 610 is configured to perform voiceprint recognition on the collected audio and to determine by the voiceprint recognition whether the speaker has changed.

The locating module 630 is configured to locate, according to the collected audio, the position in space of the speaker corresponding to the audio if the voiceprint recognition module determines that the speaker has changed.

The adjustment module 650 is configured to adjust the camera in the acquisition terminal according to the located position, such that after the adjustment the speaker corresponding to the audio is at the center of the camera's shooting picture. The adjustment includes adjusting the shooting angle of the camera and/or adjusting the focal length of the camera.

The image acquisition module 670 is configured to perform image acquisition with the adjusted camera to obtain an image of the speaker corresponding to the audio.

For the implementation of the functions and effects of each module in the above device, see the implementation of the corresponding steps in the above image acquisition control method; details are not repeated here.

It can be understood that these modules can be implemented in hardware, software, or a combination of both. When implemented in hardware, these modules may be implemented as one or more hardware modules, such as one or more application-specific integrated circuits. When implemented in software, these modules may be implemented as one or more computer programs executed on one or more processors, for example a program stored in the memory 204 and executed by the processor 218 of Fig. 1.

In one embodiment, the voiceprint recognition module 610 includes:

A feature extraction unit, configured to extract a voiceprint feature from the audio.

A calculation unit, configured to calculate the voiceprint similarity between the extracted voiceprint feature and the voiceprint feature corresponding to the previously collected audio.

A determination unit, configured to determine whether the speaker has changed according to the voiceprint similarity.
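The calculation and determination units can be illustrated with the sketch below. It assumes that a voiceprint feature is a fixed-length numeric vector, uses cosine similarity as the similarity measure, and compares it with a hypothetical `threshold` of 0.8; none of these choices is specified in this passage, and the function names are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def speaker_changed(current_feature, previous_feature, threshold=0.8):
    """Treat the speaker as changed when similarity to the previous feature is below threshold."""
    if previous_feature is None:
        return True  # no earlier audio to compare with (assumed behaviour)
    return cosine_similarity(current_feature, previous_feature) < threshold
```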

In one embodiment, the acquisition terminal includes one reference sound acquisition module and at least three non-reference sound acquisition modules, and the locating module 630 includes:

A time delay calculation unit, configured to calculate, from the times at which the reference sound acquisition module and the non-reference sound acquisition modules respectively collect the audio, the time delay with which each non-reference sound acquisition module collects the audio relative to the reference sound acquisition module.

A coordinate calculation unit, configured to calculate the position coordinates of the speaker corresponding to the audio from the positions of the reference sound acquisition module and the non-reference sound acquisition modules and from the time delays.

In one embodiment, the adjustment module 650 includes:

A distance and bearing determination unit, configured to determine, according to the located position, the distance and bearing of the speaker corresponding to the audio relative to the camera.

An adjustment unit, configured to adjust the focal length of the camera according to the determined distance and to adjust the shooting angle of the camera according to the determined bearing.

In one embodiment, the image acquisition module 670 includes:

A portrait positioning unit, configured to perform speaker identification on the image captured by the adjusted camera and to locate the speaker's portrait in the image.

A cropping unit, configured to crop the image according to the located portrait to obtain the speaker's image.

In one embodiment, the portrait positioning unit includes:

A pixel extraction unit, configured to extract, from the image captured by the adjusted camera, the pixels of the designated organ for each portrait in the captured image.

An action recognition unit, configured to perform action recognition on the extracted pixels to determine the action that the extracted pixels characterize.

A portrait determination unit, configured to determine the portrait containing the pixels whose characterized action matches the predetermined action to be the speaker's portrait.

In one embodiment, the device further includes:

A display replacement module, configured to replace the image displayed by the acquisition terminal with the speaker's image.

In one embodiment, the device further includes:

A detection module, configured to detect whether audio has still not been collected after a set time period has elapsed.

A rotation adjustment module, configured to control the camera to rotate to the default shooting angle if the detection module detects that no audio has been collected after the set time period has elapsed.

If the detection module detects that audio is collected after the set time period has elapsed, control passes to the voiceprint recognition module 610.

For the implementation of the functions and effects of each module and unit in the above device, see the implementation of the corresponding steps in the above image acquisition control method; details are not repeated here.

Optionally, the present disclosure also provides an acquisition terminal, which may be the terminal 200 shown in Fig. 1 and which executes all or some of the steps of any of the above method embodiments. The acquisition terminal includes:

A processor; and a memory storing computer-readable instructions which, when executed by the processor, implement the method of any of the above method embodiments.

The specific manner in which the processor of the device in this embodiment performs operations has been described in detail in the embodiments of the image acquisition control method and is not elaborated here.

In an exemplary embodiment, a computer-readable storage medium is also provided, storing computer-readable instructions which, when executed by a processor, implement the method of any of the above method embodiments.

It should be understood that the present invention is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims

1. An image acquisition control method, applied to an acquisition terminal, characterized in that the method comprises:

performing voiceprint recognition on collected audio, and determining by the voiceprint recognition whether the speaker has changed;

if the speaker has changed, locating, according to the collected audio, the position in space of the speaker corresponding to the audio;

adjusting a camera in the acquisition terminal according to the located position, such that after the adjustment the speaker corresponding to the audio is at the center of the shooting picture of the camera, the adjustment including adjusting the shooting angle of the camera and/or adjusting the focal length of the camera;

performing image acquisition with the adjusted camera to obtain an image of the speaker corresponding to the audio.

2. The method according to claim 1, characterized in that performing voiceprint recognition on the audio and determining by the voiceprint recognition whether the speaker has changed comprises:

extracting a voiceprint feature from the audio;

calculating the voiceprint similarity between the extracted voiceprint feature and the voiceprint feature corresponding to the previously collected audio;

determining whether the speaker has changed according to the voiceprint similarity.

3. The method according to claim 1, characterized in that the acquisition terminal includes one reference sound acquisition module and at least three non-reference sound acquisition modules, and locating, according to the collected audio, the position in space of the speaker corresponding to the audio comprises:

calculating, from the times at which the reference sound acquisition module and the non-reference sound acquisition modules respectively collect the audio, the time delay with which each non-reference sound acquisition module collects the audio relative to the reference sound acquisition module;

calculating the position coordinates of the speaker corresponding to the audio from the positions of the reference sound acquisition module and the non-reference sound acquisition modules and from the time delays.

4. The method according to claim 1, characterized in that adjusting the camera in the acquisition terminal according to the located position comprises:

determining, according to the located position, the distance and bearing of the speaker corresponding to the audio relative to the camera;

adjusting the focal length of the camera according to the determined distance, and adjusting the shooting angle of the camera according to the determined bearing.

5. The method according to claim 1, characterized in that performing image acquisition with the adjusted camera to obtain the image of the speaker corresponding to the audio comprises:

performing speaker identification on the image captured by the adjusted camera, and locating the portrait of the speaker in the image;

cropping the image according to the located portrait to obtain the image of the speaker.

6. The method according to claim 5, characterized in that performing speaker identification on the image captured by the adjusted camera and locating the portrait of the speaker in the image comprises:

extracting, from the image captured by the adjusted camera, the pixels of a designated organ for each portrait in the captured image;

performing action recognition on the extracted pixels to determine the action that the extracted pixels characterize;

determining the portrait containing the pixels whose characterized action matches a predetermined action to be the portrait of the speaker.

7. The method according to claim 1, characterized in that, after performing image acquisition with the adjusted camera to obtain the image of the speaker corresponding to the audio, the method further comprises:

replacing the image displayed by the acquisition terminal with the image of the speaker.

8. The method according to claim 1, characterized in that the method further comprises:

detecting whether audio has still not been collected after a set time period has elapsed;

if so, controlling the camera to rotate to a default shooting angle;

if not, executing the step of performing voiceprint recognition on the collected audio.

9. An image acquisition control device, applied to an acquisition terminal, characterized in that the device comprises:

a voiceprint recognition module, configured to perform voiceprint recognition on collected audio and to determine by the voiceprint recognition whether the speaker has changed;

a locating module, configured to locate, according to the collected audio, the position in space of the speaker corresponding to the audio if the voiceprint recognition module determines that the speaker has changed;

a control module, configured to adjust a camera in the acquisition terminal according to the located position, such that after the adjustment the speaker corresponding to the audio is at the center of the shooting picture of the camera, the adjustment including adjusting the shooting angle of the camera and/or adjusting the focal length of the camera;

an image acquisition module, configured to perform image acquisition with the adjusted camera to obtain an image of the speaker corresponding to the audio.

10. An acquisition terminal, characterized by comprising:

a processor; and

a memory storing computer-readable instructions which, when executed by the processor, implement the method according to any one of claims 1 to 8.