WO2015042897A1 - Control method, control apparatus and control device - Google Patents

Control method, control apparatus and control device

Info

Publication number
WO2015042897A1
WO2015042897A1 PCT/CN2013/084558 CN2013084558W
Authority
WO
WIPO (PCT)
Prior art keywords
sound source
information
target sound
control
direction information
Prior art date
Application number
PCT/CN2013/084558
Other languages
English (en)
French (fr)
Inventor
陈军
黄强
黄志宏
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Priority to US14/426,480 (patent US9591229B2)
Priority to EP13892080.6A (patent EP2882180A4)
Priority to PCT/CN2013/084558 (WO2015042897A1)
Publication of WO2015042897A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/142Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/69Control of means for changing angle of the field of view, e.g. optical zoom objectives or electronic zooming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/695Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/20Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/326Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only for microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules

Definitions

  • Embodiments of the present invention relate to the field of image tracking, and in particular to a control method, a control apparatus, and a control device.
  • An object of embodiments of the present invention is to provide a control method, a control apparatus, and a control device, so that a photographing device can capture a target sound source that is outside the range of the original picture.
  • An embodiment of the present invention provides a control method, where the control method includes:
  • the position range information is direction information of the target sound source relative to the photographing device, and controlling, according to the position range information, the rotation of the photographing device that currently cannot capture the target sound source is: determining a rotation control parameter of the photographing device corresponding to the direction information;
  • the rotation of the photographing device is controlled according to the rotation control parameter.
  • the audio data is collected by a sound collection device, and determining the position range information of the target sound source according to the audio data is: determining orientation information of the target sound source relative to the sound collection device according to the audio data;
  • the direction information is then determined according to the orientation information.
  • determining the direction information according to the orientation information is: determining the direction information according to the orientation information and a preset correspondence between the orientation information and the direction information.
  • the sound collection device is used to determine a preset plane for the orientation information and a preset reference point on the preset plane, where the photographing device corresponds to a first corresponding point on the preset plane,
  • the target sound source corresponds to a second corresponding point on the preset plane
  • the orientation information is a position coordinate of the second corresponding point relative to the preset reference point
  • the direction information is a direction-characterizing coordinate of the second corresponding point relative to the first corresponding point,
  • the correspondence is a plane geometric function that takes the position coordinate, relative to the preset reference point, of a sounding point on the preset plane as the independent variable, the position coordinate of the first corresponding point relative to the preset reference point as a parameter, and
  • the direction-characterizing coordinate of the sounding point relative to the first corresponding point as the dependent variable.
  • the position coordinate of the first corresponding point relative to the preset reference point is the coordinate (a1, a2) in a rectangular coordinate system on the preset plane whose first origin is the preset reference point,
  • the position coordinate of the sounding point relative to the preset reference point is the coordinate (x, y) in that rectangular coordinate system, with y greater than a2, and the direction-characterizing coordinate is
  • the angular coordinate b in a polar coordinate system on the preset plane whose second origin is the first corresponding point,
  • the parameter is determined according to the position coordinate, relative to the preset reference point, and the direction-characterizing coordinate, relative to the first corresponding point, of the training point on the preset plane corresponding to at least one training sound source obtained through a learning/training procedure.
  • An embodiment of the present invention provides a control apparatus, the control apparatus including:
  • an obtaining module configured to: acquire audio data containing sound information of the target sound source; a determining module configured to: determine position range information of the target sound source according to the audio data;
  • the control module is configured to: control rotation of the photographing device that is currently unable to capture the target sound source according to the position range information, so that the photographing device can capture the target sound source.
  • the location range information is direction information of the target sound source relative to the photographing device
  • the control module includes:
  • a first determining unit configured to: determine a rotation control parameter of the photographing device corresponding to the direction information
  • the control unit is configured to: control rotation of the photographing device according to the rotation control parameter, so that the photographing device can capture the target sound source.
  • the audio data is collected by a sound collection device
  • the determining module includes: a second determining unit configured to: determine, according to the audio data, orientation information of the target sound source relative to the sound collection device;
  • the third determining unit is configured to: determine the direction information according to the orientation information.
  • the third determining unit includes:
  • the determining subunit is configured to: determine the direction information according to the orientation information, and a preset correspondence relationship between the orientation information and the direction information.
  • Embodiments of the present invention provide a control device including the above-described control apparatus.
  • The control method, control apparatus, and control device provided by the embodiments of the present invention have at least the following technical effects:
  • FIG. 1 is a flowchart of a control method according to an embodiment of the present invention
  • FIG. 2 is a position coordinate diagram of the array microphone and the sound source in a first preferred embodiment of a control method according to an embodiment of the present invention;
  • FIG. 3 is a schematic diagram of the position of the array microphone placed directly in front of the camera in a second preferred embodiment of a control method according to an embodiment of the present invention;
  • FIG. 4 is a position coordinate diagram of an array microphone and a sound source according to a second embodiment of a control method according to an embodiment of the present invention
  • FIG. 5 is a training schematic diagram of the second preferred embodiment of a control method according to an embodiment of the present invention;
  • FIG. 6 is a schematic diagram of a third preferred embodiment of a control method according to an embodiment of the present invention. Preferred embodiments of the invention
  • FIG. 1 is a flowchart of a control method according to an embodiment of the present invention. Referring to FIG. 1 , an embodiment of the present invention provides a control method, where the control method includes the following steps:
  • Step 101: Acquire audio data containing sound information of a target sound source.
  • Step 102: Determine position range information of the target sound source according to the audio data.
  • Step 103: Control, according to the position range information, the rotation of a photographing device that currently cannot capture the target sound source, so that the photographing device can capture the target sound source.
  • It can be seen that, by acquiring audio data containing the sound information of the target sound source, determining the position range information of the target sound source according to the audio data, and controlling, according to the position range information, the rotation of the photographing device that currently cannot capture the target sound source so that the photographing device can capture the target sound source, the photographing device is enabled to capture a target sound source that is outside the original picture range.
  • the target sound source should be within the entire photographic range that the photographing device can achieve by rotation.
  • the target sound source may be a speaker or a sounding device.
  • the photographing device can be a camera or a camera.
  • the sound information may contain preset keyword content indicating the position range information, in which case the position range information can be determined from the audio data by speech recognition technology.
  • Alternatively, the position range information may be direction information of the target sound source relative to the photographing device, and controlling, according to the position range information, the rotation of the photographing device that currently cannot capture the target sound source may specifically be:
  • the rotation of the photographing device is controlled according to the rotation control parameter.
  • the rotation control parameter is, for example, an identifier of one of several adjustable angles of the photographing device, a rotation angle of the camera's pan/tilt controller, a direction parameter of the camera's optical axis, and so on.
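As an illustration of a rotation control parameter, the sketch below maps a direction angle to a pan command clamped to a controller's adjustable range. The `PanTiltController` API is hypothetical; the text only requires that some rotation control parameter corresponding to the direction information exist.

```python
# Minimal sketch: map a direction angle (degrees, source relative to the
# photographing device) to a pan rotation command. The controller API here
# is hypothetical, not from the patent.

def clamp(value, lo, hi):
    """Restrict value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))

class PanTiltController:
    def __init__(self, min_pan=-170.0, max_pan=170.0):
        self.min_pan = min_pan      # mechanical limit, degrees (assumed)
        self.max_pan = max_pan
        self.current_pan = 0.0

    def rotation_parameter(self, direction_deg):
        """Rotation control parameter: the pan angle needed so the optical
        axis points along the direction information, within limits."""
        return clamp(direction_deg, self.min_pan, self.max_pan)

    def rotate_to(self, direction_deg):
        """Apply the rotation control parameter and return the pan angle."""
        self.current_pan = self.rotation_parameter(direction_deg)
        return self.current_pan
```

For example, a request outside the mechanical range is simply clamped to the nearest reachable angle.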
  • Specifically, the audio data may be collected by a sound collection device, and determining the position range information of the target sound source according to the audio data may specifically be: determining orientation information of the target sound source relative to the sound collection device according to the audio data;
  • the direction information is then determined according to the orientation information.
  • the sound collecting device is, for example, an array microphone.
  • the orientation information may be direction or location information.
  • the determining the direction information according to the orientation information may be: determining the direction information according to the orientation information, and a preset correspondence relationship between the orientation information and the direction information.
  • a sufficient number of combinations of orientation information and direction information are obtained by training at a sufficient number of points, and the correspondence is obtained by fitting these combinations.
  • a training sound source is placed or moved with a distribution granularity of 0.1 m.
  • the photographing device and the sound collecting device are placed in a specific positional relationship such that the orientation information is consistent with the direction indicated by the direction information when the target sound source is in any position;
  • the positional relationship determines the correspondence.
  • the photographing device and the sound collecting device may be placed together, or when the sound collecting device is horizontally placed, the photographing device is placed directly above the sound collecting device. In the preferred embodiment 1 below, the manner in which they are placed together is used.
  • In practice the placement of the photographing device may deviate somewhat, because the photographing device captures a fairly wide range at any one time; the deviation is acceptable as long as
  • the photographing device can still capture the direction indicated by the orientation information. This can be worked out on site in engineering practice and is not described further here.
  • the sound collection device is used to determine a preset plane for the orientation information and a preset reference point on the preset plane, where the photographing device corresponds to a first corresponding point on the preset plane.
  • the target sound source corresponds to a second corresponding point on the preset plane
  • the orientation information is a position coordinate of the second corresponding point relative to the preset reference point
  • the direction information is a direction-characterizing value of the second corresponding point relative to the first corresponding point,
  • the correspondence is a plane geometric function that takes the position coordinate, relative to the preset reference point, of a sounding point on the preset plane as the independent variable, the position coordinate of the first corresponding point relative to the preset reference point as a parameter, and
  • the direction-characterizing value of the sounding point relative to the first corresponding point as the dependent variable.
  • The first corresponding point is, for example, the optical center of the photographing device, or the projection of that optical center onto the preset plane.
  • The second corresponding point is, for example, the point of the preset plane at which the target sound source lies or, if the target sound source is not in the preset plane, the projection of the target sound source onto the preset plane.
  • A sounding point is, for example, the sounding reference point of a sound source lying in the preset plane, or the projection onto the preset plane of the sounding reference point of a sound source not lying in it.
  • The sounding reference point may be a point of a person's throat or a point of the sound output unit of a sounding device.
  • The direction-characterizing value is, for example, the angular coordinate value of the sounding point in an axis coordinate system on the preset plane centered on the second corresponding point as origin.
  • The preset plane and preset reference point corresponding to the sound collection device depend on which specific device is used as the sound collection device, for example the positioning plane and positioning reference point used by a planar array microphone.
  • In practical applications the position of the sound source may lie in the preset plane or to one side of it, and, owing to other factors, the obtained orientation information may contain a small error; however, because the photographing device captures a fairly wide range at any one time, such an error does not affect the solution of the technical problem addressed by the embodiments of the present invention.
  • A concrete example of the plane geometric function: the position coordinate of the first corresponding point relative to the preset reference point is the coordinate (a1, a2) in a rectangular coordinate system on the preset plane whose first origin is the preset reference point; the position coordinate of the sounding point relative to the preset reference point is
  • the coordinate (x, y) in that rectangular coordinate system, with y greater than a2; and the direction-characterizing coordinate is the angular coordinate b in a polar coordinate system on the preset plane whose second origin is the first corresponding point.
  • When a2 is 0, the polar axis of the polar coordinate system has the same direction as the X axis of the rectangular coordinate system; when a2 is not 0, the polar axis is parallel to the X axis and points the same way. The plane geometric function is b = arctan((y − a2)/(x − a1)), where x is not equal to a1; or: when x is not equal to a1, b = arctan((y − a2)/(x − a1)), and when x equals a1, b = 90 degrees.
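The plane geometric function b = arctan((y − a2)/(x − a1)), with b = 90 degrees when x = a1, can be computed directly. The branch adjustment for x < a1 below is our addition so that b behaves as a true polar angle; the description itself only states the arctan form and the x = a1 case.

```python
import math

def direction_angle_deg(x, y, a1, a2):
    """Angle coordinate b (degrees) of the sounding point (x, y) in polar
    coordinates whose origin is the first corresponding point (a1, a2),
    with the polar axis parallel to the +X axis. Assumes y > a2, i.e. the
    sound source is in front, as the description requires."""
    if x == a1:
        return 90.0
    b = math.degrees(math.atan((y - a2) / (x - a1)))
    # atan returns values in (-90, 90); for sounding points to the left of
    # the first corresponding point (x < a1) the polar angle lies in
    # (90, 180). This correction is our interpretation, not in the text.
    if x < a1:
        b += 180.0
    return b
```

For instance, a source at (1, 1) relative to a first corresponding point at the origin yields b = 45 degrees.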
  • The parameter may be obtained by field measurement during engineering implementation; alternatively, the parameter may be determined according to the position coordinate, relative to the preset reference point,
  • and the direction-characterizing value, relative to the first corresponding point, of the training point on the preset plane corresponding to at least one training sound source obtained through a learning/training procedure.
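Determining the parameter from training points can be sketched as a least-squares fit. Using the document's model b = arctan((y − a2)/(x − a1)), each training observation gives one linear equation in (a1, a2); the solver below is a minimal plain-Python sketch, not the patent's procedure.

```python
import math

def fit_first_point(training):
    """Estimate the parameter (a1, a2), the position of the first
    corresponding point, from training observations.

    training: list of (x, y, b_deg) tuples, where (x, y) is a training
    sound source's position coordinate relative to the preset reference
    point and b_deg its measured direction angle. The document's model
    tan(b) = (y - a2)/(x - a1) rearranges to the linear equation
        tan(b)*a1 - a2 = tan(b)*x - y,
    solved here by 2x2 normal equations (ordinary least squares)."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for x, y, b_deg in training:
        t = math.tan(math.radians(b_deg))
        rhs = t * x - y
        # design-matrix row is (t, -1)
        s11 += t * t
        s12 += -t
        s22 += 1.0
        r1 += t * rhs
        r2 += -rhs
    det = s11 * s22 - s12 * s12
    a1 = (r1 * s22 - s12 * r2) / det
    a2 = (s11 * r2 - s12 * r1) / det
    return a1, a2
```

With two or more non-degenerate training points the fit recovers (a1, a2) exactly when the measurements are noise-free.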
  • One learning/training method is as follows;
  • the learning/training in preferred embodiment 2 below uses this method.
  • Another learning/training method is, for example, one in which
  • the second training point, the third training point, and the first corresponding point are not collinear;
  • the learning/training in preferred embodiment 3 below uses this method.
  • three preferred embodiments of the control method are given below:
  • The array microphone can take various physical forms; in this preferred embodiment it is a linear array microphone carrying at least 3 microphone heads, and the camera is placed together with the array microphone.
  • the steps of the preferred embodiment are as follows:
  • Step 201: Receive audio data through the multiple microphone heads of the array microphone, filter out background noise and send the data to the processing center (or send it first and let the processing center filter out the noise).
  • Step 202: The processing center extracts and separates the voice component of each channel of audio data by frequency, then calculates the time differences with which the voice reached the microphone heads from the phase differences of the voice component across the channels.
  • Step 203: Multiply the time differences by the speed of sound to obtain the distance differences, and calculate the bearing of the sound from the distance differences among the three microphone heads.
  • The spacing between adjacent microphone heads of the array is a known distance R. Take microphone head 2 as the coordinate origin, so that microphone head 1 is at (-R, 0) and microphone head 3 is at (R, 0); the sound source coordinates (x, y) are to be calculated.
  • Since the sound source always comes from the front, the negative sign can be omitted.
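The bearing computation can be illustrated with a small far-field sketch. The closed form below, theta = arccos((L1 − L3)/(2R)), is a standard reconstruction under the assumption that the source is much farther away than R; the embodiment's own formula is not reproduced in this text.

```python
import math

def doa_linear_far_field(d13, r):
    """Direction of arrival for a 3-head linear array.

    Microphone head 2 sits at the origin, head 1 at (-R, 0) and head 3 at
    (R, 0), as in the embodiment. d13 is the measured path-length
    difference L1 - L3 (the time difference between heads 1 and 3 times
    the speed of sound). Under the far-field assumption (source distance
    much larger than R), L1 - L3 ~= 2R*cos(theta), so
        theta = arccos(d13 / (2R)),
    with theta in [0, 180] degrees measured from the +X axis, valid for
    frontal sources (y > 0). This closed form is our reconstruction."""
    ratio = max(-1.0, min(1.0, d13 / (2.0 * r)))  # guard against noise
    return math.degrees(math.acos(ratio))
```

A source directly in front of the array (equal path lengths to heads 1 and 3) comes out at 90 degrees.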
  • Step 204: According to the sound bearing calculated in the previous step, control the camera to rotate until it is aligned with that bearing.
  • Step 205: Find the face position in the image taken by the camera using face recognition technology, as follows: the first step is to input the collected YUV data;
  • The second step is to binarize the image with the skin-colour model: non-skin pixels are set to 0 and skin-colour pixels to 1, where the skin-colour value range can be obtained by statistical learning on the actual device.
  • The third step calls an erosion-dilation algorithm to filter out noise.
  • The fourth step uses connected-region detection to judge the face position, taking as the criterion a connected region whose width and height match or exceed the size of a face.
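The four face-location steps can be sketched as follows. This is a naive NumPy stand-in, not the patent's implementation: the skin-colour model is reduced to a value range on a single-channel image, and the erosion-dilation filter and connected-region detection are written out by hand.

```python
import numpy as np

def binarize(img, lo, hi):
    """Step 2: 1 where the pixel falls in the (assumed) skin range, else 0."""
    return ((img >= lo) & (img <= hi)).astype(np.uint8)

def erode(mask):
    """Step 3a: cross-shaped 3x3 erosion; a pixel survives only if it and
    its 4-neighbours are all set. Removes speckle noise."""
    p = np.pad(mask, 1)
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:]).astype(np.uint8)

def dilate(mask):
    """Step 3b: cross-shaped 3x3 dilation; restores surviving regions."""
    p = np.pad(mask, 1)
    return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:]).astype(np.uint8)

def connected_regions(mask):
    """Step 4: bounding boxes of 4-connected regions (flood fill)."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                stack, r0, r1, c0, c1 = [(i, j)], i, i, j, j
                seen[i, j] = True
                while stack:
                    r, c = stack.pop()
                    r0, r1 = min(r0, r), max(r1, r)
                    c0, c1 = min(c0, c), max(c1, c)
                    for rr, cc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
                        if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and not seen[rr, cc]:
                            seen[rr, cc] = True
                            stack.append((rr, cc))
                boxes.append((r0, c0, r1, c1))
    return boxes

def find_face(img, lo, hi, min_size):
    """Full pipeline: the face position is the bounding box of a region
    whose width and height both reach min_size (the assumed face size)."""
    mask = dilate(erode(binarize(img, lo, hi)))
    for (r0, c0, r1, c1) in connected_regions(mask):
        if r1 - r0 + 1 >= min_size and c1 - c0 + 1 >= min_size:
            return (r0, c0, r1, c1)
    return None
```

A production system would use YUV skin statistics and a library morphology routine; the structure of the four steps is the same.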
  • Step 206 Rotate the camera in the direction of the face until it is aimed at the face.
  • FIG. 3 is a schematic diagram of the position of the array microphone placed directly in front of the camera in the second preferred embodiment of the present invention.
  • The array microphone can take various physical forms; in this preferred embodiment it is a circular array microphone carrying at least 3 microphone heads.
  • The camera is not placed together with the array microphone; instead the array microphone is placed directly in front of the camera.
  • the steps of the preferred embodiment are as follows:
  • Step 301: Receive audio data through the multiple microphone heads of the array microphone, filter out background noise and send the data to the processing center (or send it first and let the processing center filter out the noise).
  • Step 302: The processing center extracts and separates the voice component of each channel of audio data by frequency, then calculates the time differences with which the voice reached the microphone heads from the phase differences of the voice component across the channels.
  • Step 303: Multiply the time differences by the speed of sound to obtain the distance differences, then calculate the position of the sound from the distance differences among the three microphone heads.
  • FIG. 4 is a position coordinate diagram of an array microphone and a sound source according to a second embodiment of a control method according to an embodiment of the present invention.
  • The distance from each microphone head of the array to the array center is a known distance R. Take the array center as the coordinate origin, so that microphone head 1 is at (-R, 0), microphone head 2 at (0, R), and microphone head 3 at (R, 0).
  • The sound source coordinates (x, y) are to be calculated.
  • Let the distances from the sound source to microphone heads 1, 2, and 3 be L1, L2, and L3, respectively.
  • The time differences measured in the previous step, multiplied by the speed of sound, give the differences among L1, L2, and L3; that is, the values of L1 − L3 and L2 − L3 are known. Denote the known L1 − L3 by D13 and L2 − L3 by D23.
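As a sketch of how (x, y) might be recovered from D13 and D23: the two path-difference equations define hyperbolas, and since the embodiment's algebraic solution is not reproduced in this text, the function below simply minimises the residual by a coarse-to-fine grid search over the frontal half-plane. The search ranges are illustrative assumptions.

```python
import math

def locate_circular_array(d13, d23, r, x_range=(-5.0, 5.0), y_range=(0.1, 10.0)):
    """Locate the source for the circular-array geometry: microphone head 1
    at (-R, 0), head 2 at (0, R), head 3 at (R, 0); D13 = L1 - L3 and
    D23 = L2 - L3 are the measured path differences. Coarse-to-fine grid
    search over the frontal half-plane (y > 0); this is a reconstruction,
    not the embodiment's own algebra."""
    def residual(x, y):
        l1 = math.hypot(x + r, y)
        l2 = math.hypot(x, y - r)
        l3 = math.hypot(x - r, y)
        return (l1 - l3 - d13) ** 2 + (l2 - l3 - d23) ** 2

    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    best = (0.0, 1.0)
    for _ in range(6):                       # successive refinement passes
        n = 40
        pts = [(x_lo + (x_hi - x_lo) * i / n, y_lo + (y_hi - y_lo) * j / n)
               for i in range(n + 1) for j in range(n + 1)]
        best = min(pts, key=lambda p: residual(*p))
        dx, dy = (x_hi - x_lo) / n, (y_hi - y_lo) / n
        x_lo, x_hi = best[0] - dx, best[0] + dx
        y_lo, y_hi = best[1] - dy, best[1] + dy
    return best
```

Note that with a small array the range of the source is only weakly constrained; the recovered direction (which is what the camera needs) is much more reliable than the recovered distance.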
  • Step 304: Turn the camera to the direction indicated by the angle arctan((d + y)/x).
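The alignment angle of step 304 is a one-liner; atan2 is used here (our choice, not stated in the text) so that a source straight ahead (x = 0) yields 90 degrees rather than a division by zero.

```python
import math

def camera_alignment_angle_deg(x, y, d):
    """Camera angle per step 304: the array microphone sits a known
    distance d directly in front of the camera, so from the camera the
    source at array coordinates (x, y) lies at forward distance y + d and
    lateral offset x, giving arctan((d + y)/x)."""
    return math.degrees(math.atan2(d + y, x))
```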
  • FIG. 5 is a training schematic diagram of the second preferred embodiment of a control method according to an embodiment of the present invention.
  • During training the speaker does not stand directly in front of the camera (a constraint on the angle a in FIG. 5).
  • The camera is rotated toward the speaker, and the corresponding camera angle is measured.
  • Step 305: In the image taken by the camera, find the face position using face recognition technology, as follows:
  • The first step is to input the collected YUV data.
  • The second step is to binarize the image with the skin-colour model: non-skin pixels are set to 0 and skin-colour pixels to 1, where the skin-colour value range can be obtained by statistical learning on the actual device.
  • The third step calls an erosion-dilation algorithm to filter out noise.
  • The fourth step uses connected-region detection to judge the face position, taking as the criterion a connected region whose width and height
  • match or exceed the size of a face.
  • Step 306: Rotate the camera in the direction of the face until it is aimed at the face.
  • FIG. 6 is a schematic diagram of the third preferred embodiment of a control method according to an embodiment of the present invention.
  • The array microphone can take various physical forms;
  • in this preferred embodiment it is a circular array microphone carrying at least 3 microphone heads.
  • The camera is not placed together with the array microphone; the array microphone is placed in front of the camera with a horizontal displacement.
  • The position of the sound source is (x, y), and the coordinates of the camera relative to the array are (l, −d).
  • the steps of the preferred embodiment are as follows:
  • Step 401: Obtain x and y in the same way as steps 301 to 303 of the second preferred embodiment.
  • Step 402: Since the array microphone and the camera are fixed in position in the venue and do not move, d and l can be obtained by learning and training.
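Learning d and l can be sketched as a small least-squares problem. The model tan(b) = (y + d)/(x − l) follows from the stated geometry (camera at (l, −d) relative to the array); the fitting procedure itself is our assumption, since the embodiment does not spell one out.

```python
import math

def fit_camera_offset(training):
    """Estimate d and l for the third embodiment, where the camera sits at
    (l, -d) relative to the array microphone.

    training: list of (x, y, b_deg) tuples: a training source at known
    array coordinates (x, y) and the camera angle b_deg at which it was
    found (e.g. via face recognition). The geometry gives
        tan(b) = (y + d)/(x - l)  =>  tan(b)*l + d = tan(b)*x - y,
    which is linear in (l, d); solved by 2x2 normal equations."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for x, y, b_deg in training:
        t = math.tan(math.radians(b_deg))
        rhs = t * x - y
        # design-matrix row is (t, 1)
        s11 += t * t
        s12 += t
        s22 += 1.0
        r1 += t * rhs
        r2 += rhs
    det = s11 * s22 - s12 * s12
    l = (r1 * s22 - s12 * r2) / det
    d = (s11 * r2 - s12 * r1) / det
    return l, d
```

Two training positions with distinct angles already determine (l, d); extra positions average out measurement noise.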
  • Step 403: In the image taken by the camera, find the face position using face recognition technology, as follows:
  • The first step is to input the collected YUV data.
  • The second step is to binarize the image with the skin-colour model: non-skin pixels are set to 0 and skin-colour pixels to 1, where the skin-colour value range can be obtained by statistical learning on the actual device.
  • The third step calls an erosion-dilation algorithm to filter out noise.
  • The fourth step uses connected-region detection to judge the face position, taking as the criterion a connected region whose width and height match or exceed the size of a face.
  • Step 404: Rotate the camera in the direction of the face until it is aimed at the face.
  • An embodiment of the present invention further provides a control apparatus, the control apparatus comprising:
  • An obtaining module configured to acquire audio data including sound information of a target sound source
  • a determining module configured to determine location range information of the target sound source according to the audio data
  • a control module configured to control, according to the location range information, a rotation of a photographing device that cannot currently capture the target sound source, so that the photographing device can capture the target sound source.
  • It can be seen that, by acquiring audio data containing the sound information of the target sound source, determining the position range information of the target sound source according to the audio data, and controlling, according to the position range information, the rotation of the photographing device that currently cannot capture the target sound source so that the photographing device can capture the target sound source, the photographing device is enabled to capture a target sound source that is outside the original picture range.
  • the location range information is direction information of the target sound source relative to the photographing device
  • the control module includes:
  • a first determining unit configured to determine a rotation control parameter of the photographing device corresponding to the direction information
  • control unit configured to control rotation of the photographing device according to the rotation control parameter, so that the photographing device can capture the target sound source.
  • the audio data is collected by a sound collection device
  • the determining module includes: a second determining unit configured to determine, according to the audio data, orientation information of the target sound source relative to the sound collection device;
  • a third determining unit configured to determine the direction information according to the orientation information.
  • the third determining unit includes:
  • Determining a subunit configured to determine the direction information according to the orientation information, and a preset correspondence relationship between the orientation information and the direction information.
  • An embodiment of the present invention further provides a control device, where the control device includes the control apparatus described above.
  • The above solution enables the photographing device to capture a target sound source that is outside the original picture range.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Otolaryngology (AREA)
  • Studio Devices (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

A control method, control apparatus, and control device. The control method includes: acquiring audio data containing sound information of a target sound source; determining position range information of the target sound source according to the audio data; and controlling, according to the position range information, the rotation of a photographing device that currently cannot capture the target sound source, so that the photographing device can capture the target sound source. Embodiments of the present invention enable a photographing device to capture a target sound source located outside the range of the original picture.

Description

Control method, control apparatus and control device
Technical field
Embodiments of the present invention relate to the field of image tracking, and in particular to a control method, a control apparatus, and a control device.
Background art
During video communication, the camera needs to be aimed at the speaker. The existing solution uses image recognition technology to identify the face and then remotely steers the camera toward the face position; however, this solution cannot track a speaker who has moved outside the screen range, or another speaker who is outside the screen range. Summary of the invention
In view of this, an object of embodiments of the present invention is to provide a control method, a control apparatus, and a control device, so that a photographing device can capture a target sound source that is outside the range of the original picture.
To solve the above technical problem, embodiments of the present invention provide the following solutions:
An embodiment of the present invention provides a control method, the control method including:
acquiring audio data containing sound information of a target sound source;
determining position range information of the target sound source according to the audio data;
controlling, according to the position range information, the rotation of a photographing device that currently cannot capture the target sound source, so that the photographing device can capture the target sound source.
Preferably, the position range information is direction information of the target sound source relative to the photographing device, and controlling, according to the position range information, the rotation of the photographing device that currently cannot capture the target sound source is:
determining a rotation control parameter of the photographing device corresponding to the direction information;
controlling the rotation of the photographing device according to the rotation control parameter.
Preferably, the audio data is collected by a sound collection device, and determining the position range information of the target sound source according to the audio data is:
determining orientation information of the target sound source relative to the sound collection device according to the audio data;
determining the direction information according to the orientation information.
Preferably, determining the direction information according to the orientation information is:
determining the direction information according to the orientation information and a preset correspondence between the orientation information and the direction information.
Preferably, the sound collection device is used to determine a preset plane for the orientation information and a preset reference point on the preset plane, the photographing device corresponds to a first corresponding point on the preset plane, and the target sound source corresponds to a second corresponding point on the preset plane;
the orientation information is the position coordinate of the second corresponding point relative to the preset reference point, and the direction information is a direction-characterizing coordinate of the second corresponding point relative to the first corresponding point; the correspondence is a plane geometric function that takes the position coordinate, relative to the preset reference point, of a sounding point on the preset plane as the independent variable, the position coordinate of the first corresponding point relative to the preset reference point as a parameter, and the direction-characterizing coordinate of the sounding point relative to the first corresponding point as the dependent variable.
Preferably, the position coordinate of the first corresponding point relative to the preset reference point is the coordinate (a1, a2) in a rectangular coordinate system on the preset plane whose first origin is the preset reference point, the position coordinate of the sounding point relative to the preset reference point is the coordinate (x, y) in that rectangular coordinate system, y is greater than a2, and the direction-characterizing coordinate is the angular coordinate b in a polar coordinate system on the preset plane whose second origin is the first corresponding point;
when a2 is 0, the polar axis of the polar coordinate system has the same direction as the X axis of the rectangular coordinate system; when a2 is not 0, the polar axis of the polar coordinate system is parallel to and has the same direction as the X axis of the rectangular coordinate system; the plane geometric function is b = arctan((y − a2)/(x − a1)), where x is not equal to a1; or the plane geometric function is: when x is not equal to a1, b = arctan((y − a2)/(x − a1)); when x equals a1, b = 90 degrees.
Preferably, the parameter is determined according to the position coordinate, relative to the preset reference point, and the direction-characterizing coordinate, relative to the first corresponding point, of the training point on the preset plane corresponding to at least one training sound source obtained through a learning/training procedure.
An embodiment of the present invention provides a control apparatus, the control apparatus including:
an obtaining module configured to: acquire audio data containing sound information of a target sound source; a determining module configured to: determine position range information of the target sound source according to the audio data;
a control module configured to: control, according to the position range information, the rotation of a photographing device that currently cannot capture the target sound source, so that the photographing device can capture the target sound source.
Preferably, the position range information is direction information of the target sound source relative to the photographing device, and the control module includes:
a first determining unit configured to: determine a rotation control parameter of the photographing device corresponding to the direction information;
a control unit configured to: control the rotation of the photographing device according to the rotation control parameter, so that the photographing device can capture the target sound source.
Preferably, the audio data is collected by a sound collection device, and the determining module includes: a second determining unit configured to: determine orientation information of the target sound source relative to the sound collection device according to the audio data;
a third determining unit configured to: determine the direction information according to the orientation information.
Preferably, the third determining unit includes:
a determining subunit configured to: determine the direction information according to the orientation information and a preset correspondence between the orientation information and the direction information.
An embodiment of the present invention provides a control device including the control apparatus described above.
As can be seen from the above, the control method, control apparatus, and control device provided by the embodiments of the present invention have at least the following technical effects:
by acquiring audio data containing the sound information of the target sound source, determining the position range information of the target sound source accordingly, and controlling, according to the position range information, the rotation of the photographing device that currently cannot capture the target sound source so that the photographing device can capture the target sound source, the photographing device is enabled to capture a target sound source outside the original picture range. Brief description of the drawings
FIG. 1 is a flowchart of a control method according to an embodiment of the present invention;
FIG. 2 is a position coordinate diagram of the array microphone and the sound source in a first preferred embodiment of a control method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the position of the array microphone placed directly in front of the camera in a second preferred embodiment of a control method according to an embodiment of the present invention;
FIG. 4 is a position coordinate diagram of the array microphone and the sound source in the second preferred embodiment of a control method according to an embodiment of the present invention;
FIG. 5 is a training schematic diagram of the second preferred embodiment of a control method according to an embodiment of the present invention; FIG. 6 is a schematic diagram of a third preferred embodiment of a control method according to an embodiment of the present invention. Preferred embodiments of the invention
下面将结合附图及具体实施例对本发明实施例进行详细描述。
图 1为本发明实施例提供的一种控制方法的流程图, 参照图 1 , 本发明 实施例提供一种控制方法, 所述控制方法包括如下步骤:
步骤 101 , 获取包含目标声音源的声音信息的音频数据;
步骤 102, 根据所述音频数据确定所述目标声音源的位置范围信息; 步骤 103 , 根据所述位置范围信息控制当前无法拍摄到所述目标声音源 的拍摄设备的转动, 使得所述拍摄设备能够拍摄到所述目标声音源。
可见, 通过获取包含目标声音源的声音信息的音频数据, 据此确定目标 声音源的位置范围信息, 并根据该位置范围信息控制当前无法拍摄到目标声 音源的拍摄设备的转动, 使得拍摄设备能够拍摄到目标声音源, 从而支持拍 摄设备能够拍摄到处于原屏幕范围外的目标声音源。
显然, 所述目标声音源应处于所述拍摄设备通过转动可以达到的全部可 拍摄范围内。
所述目标声音源可以为说话的人, 也可以为发声设备。
所述拍摄设备可以为相机或摄像头。
具体地, 例如: 所述声音信息中可以包含预设的表示所述位置范围信息 的关键字内容, 则通过语音识别技术就可以根据所述音频数据确定所述位置 范围信息。
Alternatively, for example, the position-range information may be direction information of the target sound source relative to the shooting device, and controlling, according to the position-range information, the rotation of the shooting device that currently cannot capture the target sound source may specifically be:
determining a rotation control parameter of the shooting device corresponding to the direction information;
controlling the rotation of the shooting device according to the rotation control parameter.
The rotation control parameter is, for example, an identifier of one of several adjustable angles of the shooting device, a rotation angle of the camera's pan-tilt controller, a direction parameter of the camera's optical axis, and so on.
Specifically, the audio data may be collected by a sound collection device, and determining the position-range information of the target sound source according to the audio data may specifically be:
determining azimuth information of the target sound source relative to the sound collection device according to the audio data;
determining the direction information according to the azimuth information.
The sound collection device is, for example, an array microphone.
The azimuth information may be direction information or position information.
Further, determining the direction information according to the azimuth information may specifically be: determining the direction information according to the azimuth information and a preset correspondence between the azimuth information and the direction information.
Specifically, for example, enough combinations of azimuth information and direction information may be obtained by training at a sufficiently large number of points, and the correspondence obtained by fitting these combinations; for instance, training sound sources may be placed or moved at a distribution granularity of 0.1 m.
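One way to sketch such a fit: under the arctan model described later, tan(b)·(x - a1) = y - a2, where (a1, a2) is the camera's unknown point on the plane, so each trained pair of a source position (x, y) and camera angle b gives one linear equation in (a1, a2), solvable by least squares. The function name and sample format below are illustrative assumptions, not from the patent.

```python
import numpy as np

def fit_camera_point(samples):
    """Fit the camera point (a1, a2) on the preset plane from training
    samples, each (x, y, b): source position from the array microphone
    and the camera angle b (radians) that framed it.
    Rearranging tan(b)*(x - a1) = y - a2 gives the linear equation
    tan(b)*a1 - a2 = tan(b)*x - y, solved here in least squares."""
    lhs, rhs = [], []
    for x, y, b in samples:
        t = np.tan(b)
        lhs.append([t, -1.0])
        rhs.append(t * x - y)
    (a1, a2), *_ = np.linalg.lstsq(np.array(lhs), np.array(rhs), rcond=None)
    return a1, a2
```

With noisy measurements, more training points simply tighten the least-squares estimate; three well-spread, non-collinear points already determine the solution exactly.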
As another example, the shooting device and the sound collection device may be placed in a specific positional relationship such that, for a target sound source at any position, the azimuth information and the direction information indicate the same direction, and the correspondence is then determined from that positional relationship. For instance, the shooting device and the sound collection device may be placed together, or, when the sound collection device is placed horizontally, the shooting device may be placed directly above it. Preferred embodiment 1 below uses the placed-together arrangement.
It should be noted that, given the practical constraints on where the shooting device can be placed, a certain deviation in its placement is tolerable: since the shooting device captures a fairly wide range at any moment, the deviation only needs to be small enough that the shooting device can still capture the direction indicated by the azimuth information. This can be worked out on site in engineering practice and is not elaborated further here.
As yet another example, the sound collection device determines the azimuth information with respect to a preset plane and a preset reference point on that plane, the shooting device corresponds to a first corresponding point on the preset plane, and the target sound source corresponds to a second corresponding point on the preset plane.
The azimuth information is the position coordinates of the second corresponding point relative to the preset reference point, and the direction information is a direction-information characterizing value of the second corresponding point relative to the first corresponding point; the correspondence is a plane geometry function taking the position coordinates, relative to the preset reference point, of a sounding corresponding point on the preset plane as its independent variable, the position coordinates of the first corresponding point relative to the preset reference point as a parameter, and the direction-information characterizing value of the sounding corresponding point relative to the first corresponding point as its dependent variable.
The first corresponding point is, for example, the optical centre of the shooting device, or the projection of that optical centre onto the preset plane.
The second corresponding point is, for example, a point of the target sound source lying on the preset plane, or the projection onto the preset plane of a point of the target sound source not lying on that plane.
The sounding corresponding point is, for example, a sounding reference point of a sound source lying on the preset plane, or the projection onto the preset plane of a sounding reference point not lying on that plane. A sounding reference point may be a point of a person's throat or a point of the sound source's sound output unit.
The direction-information characterizing value is, for example, the angular coordinate of the sounding corresponding point in an axial coordinate system on the preset plane centred at the first corresponding point as origin.
Which preset plane and preset reference point the sound collection device uses depends on the specific device employed, for example the localization plane and localization reference point used by a planar array microphone.
It should be noted that in practical applications the sound source may lie in the preset plane or on either side of it, and other factors may introduce a small error into the obtained azimuth information; however, since the shooting device captures a fairly wide range at any moment, such an error does not affect the solution of the technical problem addressed by the embodiments of the present invention.
A concrete example of the plane geometry function is given here: the position coordinates of the first corresponding point relative to the preset reference point are the coordinates (a1, a2) in rectangular coordinates on the preset plane whose first origin is the preset reference point, the position coordinates of the sounding corresponding point relative to the preset reference point are the coordinates (x, y) in those rectangular coordinates, with y greater than a2, and the direction-information characterizing coordinate is the angular coordinate b in polar coordinates on the preset plane whose second origin is the first corresponding point.
When a2 is 0, the polar axis of the polar coordinates has the same direction as the x axis of the rectangular coordinates; when a2 is not 0, the polar axis is parallel to and points the same way as the x axis. The plane geometry function is b = arctan((y - a2)/(x - a1)), where x is not equal to a1; or the plane geometry function is: when x is not equal to a1, b = arctan((y - a2)/(x - a1)); when x equals a1, b = 90 degrees.
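The piecewise plane geometry function above can be sketched in a few lines; the function name is illustrative, not from the patent.

```python
import math

def direction_angle(x, y, a1, a2):
    """Angle b (degrees) of the sound point (x, y) as seen from the
    camera point (a1, a2), both in the rectangular coordinates of the
    preset plane; assumes y > a2 (the source is in front)."""
    if x == a1:
        return 90.0  # source directly along the polar axis normal
    return math.degrees(math.atan((y - a2) / (x - a1)))
```

As in the patent's formula, the result is signed: sources with x < a1 yield a negative angle measured from the polar axis.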
The parameter may be obtained by on-site measurement during engineering implementation; alternatively, the parameter may be determined from the position coordinates, relative to the preset reference point, and the direction-information characterizing values, relative to the first corresponding point, of the training points on the preset plane corresponding to at least one training sound source obtained through a learning-and-training procedure.
One example of the learning-and-training procedure:
determine the first position coordinates, relative to the preset reference point, and the first direction-information characterizing value, relative to the first corresponding point, of a first training point on the preset plane corresponding to a first sound source; obtain the parameter from the first position coordinates and the first direction-information characterizing value; here, the first training point, the first corresponding point and the preset reference point are not collinear. The learning-and-training procedure of preferred embodiment 2 below is of this kind.
Another example of the learning-and-training procedure:
determine the second position coordinates, relative to the preset reference point, and the second direction-information characterizing value, relative to the first corresponding point, of a second training point on the preset plane corresponding to a second sound source; determine the third position coordinates, relative to the preset reference point, and the third direction-information characterizing value, relative to the first corresponding point, of a third training point on the preset plane corresponding to a third sound source; obtain the parameter from the second position coordinates, the second direction-information characterizing value, the third position coordinates and the third direction-information characterizing value;
here, the second training point, the third training point and the first corresponding point are not collinear. The learning-and-training procedure of preferred embodiment 3 below is of this kind.
To explain the control method above more clearly, three preferred embodiments of the control method are given below:
Preferred embodiment 1:
Fig. 2 is a position-coordinate diagram of the array microphone and the sound source in preferred embodiment 1 of a control method provided by an embodiment of the present invention. Referring to Fig. 2, array microphones come in many physical forms; this preferred embodiment uses a linear array microphone carrying at least three mic capsules. The camera is placed together with the array microphone. The steps of this preferred embodiment are as follows:
Step 201: the capsules of the array microphone each receive audio data, which is sent to the processing centre after background noise is filtered out, or is filtered after being sent to the processing centre.
Step 202: the processing centre extracts and separates the voice component of the multiple audio channels by frequency, then computes the differences in voice arrival time at the capsules from the phase differences of the voice components.
Step 203: multiplying the arrival-time differences by the speed of sound gives the distance differences, and from the distance differences between the three capsules the direction of the sound can be computed.
Specifically, the spacing between adjacent capsules of the array microphone is a known distance, which we denote R. We take capsule 2 as the coordinate origin, so capsule 1 is at (-R, 0) and capsule 3 at (R, 0); the sound source coordinates to be computed are (x, y).
We denote the distances from the sound source to capsules 1, 2 and 3 as L1, L2 and L3. What the previous step actually measures, once multiplied by the speed of sound, are the differences between L1, L2 and L3; that is, the values of L1 - L3 and L2 - L3 are known. We denote the known L1 - L3 as D13 and L2 - L3 as D23.
By the Pythagorean theorem:
L1 = sqrt((x + R)² + y²) = sqrt(x² + y² + R² + 2xR)
L2 = sqrt(x² + y²)
L3 = sqrt((x - R)² + y²) = sqrt(x² + y² + R² - 2xR)
D13 = L1 - L3 = sqrt(x² + y² + R² + 2xR) - sqrt(x² + y² + R² - 2xR)
Squaring gives:
D13² = 2x² + 2y² + 2R² - 2·sqrt(x⁴ + y⁴ + R⁴ + 2x²y² - 2x²R² + 2y²R²)
so that:
sqrt(x⁴ + y⁴ + R⁴ + 2x²y² - 2x²R² + 2y²R²) = x² + y² + R² - 0.5·D13²
Squaring again gives:
x⁴ + y⁴ + R⁴ + 2x²y² - 2x²R² + 2y²R² = x⁴ + y⁴ + (R² - 0.5·D13²)² + 2x²y² + 2x²(R² - 0.5·D13²) + 2y²(R² - 0.5·D13²)
Expanding gives:
x⁴ + y⁴ + R⁴ + 2x²y² - 2x²R² + 2y²R² = x⁴ + y⁴ + R⁴ - R²·D13² + 0.25·D13⁴ + 2x²y² + 2x²R² - x²·D13² + 2y²R² - y²·D13²
Cancelling the common terms on both sides gives:
y²·D13² = -R²·D13² + 0.25·D13⁴ + 4x²R² - x²·D13²
that is:
y = ±sqrt( ((4R² - D13²)/D13²)·x² + 0.25·D13² - R² )
In the practical application scenario of this preferred embodiment the sound source always comes from the front, so the negative sign can be dropped, giving:
y = sqrt( ((4R² - D13²)/D13²)·x² + 0.25·D13² - R² )    (Formula A)
At the same time, we must also satisfy:
D23 = L2 - L3 = sqrt(x² + y²) - sqrt(x² + y² + R² - 2xR)
that is:
D23 = sqrt(x² + y²) - sqrt(x² + y² + R² - 2xR)    (Formula B)
A software program can easily find the x, y that satisfy Formula A and Formula B simultaneously. Specifically: the sign of x is determined from the sign of D13; then, with x as the loop variable, Formula A is applied repeatedly to obtain y, and each (x, y) is substituted into Formula B until Formula B holds; the (x, y) so obtained is the sound source position. The angle of the sound source is then arctan(y/x).
Step 204: according to the sound direction computed in the previous step, turn the camera to point in that direction.
Step 205: in the image captured by the camera, find the face position using face recognition, specifically as follows:
Step 1: input the captured YUV data;
Step 2: binarise the image with a skin-colour model, setting non-skin parts to 0 and skin parts to 1, where the skin-colour value range can be obtained by statistical learning on the actual device;
Step 3: filter with an erosion-dilation algorithm;
Step 4: use connected-region detection, taking as the criterion a connected region whose width matches the face size and whose height is greater than or equal to the face size, to determine the face position.
Step 206: turn the camera toward the face until it is centred on the face.
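The localization scan of step 203 can be sketched as follows. This is a minimal illustration of the patent's loop over x with Formula A and the Formula B check, recast as picking the x with the smallest Formula B residual; the function name, search range and step size are assumptions, and the degenerate case D13 = 0 (source on the array's axis of symmetry) is not handled.

```python
import math

def locate_source(d13, d23, r, x_max=10.0, step=0.001):
    """Recover the source position (x, y) in front of a linear 3-capsule
    array (capsules at (-r, 0), (0, 0), (r, 0)) from the measured
    distance differences d13 = L1 - L3 and d23 = L2 - L3."""
    best, best_err = None, float('inf')
    sign = 1.0 if d13 > 0 else -1.0      # sign of x follows sign of D13
    for i in range(1, int(x_max / step)):
        x = sign * i * step
        under = (4*r*r - d13*d13) * x*x / (d13*d13) + 0.25*d13*d13 - r*r
        if under < 0:
            continue                      # Formula A has no real y here
        y = math.sqrt(under)              # Formula A
        l2 = math.hypot(x, y)
        l3 = math.sqrt(x*x + y*y + r*r - 2*x*r)
        err = abs(d23 - (l2 - l3))        # Formula B residual
        if err < best_err:
            best, best_err = (x, y), err
    return best
```

The returned (x, y) then gives the pointing angle arctan(y/x) used in step 204.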
Preferred embodiment 2:
Fig. 3 is a schematic diagram of the array microphone placed directly in front of the camera in preferred embodiment 2 of a control method provided by an embodiment of the present invention. Referring to Fig. 3, array microphones come in many physical forms; this preferred embodiment uses a circular array microphone carrying at least three mic capsules. The camera is not placed together with the array microphone; the array microphone is placed directly in front of the camera. The steps of this preferred embodiment are as follows:
Step 301: the capsules of the array microphone each receive audio data, which is sent to the processing centre after background noise is filtered out, or is filtered after being sent to the processing centre.
Step 302: the processing centre extracts and separates the voice component of the multiple audio channels by frequency, then computes the differences in voice arrival time at the capsules from the phase differences of the voice components.
Step 303: multiplying the arrival-time differences by the speed of sound gives the distance differences, and from the distance differences between the three capsules the direction of the sound can be computed.
Specifically, Fig. 4 is a position-coordinate diagram of the array microphone and the sound source in preferred embodiment 2 of a control method provided by an embodiment of the present invention. Referring to Fig. 4, the spacing between adjacent capsules of the array microphone is a known distance, which we denote R. We take the centre of the array microphone as the coordinate origin, so capsule 1 is at (-R, 0), capsule 2 at (0, R) and capsule 3 at (R, 0); the sound source coordinates to be computed are (x, y).
We denote the distances from the sound source to capsules 1, 2 and 3 as L1, L2 and L3. What the previous step actually measures, once multiplied by the speed of sound, are the differences between L1, L2 and L3; that is, the values of L1 - L3 and L2 - L3 are known. We denote the known L1 - L3 as D13 and L2 - L3 as D23.
By the Pythagorean theorem:
L1 = sqrt((x + R)² + y²) = sqrt(x² + y² + R² + 2xR)
L2 = sqrt(x² + (y - R)²) = sqrt(x² + y² + R² - 2yR)
L3 = sqrt((x - R)² + y²) = sqrt(x² + y² + R² - 2xR)
D13 = L1 - L3 = sqrt(x² + y² + R² + 2xR) - sqrt(x² + y² + R² - 2xR)
Deriving as in preferred embodiment 1 yields Formula A:
y = sqrt( ((4R² - D13²)/D13²)·x² + 0.25·D13² - R² )    (Formula A)
At the same time, we must also satisfy:
D23 = L2 - L3 = sqrt(x² + y² + R² - 2yR) - sqrt(x² + y² + R² - 2xR)
that is:
D23 = sqrt(x² + y² + R² - 2yR) - sqrt(x² + y² + R² - 2xR)    (Formula C)
A software program can easily find the x, y that satisfy Formula A and Formula C simultaneously. Specifically: the sign of x is determined from the sign of D13; then, with x as the loop variable, Formula A is applied repeatedly to obtain y, and each (x, y) is substituted into Formula C until Formula C holds; the (x, y) so obtained is the sound source position.
Step 304: aim the camera in the direction given by the angle arctan((d + y)/x).
In the actual usage scenario, since the venue's digital microphone and camera are fixed in place and will not be moved, d can be obtained through learning and training. Specifically, Fig. 5 is a training schematic diagram of preferred embodiment 2 of a control method provided by an embodiment of the present invention. Referring to Fig. 5, during training the speaker does not stand directly in front of the camera, that is, a in Fig. 5 must not be 90 degrees; the camera is then turned to point at the speaker, and the camera measures angle b. After the speaker talks, the preceding steps yield the coordinate values x and y, and the distance d between the camera and the array microphone can then be computed as d = x/tan(b) - y.
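The training step above reduces to one expression. The helper name is illustrative; note that the formula d = x/tan(b) - y implies b here is read as the camera's pan angle from its straight-ahead axis (so tan(b) = x/(y + d)), which is how the geometry of Fig. 5 reads, whereas the aiming angle of step 304 is measured from the x axis.

```python
import math

def calibrate_distance(x, y, b):
    """Embodiment 2 training: the speaker stands off the camera axis,
    the array microphone reports the speaker at (x, y), and the camera,
    once aimed at the speaker, reads pan angle b (radians, b != 0).
    Returns the camera-to-array distance d = x/tan(b) - y."""
    return x / math.tan(b) - y
```

Once d is known, every later localization (x, y) is converted to the aiming direction arctan((d + y)/x) of step 304 without retraining.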
Step 305: in the image captured by the camera, find the face position using face recognition, as follows:
Step 1: input the captured YUV data;
Step 2: binarise the image with a skin-colour model, setting non-skin parts to 0 and skin parts to 1, where the skin-colour value range can be obtained by statistical learning on the actual device;
Step 3: filter with an erosion-dilation algorithm;
Step 4: use connected-region detection, taking as the criterion a connected region whose width matches the face size and whose height is greater than or equal to the face size, to determine the face position.
Step 306: then turn the camera toward the face until it is centred on the face.
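Steps 2 and 4 of the face-recognition pipeline can be sketched as follows. The U/V skin window shown is only illustrative (the patent learns the range statistically on the device), the erosion-dilation filtering of step 3 is assumed to have already been applied to the mask, and the helper names and the "minimum width/height" reading of the size criterion are assumptions.

```python
import numpy as np
from collections import deque

def skin_mask(u_plane, v_plane, u_range=(77, 127), v_range=(133, 173)):
    """Step 2 sketch: binarise by a U/V chroma window (1 = skin)."""
    m = ((u_plane >= u_range[0]) & (u_plane <= u_range[1]) &
         (v_plane >= v_range[0]) & (v_plane <= v_range[1]))
    return m.astype(np.uint8)

def find_face(mask, min_w, min_h):
    """Step 4 sketch: scan 4-connected regions of the binarised mask and
    return the bounding box (top, left, bottom, right) of the first
    region at least min_w wide and min_h tall, or None."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                q = deque([(sy, sx)])
                seen[sy, sx] = True
                ys, xs = [sy], [sx]
                while q:                      # BFS flood fill
                    y, x = q.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            ys.append(ny)
                            xs.append(nx)
                            q.append((ny, nx))
                bw = max(xs) - min(xs) + 1
                bh = max(ys) - min(ys) + 1
                if bw >= min_w and bh >= min_h:
                    return min(ys), min(xs), max(ys), max(xs)
    return None
```

The centre of the returned box is what step 306 steers the camera toward.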
Preferred embodiment 3:
Fig. 6 is a schematic diagram of preferred embodiment 3 of a control method provided by an embodiment of the present invention. Referring to Fig. 6, array microphones come in many physical forms; this preferred embodiment uses a circular array microphone carrying at least three mic capsules. The camera is not placed together with the array microphone; the array microphone is placed in front of the camera with a horizontal displacement. The sound source position coordinates are (x, y), and the coordinates of the camera relative to the array microphone are (l, -d). The steps of this preferred embodiment are as follows:
Step 401: obtain x and y in a manner similar to steps 301 to 303 of preferred embodiment 2.
Step 402: aim the camera in the direction given by angle b, where b = arctan((y + d)/(x - l)).
In the actual usage scenario, since the venue's digital microphone and camera are fixed in place and will not be moved, d and l can be obtained through learning and training. Specifically, first the trainer stands directly in front of the camera and speaks; the array microphone computes the coordinates (x1, y1), so the camera's abscissa l = x1. Then the trainer stands away from the spot directly in front of the camera and speaks; the operator points the camera at the trainer, and the camera itself records the angle b2, while the array microphone computes the coordinates (x2, y2). Then tan(b2) = (y2 + d)/(x2 - l), and since l = x1, tan(b2) = (y2 + d)/(x2 - x1), from which d = tan(b2)·(x2 - x1) - y2.
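The two-position training of embodiment 3 can be sketched directly from the formulas above; the helper name is illustrative.

```python
import math

def calibrate_camera_pose(x1, x2, y2, b2):
    """Embodiment 3 training: (x1, y1) is the array microphone's fix of
    a trainer standing directly in front of the camera (so l = x1);
    (x2, y2) and camera angle b2 (radians) come from a second, off-axis
    position with the camera aimed at the trainer.
    Returns (l, d) with d = tan(b2)*(x2 - x1) - y2."""
    l = x1
    d = math.tan(b2) * (x2 - x1) - y2
    return l, d
```

With (l, d) fixed, step 402's aiming angle for any localized source is b = arctan((y + d)/(x - l)).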
Step 403: in the image captured by the camera, find the face position using face recognition, as follows:
Step 1: input the captured YUV data;
Step 2: binarise the image with a skin-colour model, setting non-skin parts to 0 and skin parts to 1, where the skin-colour value range can be obtained by statistical learning on the actual device;
Step 3: filter with an erosion-dilation algorithm;
Step 4: use connected-region detection, taking as the criterion a connected region whose width matches the face size and whose height is greater than or equal to the face size, to determine the face position.
Step 404: then turn the camera toward the face until it is centred on the face.
An embodiment of the present invention further provides a control apparatus, the control apparatus comprising:
an acquisition module, configured to acquire audio data containing sound information of a target sound source;
a determination module, configured to determine position-range information of the target sound source according to the audio data;
a control module, configured to control, according to the position-range information, the rotation of a shooting device that currently cannot capture the target sound source, so that the shooting device can capture the target sound source.
It can be seen that by acquiring audio data containing sound information of a target sound source, determining position-range information of the target sound source from that data, and controlling, according to the position-range information, the rotation of a shooting device that currently cannot capture the target sound source so that the shooting device can capture it, the shooting device is enabled to capture a target sound source outside the original screen range.
Further, the position-range information is direction information of the target sound source relative to the shooting device, and the control module comprises:
a first determination unit, configured to determine a rotation control parameter of the shooting device corresponding to the direction information;
a control unit, configured to control the rotation of the shooting device according to the rotation control parameter, so that the shooting device can capture the target sound source.
Further, the audio data is collected by a sound collection device, and the determination module comprises:
a second determination unit, configured to determine azimuth information of the target sound source relative to the sound collection device according to the audio data;
a third determination unit, configured to determine the direction information according to the azimuth information.
Further, the third determination unit comprises:
a determination subunit, configured to determine the direction information according to the azimuth information and a preset correspondence between the azimuth information and the direction information.
An embodiment of the present invention further provides a control device, the control device comprising the control apparatus described above.
The above are merely implementations of the embodiments of the present invention. It should be noted that those of ordinary skill in the art may make further improvements and refinements without departing from the principles of the embodiments of the present invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the embodiments of the present invention.
Industrial Applicability
The above solution enables the shooting device to capture a target sound source outside the original screen range.

Claims

Claims
1. A control method, comprising:
acquiring audio data containing sound information of a target sound source;
determining position-range information of the target sound source according to the audio data;
controlling, according to the position-range information, the rotation of a shooting device that currently cannot capture the target sound source, so that the shooting device can capture the target sound source.
2. The control method according to claim 1, wherein the position-range information is direction information of the target sound source relative to the shooting device,
and controlling, according to the position-range information, the rotation of the shooting device that currently cannot capture the target sound source is:
determining a rotation control parameter of the shooting device corresponding to the direction information;
controlling the rotation of the shooting device according to the rotation control parameter.
3. The control method according to claim 2, wherein the audio data is collected by a sound collection device, and determining the position-range information of the target sound source according to the audio data is: determining azimuth information of the target sound source relative to the sound collection device according to the audio data;
determining the direction information according to the azimuth information.
4. The control method according to claim 3, wherein determining the direction information according to the azimuth information is:
determining the direction information according to the azimuth information and a preset correspondence between the azimuth information and the direction information.
5. The control method according to claim 4, wherein the sound collection device determines the azimuth information with respect to a preset plane and a preset reference point on the preset plane, the shooting device corresponds to a first corresponding point on the preset plane, and the target sound source corresponds to a second corresponding point on the preset plane,
the azimuth information is the position coordinates of the second corresponding point relative to the preset reference point, the direction information is a direction-information characterizing coordinate of the second corresponding point relative to the first corresponding point, and the correspondence is a plane geometry function taking the position coordinates, relative to the preset reference point, of a sounding corresponding point on the preset plane as its independent variable, the position coordinates of the first corresponding point relative to the preset reference point as a parameter, and the direction-information characterizing coordinate of the sounding corresponding point relative to the first corresponding point as its dependent variable.
6. The control method according to claim 5, wherein the position coordinates of the first corresponding point relative to the preset reference point are the coordinates (a1, a2) in rectangular coordinates on the preset plane whose first origin is the preset reference point, the position coordinates of the sounding corresponding point relative to the preset reference point are the coordinates (x, y) in those rectangular coordinates, with y greater than a2, and the direction-information characterizing coordinate is the angular coordinate b in polar coordinates on the preset plane whose second origin is the first corresponding point,
wherein, when a2 is 0, the polar axis of the polar coordinates has the same direction as the x axis of the rectangular coordinates; when a2 is not 0, the polar axis is parallel to and points the same way as the x axis; and the plane geometry function is b = arctan((y - a2)/(x - a1)), where x is not equal to a1; or the plane geometry function is: when x is not equal to a1, b = arctan((y - a2)/(x - a1)); when x equals a1, b = 90 degrees.
7. The control method according to claim 5, wherein the parameter is determined from the position coordinates, relative to the preset reference point, and the direction-information characterizing coordinates, relative to the first corresponding point, of the training points on the preset plane corresponding to at least one training sound source obtained through a learning-and-training procedure.
8. A control apparatus, comprising:
an acquisition module, configured to acquire audio data containing sound information of a target sound source;
a determination module, configured to determine position-range information of the target sound source according to the audio data;
a control module, configured to control, according to the position-range information, the rotation of a shooting device that currently cannot capture the target sound source, so that the shooting device can capture the target sound source.
9. The control apparatus according to claim 8, wherein the position-range information is direction information of the target sound source relative to the shooting device, and the control module comprises:
a first determination unit, configured to determine a rotation control parameter of the shooting device corresponding to the direction information;
a control unit, configured to control the rotation of the shooting device according to the rotation control parameter, so that the shooting device can capture the target sound source.
10. The control apparatus according to claim 9, wherein the audio data is collected by a sound collection device, and the determination module comprises:
a second determination unit, configured to determine azimuth information of the target sound source relative to the sound collection device according to the audio data;
a third determination unit, configured to determine the direction information according to the azimuth information.
11. The control apparatus according to claim 10, wherein the third determination unit comprises: a determination subunit, configured to determine the direction information according to the azimuth information and a preset correspondence between the azimuth information and the direction information.
12. A control device, comprising the control apparatus according to any one of claims 8 to 11.
PCT/CN2013/084558 2013-09-29 2013-09-29 一种控制方法、控制装置及控制设备 WO2015042897A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US14/426,480 US9591229B2 (en) 2013-09-29 2013-09-29 Image tracking control method, control device, and control equipment
EP13892080.6A EP2882180A4 (en) 2013-09-29 2013-09-29 CONTROL METHOD, CONTROL APPARATUS AND CONTROL DEVICE
PCT/CN2013/084558 WO2015042897A1 (zh) 2013-09-29 2013-09-29 一种控制方法、控制装置及控制设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/084558 WO2015042897A1 (zh) 2013-09-29 2013-09-29 一种控制方法、控制装置及控制设备

Publications (1)

Publication Number Publication Date
WO2015042897A1 true WO2015042897A1 (zh) 2015-04-02

Family

ID=52741839

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/084558 WO2015042897A1 (zh) 2013-09-29 2013-09-29 一种控制方法、控制装置及控制设备

Country Status (3)

Country Link
US (1) US9591229B2 (zh)
EP (1) EP2882180A4 (zh)
WO (1) WO2015042897A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105049709A (zh) * 2015-06-30 2015-11-11 广东欧珀移动通信有限公司 一种大视角摄像头控制方法及用户终端
WO2020087748A1 (zh) * 2018-10-29 2020-05-07 歌尔股份有限公司 一种音频设备定向显示方法、装置和音频设备
CN111271567A (zh) * 2020-03-18 2020-06-12 海信视像科技股份有限公司 电视挂架的控制方法及装置
CN111896119A (zh) * 2020-08-14 2020-11-06 北京声智科技有限公司 红外测温方法及电子设备

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017028608A (ja) * 2015-07-27 2017-02-02 株式会社リコー ビデオ会議端末機
WO2019089108A1 (en) * 2017-11-06 2019-05-09 Google Llc Methods and systems for attending to a presenting user
CN108737719A (zh) * 2018-04-04 2018-11-02 深圳市冠旭电子股份有限公司 摄像头拍摄控制方法、装置、智能设备及存储介质
CN110830708A (zh) * 2018-08-13 2020-02-21 深圳市冠旭电子股份有限公司 一种追踪摄像方法、装置及终端设备
CN111049973B (zh) * 2019-11-22 2021-06-01 维沃移动通信有限公司 一种屏幕显示的控制方法、电子设备及计算机可读存储介质
CN111683204A (zh) * 2020-06-18 2020-09-18 南方电网数字电网研究院有限公司 无人机拍摄方法、装置、计算机设备和存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101567969A (zh) * 2009-05-21 2009-10-28 上海交通大学 基于麦克风阵列声音制导的智能视频导播方法
CN102300043A (zh) * 2010-06-23 2011-12-28 中兴通讯股份有限公司 调整远程呈现会议系统的会场摄像头的方法及会议终端
CN103685906A (zh) * 2012-09-20 2014-03-26 中兴通讯股份有限公司 一种控制方法、控制装置及控制设备

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5778082A (en) * 1996-06-14 1998-07-07 Picturetel Corporation Method and apparatus for localization of an acoustic source
US6618073B1 (en) * 1998-11-06 2003-09-09 Vtel Corporation Apparatus and method for avoiding invalid camera positioning in a video conference
JP2004505560A (ja) * 2000-08-01 2004-02-19 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ 音源への装置の照準化
US7450149B2 (en) * 2002-03-25 2008-11-11 Polycom, Inc. Conferencing system with integrated audio driver and network interface device
NO318096B1 (no) * 2003-05-08 2005-01-31 Tandberg Telecom As Arrangement og fremgangsmate for lokalisering av lydkilde
NO328582B1 (no) * 2006-12-29 2010-03-22 Tandberg Telecom As Mikrofon for lydkildesporing
US20100118112A1 (en) * 2008-11-13 2010-05-13 Polycom, Inc. Group table top videoconferencing device
US8395653B2 (en) 2010-05-18 2013-03-12 Polycom, Inc. Videoconferencing endpoint having multiple voice-tracking cameras
US8842161B2 (en) * 2010-05-18 2014-09-23 Polycom, Inc. Videoconferencing system having adjunct camera for auto-framing and tracking
US10939201B2 (en) * 2013-02-22 2021-03-02 Texas Instruments Incorporated Robust estimation of sound source localization

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101567969A (zh) * 2009-05-21 2009-10-28 上海交通大学 基于麦克风阵列声音制导的智能视频导播方法
CN102300043A (zh) * 2010-06-23 2011-12-28 中兴通讯股份有限公司 调整远程呈现会议系统的会场摄像头的方法及会议终端
CN103685906A (zh) * 2012-09-20 2014-03-26 中兴通讯股份有限公司 一种控制方法、控制装置及控制设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2882180A4 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105049709A (zh) * 2015-06-30 2015-11-11 广东欧珀移动通信有限公司 一种大视角摄像头控制方法及用户终端
WO2020087748A1 (zh) * 2018-10-29 2020-05-07 歌尔股份有限公司 一种音频设备定向显示方法、装置和音频设备
US11670200B2 (en) 2018-10-29 2023-06-06 Goertek Inc. Orientated display method and apparatus for audio device, and audio device
CN111271567A (zh) * 2020-03-18 2020-06-12 海信视像科技股份有限公司 电视挂架的控制方法及装置
CN111271567B (zh) * 2020-03-18 2022-04-01 海信视像科技股份有限公司 电视挂架的控制方法及装置
CN111896119A (zh) * 2020-08-14 2020-11-06 北京声智科技有限公司 红外测温方法及电子设备

Also Published As

Publication number Publication date
US9591229B2 (en) 2017-03-07
US20160286133A1 (en) 2016-09-29
EP2882180A4 (en) 2015-10-14
EP2882180A1 (en) 2015-06-10

Similar Documents

Publication Publication Date Title
WO2015042897A1 (zh) 一种控制方法、控制装置及控制设备
CN103685906B (zh) 一种控制方法、控制装置及控制设备
US9955074B2 (en) Target tracking method and system for intelligent tracking high speed dome camera
US10498952B2 (en) Shooting method and shooting system capable of realizing dynamic capturing of human faces based on mobile terminal
CN111432115B (zh) 基于声音辅助定位的人脸追踪方法、终端及存储装置
WO2019128109A1 (zh) 一种基于人脸追踪的动向投影方法、装置及电子设备
JP2019186929A (ja) カメラ撮影制御方法、装置、インテリジェント装置および記憶媒体
US9690262B2 (en) Display device and method for regulating viewing angle of display device
CN105718862A (zh) 一种单摄像头教师自动跟踪方法、装置及录播系统
CN105611167B (zh) 一种对焦平面调整方法及电子设备
EP2993894B1 (en) Image capturing method and electronic apparatus
CN109982054B (zh) 一种基于定位追踪的投影方法、装置、投影仪及投影系统
CN106250839B (zh) 一种虹膜图像透视校正方法、装置和移动终端
WO2018112898A1 (zh) 一种投影方法、装置及机器人
CN101697105B (zh) 一种摄像式触摸检测定位方法及摄像式触摸检测系统
CN109492506A (zh) 图像处理方法、装置和系统
CN111683204A (zh) 无人机拍摄方法、装置、计算机设备和存储介质
CN111062234A (zh) 一种监控方法、智能终端及计算机可读存储介质
CN109410593A (zh) 一种鸣笛抓拍系统及方法
KR101111503B1 (ko) 전방향 피티지 카메라 제어 장치 및 그 방법
CN112839165B (zh) 人脸跟踪摄像的实现方法、装置、计算机设备和存储介质
JP2011217202A (ja) 画像取得装置
WO2018121730A1 (zh) 视频监控和人脸识别方法、装置及系统
KR20100121086A (ko) 음원인식을 이용한 촬영영상 추적 ptz 카메라 운용시스템 및 그 방법
CN112702513B (zh) 一种双光云台协同控制方法、装置、设备及存储介质

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2013892080

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 14426480

Country of ref document: US

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13892080

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE