WO2018095166A1 - Device control method, apparatus and system - Google Patents

Device control method, apparatus and system

Info

Publication number
WO2018095166A1
Authority
WO
WIPO (PCT)
Prior art keywords
microphone
position point
coordinate information
sound source
preset position
Application number
PCT/CN2017/106800
Other languages
English (en)
French (fr)
Inventor
陈扬坤 (Chen Yangkun)
何赛娟 (He Saijuan)
陈展 (Chen Zhan)
Original Assignee
杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Application filed by 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Priority to US16/462,453 (patent US10816633B2)
Priority to EP17874320.9A (patent EP3546976B1)
Publication of WO2018095166A1


Classifications

    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • G01S3/7864 T.V. type tracking systems
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/22 Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements
    • H04N21/42204 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • H04N23/60 Control of cameras or camera modules
    • H04N23/62 Control of parameters via user interfaces
    • H04N23/675 Focus control based on electronic image sensor signals comprising setting of focusing regions
    • H04N23/695 Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
    • H04N7/15 Conference systems
    • H04R1/406 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
    • H04N21/42206 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor characterized by hardware details

Definitions

  • The present application relates to the field of automatic control technologies, and in particular to a device control method, apparatus and system.
  • A video capture device may be installed in a conference room or a classroom.
  • A pan-tilt camera capable of 360-degree rotation and up-and-down adjustment may serve as the video capture device, which locates the sound source position, that is, the position of the person who is speaking, and is then turned toward the person who is speaking.
  • A known device control method mounts a microphone on the video capture device and processes the sound received by the microphone to locate the sound source position, that is, the position of the person who is speaking, and then controls the video capture device to aim at the sound source position.
  • Because the microphone is mounted on the video capture device, it can usually locate only sound sources close to the video capture device and cannot accurately locate sound sources far away from it. The sound source localization accuracy of this method is therefore low, which results in low device control accuracy.
  • The purpose of the embodiments of the present application is to provide a device control method, apparatus, and system that improve the accuracy of device control.
  • The specific technical solutions are as follows:
  • An embodiment of the present application provides a device control method, which is applied to a video capture device in a sound source localization system.
  • The sound source localization system further includes microphones disposed outside the video capture device, and the method includes:
  • identifying the position point with the largest voice power value, determining that position point as the sound source position, and controlling the video capture device itself to aim at the sound source position.
  • The step of calculating the voice power value corresponding to each position point according to the voice signals collected by the microphones, the coordinate information of each microphone, and the coordinate information of each preset position point includes:
  • for each preset position point, calculating the delay difference from the preset position point to each two adjacent microphones according to the coordinate information of the preset position point and the coordinate information of each microphone;
  • calculating the voice power value corresponding to the position point according to the generalized cross-correlation from the preset position point to each two adjacent microphones.
  • The step of calculating, for each preset position point, the delay difference from the preset position point to each two adjacent microphones according to the coordinate information of the preset position point and the coordinate information of each microphone includes:
  • D_mk is the distance from the preset position point m to microphone k;
  • D_ml is the distance from the preset position point m to microphone l;
  • c is the speed of sound.
  • The step of calculating the generalized cross-correlation from the preset position point to each two adjacent microphones according to the Fourier transform of each voice signal and the delay difference from the preset position point to each two adjacent microphones includes:
  • M_k(w) is the Fourier transform of the speech signal received by microphone k; M_l*(w) is the conjugate of the Fourier transform of the speech signal received by microphone l; w is the speech signal frequency; and ψ_kl(w) is determined by the following formula:
  • The step of calculating the voice power value corresponding to the position point according to the generalized cross-correlation from the preset position point to each two adjacent microphones includes:
  • M is the total number of microphones.
  • The method further includes:
  • identifying the target focal length corresponding to the target distance according to a pre-stored correspondence between distances and focal lengths, and adjusting its own focal length to the target focal length.
  • The step of determining the height of the human body at the sound source position includes:
  • acquiring an image of the human body at the sound source position, and analyzing the image to obtain the height of the human body at the sound source position.
  • An embodiment of the present application provides a device control apparatus, which is applied to a video capture device in a sound source localization system, where the sound source localization system further includes microphones disposed outside the video capture device. The apparatus includes:
  • an acquiring module configured to acquire the voice signal collected by each microphone, and to acquire the coordinate information of each microphone and of each preset position point;
  • a first calculation module configured to calculate the voice power value corresponding to each position point according to the voice signal collected by each microphone, the coordinate information of each microphone, and the coordinate information of each preset position point;
  • a control module configured to identify the position point with the largest voice power value, determine that position point as the sound source position, and control the video capture device itself to aim at the sound source position.
  • The first calculation module includes:
  • a first calculation submodule configured to calculate the Fourier transform of each voice signal collected by each microphone;
  • a second calculation submodule configured to calculate, for each preset position point, the delay difference from the preset position point to each two adjacent microphones according to the coordinate information of the preset position point and the coordinate information of each microphone;
  • a third calculation submodule configured to calculate the generalized cross-correlation from the preset position point to each two adjacent microphones according to the Fourier transform of each voice signal and the delay difference from the preset position point to each two adjacent microphones;
  • a fourth calculation submodule configured to calculate the voice power value corresponding to the position point according to the generalized cross-correlation from the preset position point to each two adjacent microphones.
  • The second calculation submodule is specifically configured to calculate the delay difference τ_mkl from any preset position point m to any two adjacent microphones k and l according to the following formula:
  • D_mk is the distance from the preset position point m to microphone k;
  • D_ml is the distance from the preset position point m to microphone l;
  • c is the speed of sound.
  • The third calculation submodule is specifically configured to calculate the generalized cross-correlation R(τ_mkl) from any preset position point m to two adjacent microphones k and l according to the following formula:
  • M_k(w) is the Fourier transform of the speech signal received by microphone k; M_l*(w) is the conjugate of the Fourier transform of the speech signal received by microphone l; w is the speech signal frequency; and ψ_kl(w) is determined by the following formula:
  • The fourth calculation submodule is specifically configured to calculate the voice power value P(m) corresponding to any preset position point m according to the following formula:
  • M is the total number of microphones.
  • The apparatus further includes:
  • a second calculation module configured to calculate the target distance from the video capture device to the human body according to the coordinate information of the video capture device, the height of the video capture device, the coordinate information of the sound source position, and the height of the human body at the sound source position;
  • an adjustment module configured to identify the target focal length corresponding to the target distance according to a pre-stored correspondence between distances and focal lengths, and to adjust the focal length of the video capture device to the target focal length.
  • The determining module is specifically configured to:
  • acquire an image of the human body at the sound source position, and analyze the image to obtain the height of the human body at the sound source position.
  • An embodiment of the present application provides a device control system, which includes a video capture device and microphones disposed outside the video capture device;
  • the video capture device is configured to acquire the voice signal collected by each microphone, and to acquire the coordinate information of each microphone and of each preset position point; to calculate the voice power value corresponding to each position point according to the voice signal collected by each microphone, the coordinate information of each microphone, and the coordinate information of each preset position point; and to identify the position point with the largest voice power value, determine that position point as the sound source position, and control itself to aim at the sound source position;
  • each of the microphones is configured to collect a voice signal and send the collected voice signal to the video capture device.
  • An embodiment of the present application further provides a video capture device, including:
  • a processor, a memory, a communication interface, and a bus;
  • the processor, the memory, and the communication interface are connected through the bus and communicate with one another;
  • the memory stores executable program code;
  • the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform, at runtime, the device control method according to the first aspect of the present application.
  • The present application provides a storage medium, wherein the storage medium is configured to store executable program code, and the executable program code is used to perform, at runtime, the device control method according to the first aspect of the present application.
  • The present application provides an application program, wherein the application program is configured to perform, at runtime, the device control method according to the first aspect of the present application.
  • With the embodiments of the present application, the video capture device may acquire the voice signal collected by each microphone, the coordinate information of each microphone, and the coordinate information of each preset position point; then calculate the voice power value corresponding to each position point from this information; and finally identify the position point with the largest voice power value, determine it as the sound source position, and control itself to aim at the sound source position.
  • Because the microphones are disposed outside the video capture device, they can collect sound throughout the capture scene. This improves the accuracy of sound source localization and therefore the accuracy of device control.
  • FIG. 1 is a flowchart of a device control method according to an embodiment of the present application.
  • FIG. 2(a) is a schematic diagram of the distribution of microphones in a classroom according to an embodiment of the present application.
  • FIG. 2(b) is a schematic diagram of the distribution of microphones in another classroom according to an embodiment of the present application.
  • FIG. 3 is another flowchart of a device control method according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a three-dimensional coordinate system according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a device control apparatus according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a device control system according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a video capture device according to an embodiment of the present application.
  • The embodiments of the present application provide a device control method, apparatus, and system.
  • An embodiment of the present application provides a device control method. As shown in FIG. 1, the process may include the following steps:
  • To improve the accuracy of sound source localization, a video capture device may be installed in a capture scene where sound source localization is required, such as a classroom or a conference room, and microphones may be installed outside the video capture device.
  • The video capture device may be a dome camera, a camera, or the like, which is not limited in this embodiment of the present application.
  • A plurality of microphones may be installed. To receive voice signals from every area of the capture scene, such as a classroom or a conference room, the microphones may be installed in the various areas of the scene.
  • FIG. 2(a) is a schematic diagram of the microphone distribution in a classroom according to an embodiment of the present application, shown as a top view of the classroom.
  • The video capture device 210 can be installed near the podium, and a plurality of microphones can be mounted on the walls around the classroom.
  • The microphones on each wall can be arranged at equal intervals, or at different intervals according to the distribution of the students 230 in the classroom.
  • FIG. 2(b) is a schematic diagram of the microphone distribution in another classroom according to an embodiment of the present application, shown as a top view of the classroom.
  • The video capture device 210 can be installed near the podium, and a plurality of microphones can be mounted on the classroom ceiling.
  • The ceiling microphones can be arranged at equal intervals, or at different intervals according to the distribution of the students 230 in the classroom.
  • The identification information of each microphone and of each preset position point may also be determined.
  • For example, the identification information of the microphones may be a, b, l, k, and so on;
  • the identification information of the preset position points may be 1, 2, 3, and so on. A plane coordinate system may also be constructed in the capture scene, and the coordinate information of each microphone and each preset position point is then determined according to that coordinate system.
  • Each preset position point is a position at which the person speaking may be located.
  • For example, the line along which one wall meets the floor can be taken as the X axis, and the line along which an adjacent wall meets the floor can be taken as the Y axis; the point where the X axis intersects the Y axis is the origin O.
  • After the X axis and the Y axis are determined, the coordinate information of each microphone can be determined.
  • For example, the coordinate information of microphone n can be (Xn, Yn).
  • The coordinate information of each microphone can be stored in the video capture device.
  • For example, the coordinate information of each microphone stored in the video capture device can be as shown in Table 1:
  • Table 1
    Microphone identification information | Coordinate information
    n | (1.0, 0.2)
    k | (3.0, 0.2)
    l | (5.0, 0.2)
    t | (7.0, 0.2)
  • The person speaking may be a teacher or a student; the teacher is usually at the podium, and a student is usually in his or her seat. That is, the positions at which the speaker may appear are the podium and each student's seat. Therefore, after the X axis and the Y axis are determined, the coordinate information of the podium and of each student seat can also be determined.
  • For example, the coordinate information of the i-th preset position point can be determined as (Xi, Yi).
  • The coordinate information of the podium and of each student seat may be stored in the video capture device as the coordinate information of the preset position points.
  • For example, the coordinate information of each preset position point stored in the video capture device can be as shown in Table 2:
  • The voice signal collected by each microphone can be transmitted to the video capture device through an analog circuit, so in the embodiments of the present application the video capture device can acquire the voice signal collected by each microphone. In addition, to perform sound source localization, the video capture device may also acquire the coordinate information of each microphone and of each preset position point, for example from a predetermined local storage space.
  • After obtaining the voice signal collected by each microphone, the coordinate information of each microphone, and the coordinate information of each preset position point, the video capture device can calculate the voice power value corresponding to each position point to determine the position point at which the sound source is located.
  • The voice power value corresponding to a position point characterizes how loud the sound at that position is. The position point with the largest voice power value, that is, the loudest position point, is the sound source position point. Specifically, the video capture device may calculate, for each preset position point, the voice power value corresponding to that position point.
  • The video capture device can calculate the Fourier transform of each voice signal collected by each microphone using any existing method, which is not described in detail in this embodiment of the present application.
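  • As a minimal, dependency-free sketch of this step (a real implementation would normally use an FFT library), the spectrum M_k(w) of one frame of a microphone signal can be obtained with a naive discrete Fourier transform. The function name `dft` and the sample frames are illustrative assumptions, not part of this application.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame of samples.

    Bin f corresponds to the frequency f * fs / N, where fs is the sampling rate.
    """
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


# Hypothetical 4-sample frames from microphones k and l
frames = {"k": [0.0, 1.0, 0.0, -1.0], "l": [1.0, 0.0, 0.0, 0.0]}
spectra = {mic: dft(frame) for mic, frame in frames.items()}
```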
  • The video capture device may first calculate, for each preset position point, the delay difference from the preset position point to each two adjacent microphones according to the coordinate information of the preset position point and the coordinate information of each microphone. Specifically, the delay difference τ_mkl from any preset position point m to any two adjacent microphones k and l can be calculated according to the following formula:
  • D_mk is the distance from the preset position point m to microphone k;
  • D_ml is the distance from the preset position point m to microphone l;
  • c is the speed of sound.
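  • Assuming the delay difference takes the usual time-difference-of-arrival form τ_mkl = (D_mk − D_ml) / c (the sign convention is an assumption), it can be computed directly from the stored two-dimensional coordinates. In the sketch below the speed of sound (343 m/s, air at about 20 °C), the function name and the speaker position are assumptions; the microphone coordinates are those of microphones k and l in Table 1.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 °C (assumed value)


def delay_difference(point, mic_k, mic_l, c=SPEED_OF_SOUND):
    """tau_mkl = (D_mk - D_ml) / c for a preset point m and microphones k, l."""
    d_mk = math.dist(point, mic_k)  # distance from point m to microphone k
    d_ml = math.dist(point, mic_l)  # distance from point m to microphone l
    return (d_mk - d_ml) / c


# Microphones k and l from Table 1; the seat position (3.0, 5.0) is hypothetical.
tau = delay_difference((3.0, 5.0), (3.0, 0.2), (5.0, 0.2))
```

A negative τ_mkl means the sound reaches microphone k before microphone l.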
  • The video capture device can then calculate the generalized cross-correlation from the preset position point to each two adjacent microphones according to the Fourier transform of each voice signal and the delay difference from the preset position point to each two adjacent microphones.
  • Specifically, the video capture device may calculate the generalized cross-correlation R(τ_mkl) from the preset position point m to the adjacent microphones k and l according to the following formula:
  • M_k(w) is the Fourier transform of the speech signal received by microphone k; M_l*(w) is the conjugate of the Fourier transform of the speech signal received by microphone l;
  • w is the speech signal frequency;
  • ψ_kl(w) is determined by the following formula:
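  • Assuming ψ_kl(w) is the phase-transform (PHAT) weighting ψ_kl(w) = 1 / |M_k(w) M_l*(w)|, a common choice for generalized cross-correlation, R(τ_mkl) can be written as ∫ ψ_kl(w) M_k(w) M_l*(w) e^(jwτ_mkl) dw. The sketch below evaluates a discrete version of that sum over DFT bins for one microphone pair. The weighting, the sign of the exponent (chosen to match τ_mkl = (D_mk − D_ml) / c), the function names, the sampling rate and the test frames are all assumptions.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) DFT; bin f corresponds to f * fs / N Hz."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def gcc_phat(frame_k, frame_l, tau, fs):
    """Discrete steered GCC-PHAT value R(tau) for microphones k and l.

    Sums psi(w) * M_k(w) * conj(M_l(w)) * exp(j*w*tau) over the DFT bins,
    with the PHAT weight psi(w) = 1 / |M_k(w) * conj(M_l(w))|.
    """
    n = len(frame_k)
    spec_k, spec_l = dft(frame_k), dft(frame_l)
    total = 0.0
    for f in range(n):
        cross = spec_k[f] * spec_l[f].conjugate()
        magnitude = abs(cross)
        if magnitude < 1e-12:  # skip empty bins instead of dividing by zero
            continue
        signed_bin = f if f <= n // 2 else f - n  # upper half = negative freqs
        w = 2 * math.pi * signed_bin * fs / n  # angular frequency in rad/s
        total += (cross / magnitude * cmath.exp(1j * w * tau)).real
    return total / n


fs = 8000  # assumed sampling rate, Hz
mic_k = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # impulse reaches k first
mic_l = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # and l one sample later
r_peak = gcc_phat(mic_k, mic_l, -1.0 / fs, fs)
```

Because the impulse reaches microphone l one sample after microphone k, R peaks at τ = −1/fs, which is the negative delay difference expected when point m is closer to microphone k.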
  • The video capture device may then calculate the voice power value corresponding to the position point according to the generalized cross-correlation from the preset position point to each two adjacent microphones. Specifically, it may calculate the voice power value P(m) corresponding to the preset position point m according to the following formula:
  • M is the total number of microphones.
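  • Reading the text literally, P(m) can be sketched as the sum of R(τ_mkl) over the M − 1 pairs of adjacent microphones (k, k+1); whether the original formula sums over adjacent pairs only or over all pairs is an assumption here. The cross-correlation is passed in as a hypothetical callback `gcc(id_k, id_l, tau)`, so the loop does not depend on how R is computed, and τ_mkl = (D_mk − D_ml) / c with an assumed c of 343 m/s.

```python
import math


def voice_power(point, mics, gcc, c=343.0):
    """P(m): sum of R(tau_mkl) over adjacent microphone pairs (k, k+1).

    mics is an ordered list of (identifier, (x, y)) in which neighbours in the
    list are physically adjacent; gcc(id_k, id_l, tau) returns R(tau).
    """
    total = 0.0
    for (id_k, pos_k), (id_l, pos_l) in zip(mics, mics[1:]):
        tau = (math.dist(point, pos_k) - math.dist(point, pos_l)) / c
        total += gcc(id_k, id_l, tau)
    return total


# Table 1 microphones, ordered along the wall
MICS = [("n", (1.0, 0.2)), ("k", (3.0, 0.2)), ("l", (5.0, 0.2)), ("t", (7.0, 0.2))]
```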
  • After calculating the voice power value corresponding to each preset position point, the video capture device can identify the position point with the largest voice power value and determine that position point as the sound source position.
  • The sound source position is the position of the person who is speaking. The sound at this position should be louder than at any other position, so its voice power value should also be the largest. Therefore, in the embodiments of the present application, the position point with the largest voice power value can be determined as the sound source position.
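  • The decision rule is an arg-max over the preset position points. In this minimal sketch `power_of` is a hypothetical callable returning P(m) for a point, and the coordinates and power values are made up for illustration.

```python
def locate_source(points, power_of):
    """Return (point, power) for the preset position point with the largest P(m)."""
    best_point, best_power = None, float("-inf")
    for point in points:
        power = power_of(point)
        if power > best_power:  # strict '>' keeps the first point on ties
            best_point, best_power = point, power
    return best_point, best_power


# Hypothetical podium and seat coordinates with made-up power values
powers = {(4.0, 1.0): 0.8, (2.0, 4.0): 2.5, (6.0, 4.0): 1.1}
source, p_max = locate_source(powers, powers.get)
```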
  • The video capture device can then control itself to aim at the sound source position in order to record video of the person who is speaking. For example, the video capture device can control its lens orientation so that the lens is aimed at the sound source position.
  • The video capture device may control its lens orientation toward the sound source position using any existing method, which is not described in this embodiment of the present application.
  • In the device control method provided by the embodiments of the present application, the video capture device can acquire the voice signal collected by each microphone, the coordinate information of each microphone, and the coordinate information of each preset position point; then calculate the voice power value corresponding to each position point from this information; and finally identify the position point with the largest voice power value, determine it as the sound source position, and control itself to aim at the sound source position.
  • Because the microphones are disposed outside the video capture device, they can collect sound throughout the capture scene. This improves the accuracy of sound source localization and therefore the accuracy of device control.
  • The video capture device usually needs to adjust its focal length for target objects at different distances to achieve a better recording result. For example, when recording a nearby target object, a smaller focal length can be used so that the entire target object is included in the recorded image; when recording a distant target object, a larger focal length can be used to magnify it and keep it sharp in the recorded image.
  • Accordingly, the video capture device can adjust its focal length for different sound source positions to achieve a better recording result.
  • The device control method provided by the embodiments of the present application may further include:
  • The sound source position determined by the video capture device gives only the two-dimensional coordinates of the sound source. To determine the exact distance between the video capture device and the human body at the sound source position, the video capture device can determine the height of the human body at the sound source position after aiming itself at the sound source position.
  • For example, the video capture device may pre-store an average human body height, such as 1.68 m, 1.70 m, or 1.72 m.
  • The pre-stored average human body height is then obtained and used as the height of the human body at the sound source position.
  • Alternatively, the video capture device may capture an image of the human body at the sound source position and analyze the image to obtain the height of the human body at the sound source position.
  • The video capture device may capture and analyze the image using any existing method, which is not described in this embodiment of the present application.
  • S302: Calculate the target distance from the video capture device to the human body according to its own coordinate information, its own height, the coordinate information of the sound source position, and the height of the human body at the sound source position.
  • After determining the height of the human body at the sound source position, the video capture device can further calculate the target distance from itself to the human body according to its own coordinate information, its own height, the coordinate information of the sound source position, and the height of the human body at the sound source position.
  • Specifically, the video capture device can determine its own three-dimensional coordinates from its own coordinate information and height, determine the three-dimensional coordinates of the human body from the coordinate information of the sound source position and the height of the human body there, and then determine the distance from itself to the human body from these two sets of three-dimensional coordinates.
  • the target distance l from the video capture device to the human head is:
  • The video capture device may store the correspondence between distances and focal lengths in advance.
  • For example, the user can determine this correspondence from empirical values and save it in the video capture device.
  • The correspondence saved in the video capture device can be as shown in Table 3.
  • After determining the target distance from itself to the person, the video capture device looks up the target focal length for that distance in the pre-stored correspondence. It then sets its own focal length to the target focal length.
  • For example, the device may find the stored distance equal to the target distance and take the associated focal length as the target focal length.
  • Any existing method may be used to adjust the focal length; this embodiment does not describe the process further.
  • The calculated target distance may be a decimal, in which case none of the stored distances equals the target distance.
  • In that case, the device can look up the stored distance corresponding to the integer part of the target distance and take its focal length as the target focal length.
  • The video capture device can thus adjust its focal length to its actual distance from the person at the sound source position, which improves the recording effect.
  • The embodiment of the present application also provides a corresponding apparatus embodiment.
  • The apparatus includes:
  • an obtaining module 510, configured to acquire the voice signal collected by each microphone, together with the coordinate information of each microphone and of each preset position point;
  • a first calculation module 520, configured to calculate the voice power value of each position point from three inputs: the voice signals collected by the microphones, the coordinate information of the microphones, and the coordinate information of the preset position points;
  • a control module 530, configured to identify the position point with the largest voice power value, determine it as the sound source position, and aim the device at the sound source position.
  • An embodiment of the present application provides a device control apparatus. With this apparatus, a video capture device acquires the voice signal collected by each microphone, the coordinate information of each microphone, and the coordinate information of each preset position point. It calculates the voice power value of each position point from these data, identifies the position point with the largest voice power value as the sound source position, and aims itself at that position.
  • Because the microphones are placed outside the video capture device, sound from anywhere in the capture scene can be picked up by them. This improves the accuracy of sound source localization and therefore the accuracy of device control.
  • The first calculation module 520 includes:
  • a first calculation sub-module (not shown), configured to calculate the Fourier transform of each voice signal collected by the microphones;
  • a second calculation sub-module (not shown), configured to calculate, for each preset position point, the delay difference from that point to each pair of adjacent microphones, based on the coordinate information of the point and of the microphones;
  • a third calculation sub-module (not shown), configured to calculate the generalized cross-correlation from the preset position point to each pair of adjacent microphones, based on the Fourier transforms of the voice signals and the corresponding delay differences;
  • a fourth calculation sub-module (not shown), configured to calculate the voice power value of the position point from those generalized cross-correlations.
  • The second calculation sub-module is specifically configured to calculate the delay difference τmkl from any preset position point m to any two adjacent microphones k and l according to a formula in which:
  • Dmk is the distance from preset position point m to microphone k;
  • Dml is the distance from preset position point m to microphone l;
  • c is the speed of sound.
  • The third calculation sub-module is specifically configured to calculate the generalized cross-correlation R(τmkl) from any preset position point m to any two adjacent microphones k and l according to a formula in which:
  • Mk(w) is the Fourier transform of the voice signal received by microphone k; Ml*(w) is the conjugate of the Fourier transform of the voice signal received by microphone l; w is the frequency of the voice signal; and φkl(w) is determined by a further formula.
  • The fourth calculation sub-module is specifically configured to calculate the voice power value P(m) of any preset position point m according to a formula in which:
  • M is the total number of microphones.
  • The apparatus further includes:
  • a determining module (not shown), configured to determine the height of the person at the sound source position;
  • a second calculation module (not shown), configured to calculate the target distance from the device to the person. It uses the coordinate information and height of the video capture device, the coordinate information of the sound source position, and the height of the person at the sound source position;
  • an adjustment module (not shown), configured to identify the target focal length for the target distance from the pre-stored correspondence between distances and focal lengths, and to set the device's focal length to the target focal length.
  • The determining module is specifically configured to:
  • retrieve the pre-stored average human height and use it as the height of the person at the sound source position; or
  • acquire an image containing the person at the sound source position and analyze the image to obtain that person's height.
  • The embodiment of the present application further provides a device control system comprising a video capture device 610 and microphones 620 disposed outside the video capture device 610.
  • The video capture device 610 is configured to perform the following:
      - acquire the voice signal collected by each microphone 620, together with the coordinate information of each microphone 620 and of each preset position point;
      - calculate the voice power value of each position point from the voice signals, the microphone coordinates and the preset-point coordinates;
      - identify the position point with the largest voice power value, determine it as the sound source position, and aim itself at the sound source position.
  • Each microphone 620 is configured to collect voice signals and send them to the video capture device 610.
  • An embodiment of the present application provides a device control system. In this system, the video capture device acquires the voice signal collected by each microphone, the coordinate information of each microphone, and the coordinate information of each preset position point. It calculates the voice power value of each position point from these data, identifies the position point with the largest voice power value as the sound source position, and aims itself at that position.
  • Because the microphones are placed outside the video capture device, sound from anywhere in the capture scene can be picked up by them. This improves the accuracy of sound source localization and therefore the accuracy of device control.
  • The embodiment of the present application further provides a video capture device, which may include:
  • a processor 710, a memory 720, a communication interface 730 and a bus 740;
  • the processor 710, the memory 720 and the communication interface 730 are connected via the bus 740 and communicate with one another;
  • the memory 720 stores executable program code;
  • the processor 710 reads the executable program code stored in the memory 720 and runs the corresponding program. At runtime, this program performs the device control method according to the embodiment of the present application,
  • wherein the method comprises:
  • identifying the position point with the largest voice power value, determining it as the sound source position, and aiming the device at the sound source position.
  • The video capture device acquires the voice signal collected by each microphone, the coordinate information of each microphone, and the coordinate information of each preset position point. It calculates the voice power value of each position point from these data, identifies the position point with the largest voice power value as the sound source position, and aims itself at that position.
  • Because the microphones are placed outside the video capture device, sound from anywhere in the capture scene can be picked up by them. This improves the accuracy of sound source localization and therefore the accuracy of device control.
  • The embodiment of the present application further provides a storage medium for storing executable program code. At runtime, the executable program code performs the device control method according to the embodiment of the present application.
  • The device control method is applied to a video capture device in a sound source localization system, which further includes microphones disposed outside the video capture device. The method comprises:
  • identifying the position point with the largest voice power value, determining it as the sound source position, and aiming the device at the sound source position.
  • The video capture device acquires the voice signal collected by each microphone, the coordinate information of each microphone, and the coordinate information of each preset position point. It calculates the voice power value of each position point from these data, identifies the position point with the largest voice power value as the sound source position, and aims itself at that position.
  • Because the microphones are placed outside the video capture device, sound from anywhere in the capture scene can be picked up by them. This improves the accuracy of sound source localization and therefore the accuracy of device control.
  • The embodiment of the present application further provides an application program that, at runtime, performs the device control method according to the embodiment of the present application. The device control method is applied to a video capture device in a sound source localization system.
  • The sound source localization system further includes microphones disposed outside the video capture device, and the method comprises:
  • identifying the position point with the largest voice power value, determining it as the sound source position, and aiming the device at the sound source position.
  • The video capture device acquires the voice signal collected by each microphone, the coordinate information of each microphone, and the coordinate information of each preset position point. It calculates the voice power value of each position point from these data, identifies the position point with the largest voice power value as the sound source position, and aims itself at that position.
  • Because the microphones are placed outside the video capture device, sound from anywhere in the capture scene can be picked up by them. This improves the accuracy of sound source localization and therefore the accuracy of device control.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Human Computer Interaction (AREA)
  • Electromagnetism (AREA)
  • Acoustics & Sound (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

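A device control method, apparatus and system. The method is applied to a video capture device (610) in a sound source localization system, which further comprises microphones (620) disposed outside the video capture device (610). The method comprises three steps.
(S101) Acquire the voice signal collected by each microphone (620), the coordinate information of each microphone (620), and the coordinate information of each preset position point.
(S102) Calculate a voice power value corresponding to each position point from the voice signals collected by the microphones (620), the coordinate information of the microphones (620), and the coordinate information of the preset position points.
(S103) Identify the position point with the largest voice power value, determine that position point as the sound source position, and control the device to aim at the sound source position.
Because the microphones (620) are disposed outside the video capture device (610), sound anywhere in the capture scene can be collected by the microphones (620). This improves the accuracy of sound source localization and thus the accuracy of device control.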

Description

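Device control method, apparatus and system
This application claims priority to Chinese Patent Application No. 201611047345.8, filed with the China Patent Office on November 23, 2016 and entitled "Device control method, apparatus and system". The entire contents of that application are incorporated herein by reference.
Technical Field
The present application relates to the field of automatic control technology, and in particular to a device control method, apparatus and system.
Background
With the development of communication technology, exchanging information has become increasingly convenient. In offices, for example, users in different regions can hold video conferences. In education, teaching can be delivered online through live streaming or through recorded videos.
During video conferencing, online teaching or video recording, it is usually necessary to locate the person who is speaking and to aim a video capture device at that person. Specifically, a video capture device can be installed in a conference room or classroom; one example is a pan-tilt camera that can rotate 360 degrees and tilt up and down. The device locates the sound source position, that is, the position of the person speaking, and is then controlled to aim at that person.
In known device control methods, the microphones are mainly mounted on the video capture device itself. The sound received by the microphones is analyzed to locate the sound source position, that is, the position of the person speaking, and the video capture device is then controlled to aim at that position. However, because the microphones are mounted on the video capture device, such methods can usually locate only sound sources close to the device and cannot accurately locate sound sources far from it. Their sound source localization accuracy is therefore low, which in turn makes device control inaccurate.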
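Summary
The embodiments of the present application aim to provide a device control method, apparatus and system that improve the accuracy of device control. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present application provides a device control method applied to a video capture device in a sound source localization system. The sound source localization system further includes microphones disposed outside the video capture device. The method comprises:
acquiring the voice signal collected by each microphone, and acquiring the coordinate information of each microphone and the coordinate information of each preset position point;
calculating a voice power value corresponding to each position point according to the voice signals collected by the microphones, the coordinate information of the microphones, and the coordinate information of the preset position points;
identifying the position point with the largest voice power value, determining that position point as the sound source position, and controlling the device itself to aim at the sound source position.
Optionally, the step of calculating a voice power value corresponding to each position point from the voice signals collected by the microphones, the coordinate information of the microphones, and the coordinate information of the preset position points comprises:
calculating the Fourier transform of each voice signal collected by the microphones;
for each preset position point, calculating the delay difference from that preset position point to each pair of adjacent microphones according to the coordinate information of that preset position point and the coordinate information of the microphones;
calculating the generalized cross-correlation from that preset position point to each pair of adjacent microphones according to the Fourier transforms of the voice signals and the corresponding delay differences;
calculating the voice power value corresponding to that position point according to the generalized cross-correlations from that preset position point to each pair of adjacent microphones.
Optionally, the step of calculating, for each preset position point, the delay difference from that preset position point to each pair of adjacent microphones, according to the coordinate information of that preset position point and the coordinate information of the microphones, comprises:
calculating the delay difference $\tau_{mkl}$ from any preset position point m to any two adjacent microphones k and l according to the following formula:
$$\tau_{mkl} = \frac{D_{mk} - D_{ml}}{c}$$
where $D_{mk}$ is the distance from preset position point m to microphone k, $D_{ml}$ is the distance from preset position point m to microphone l, and c is the speed of sound.
Optionally, the step of calculating the generalized cross-correlation from that preset position point to each pair of adjacent microphones, according to the Fourier transforms of the voice signals and the corresponding delay differences, comprises:
calculating the generalized cross-correlation $R(\tau_{mkl})$ from the preset position point m to the two adjacent microphones k and l according to the following formula:
$$R(\tau_{mkl}) = \int_{-\infty}^{+\infty} \varphi_{kl}(w)\, M_k(w)\, M_l^{*}(w)\, e^{j w \tau_{mkl}}\, dw$$
where $M_k(w)$ is the Fourier transform of the voice signal received by microphone k; $M_l^{*}(w)$ is the conjugate of the Fourier transform of the voice signal received by microphone l; w is the frequency of the voice signal; and $\varphi_{kl}(w)$ is determined by the following formula:
[Formula image (not reproduced): definition of $\varphi_{kl}(w)$]
Optionally, the step of calculating the voice power value corresponding to that position point according to the generalized cross-correlations from that preset position point to each pair of adjacent microphones comprises:
calculating the voice power value $P(m)$ corresponding to the preset position point m according to the following formula:
[Formula image (not reproduced): $P(m)$ computed from the generalized cross-correlations $R(\tau_{mkl})$ of the adjacent microphone pairs]
where M is the total number of microphones.
Optionally, the method further comprises:
determining the height of the person at the sound source position;
calculating a target distance from the device to the person according to the device's own coordinate information, its own height, the coordinate information of the sound source position, and the height of the person at the sound source position;
identifying, from a pre-stored correspondence between distances and focal lengths, the target focal length corresponding to the target distance, and adjusting the device's own focal length to the target focal length.
Optionally, the step of determining the height of the person at the sound source position comprises:
retrieving a pre-stored average human height and taking it as the height of the person at the sound source position; or
capturing an image containing the person at the sound source position and analyzing the image to obtain the height of the person at the sound source position.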
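In a second aspect, an embodiment of the present application provides a device control apparatus applied to a video capture device in a sound source localization system. The sound source localization system further includes microphones disposed outside the video capture device. The apparatus comprises:
an obtaining module, configured to acquire the voice signal collected by each microphone, and to acquire the coordinate information of each microphone and the coordinate information of each preset position point;
a first calculation module, configured to calculate a voice power value corresponding to each position point according to the voice signals collected by the microphones, the coordinate information of the microphones, and the coordinate information of the preset position points;
a control module, configured to identify the position point with the largest voice power value, determine that position point as the sound source position, and control the device to aim at the sound source position.
Optionally, the first calculation module comprises:
a first calculation sub-module, configured to calculate the Fourier transform of each voice signal collected by the microphones;
a second calculation sub-module, configured to calculate, for each preset position point, the delay difference from that preset position point to each pair of adjacent microphones according to the coordinate information of that preset position point and the coordinate information of the microphones;
a third calculation sub-module, configured to calculate the generalized cross-correlation from that preset position point to each pair of adjacent microphones, according to the Fourier transforms of the voice signals and the corresponding delay differences;
a fourth calculation sub-module, configured to calculate the voice power value corresponding to that position point according to the generalized cross-correlations from that preset position point to each pair of adjacent microphones.
Optionally, the second calculation sub-module is specifically configured to calculate the delay difference $\tau_{mkl}$ from any preset position point m to any two adjacent microphones k and l according to the following formula:
$$\tau_{mkl} = \frac{D_{mk} - D_{ml}}{c}$$
where $D_{mk}$ is the distance from preset position point m to microphone k, $D_{ml}$ is the distance from preset position point m to microphone l, and c is the speed of sound.
Optionally, the third calculation sub-module is specifically configured to calculate the generalized cross-correlation $R(\tau_{mkl})$ from the preset position point m to the two adjacent microphones k and l according to the following formula:
$$R(\tau_{mkl}) = \int_{-\infty}^{+\infty} \varphi_{kl}(w)\, M_k(w)\, M_l^{*}(w)\, e^{j w \tau_{mkl}}\, dw$$
where $M_k(w)$ is the Fourier transform of the voice signal received by microphone k; $M_l^{*}(w)$ is the conjugate of the Fourier transform of the voice signal received by microphone l; w is the frequency of the voice signal; and $\varphi_{kl}(w)$ is determined by the following formula:
[Formula image (not reproduced): definition of $\varphi_{kl}(w)$]
Optionally, the fourth calculation sub-module is specifically configured to calculate the voice power value $P(m)$ corresponding to the preset position point m according to the following formula:
[Formula image (not reproduced): $P(m)$ computed from the generalized cross-correlations $R(\tau_{mkl})$ of the adjacent microphone pairs]
where M is the total number of microphones.
Optionally, the apparatus further comprises:
a determining module, configured to determine the height of the person at the sound source position;
a second calculation module, configured to calculate the target distance from the device to the person. It uses the coordinate information of the video capture device, the height of the video capture device, the coordinate information of the sound source position, and the height of the person at the sound source position;
an adjustment module, configured to identify, from a pre-stored correspondence between distances and focal lengths, the target focal length corresponding to the target distance, and to adjust the device's own focal length to the target focal length.
Optionally, the determining module is specifically configured to:
retrieve a pre-stored average human height and take it as the height of the person at the sound source position; or
capture an image containing the person at the sound source position and analyze the image to obtain the height of the person at the sound source position.
In a third aspect, an embodiment of the present application provides a device control system, the system comprising a video capture device and microphones disposed outside the video capture device;
the video capture device is configured to perform the following: acquire the voice signal collected by each microphone, and acquire the coordinate information of each microphone and the coordinate information of each preset position point; calculate a voice power value corresponding to each position point according to the voice signals collected by the microphones, the coordinate information of the microphones, and the coordinate information of the preset position points; and identify the position point with the largest voice power value, determine that position point as the sound source position, and control itself to aim at the sound source position;
each microphone is configured to collect voice signals and send the collected voice signals to the video capture device.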
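In a fourth aspect, an embodiment of the present application further provides a video capture device, comprising:
a processor, a memory, a communication interface and a bus;
the processor, the memory and the communication interface are connected via the bus and communicate with one another;
the memory stores executable program code;
the processor reads the executable program code stored in the memory and runs the corresponding program. At runtime, this program performs the device control method according to the first aspect of the present application.
In a fifth aspect, the present application provides a storage medium for storing executable program code. At runtime, the executable program code performs the device control method according to the first aspect of the present application.
In a sixth aspect, the present application provides an application program that, at runtime, performs the device control method according to the first aspect of the present application.
In the embodiments of the present application, the video capture device can acquire the voice signal collected by each microphone, together with the coordinate information of each microphone and of each preset position point. It then calculates the voice power value corresponding to each position point from the voice signals, the microphone coordinates and the preset-point coordinates. Finally, it identifies the position point with the largest voice power value, determines that point as the sound source position, and controls itself to aim at the sound source position. Because the microphones are disposed outside the video capture device, sound anywhere in the capture scene can be collected by them. This improves the accuracy of sound source localization and thus the accuracy of device control.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application and of the prior art more clearly, the drawings used in the embodiments and in the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application. A person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a device control method according to an embodiment of the present application;
FIG. 2(a) is a schematic diagram of a microphone layout in a classroom according to an embodiment of the present application;
FIG. 2(b) is a schematic diagram of another microphone layout in a classroom according to an embodiment of the present application;
FIG. 3 is another flowchart of a device control method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a three-dimensional coordinate system according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a device control apparatus according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a device control system according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a video capture device according to an embodiment of the present application.
Detailed Description
To improve the accuracy of sound source localization and thereby the accuracy of device control, the embodiments of the present application provide a device control method, apparatus and system.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present application.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with one another. The present application is described in detail below with reference to the drawings and the embodiments.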
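An embodiment of the present application provides a device control method. As shown in FIG. 1, the process may include the following steps:
S101: Acquire the voice signal collected by each microphone, and acquire the coordinate information of each microphone and the coordinate information of each preset position point.
In this embodiment, a video capture device can be installed in a capture scene that requires sound source localization, such as a classroom or conference room. To improve localization accuracy, microphones are installed outside the video capture device. The video capture device may be a dome camera, a video camera or the like; this embodiment does not limit it.
Specifically, multiple microphones can be installed. To receive voice signals well from every area of the capture scene, such as a classroom or conference room, the microphones can be distributed across those areas. For example, FIG. 2(a) shows a microphone layout in a classroom according to an embodiment of the present application, as a top view of the classroom. As shown in FIG. 2(a), the video capture device 210 can be installed near the podium, and the microphones can be installed on the walls around the classroom. The microphones on each wall can be evenly spaced, or unevenly spaced according to the distribution of the students 230 in the classroom.
Alternatively, FIG. 2(b) shows another microphone layout in a classroom according to an embodiment of the present application, also as a top view of the classroom. As shown in FIG. 2(b), the video capture device 210 can be installed near the podium, and the microphones can be installed on the classroom ceiling. The ceiling microphones can be evenly spaced, or unevenly spaced according to the distribution of the students 230 in the classroom.
In addition, identification information can be assigned to each microphone and each preset position point. For example, the microphones may be identified as a, b, l, k and so on, and the preset position points as 1, 2, 3 and so on. A planar coordinate system can also be established in the capture scene and used to determine the coordinate information of each microphone and each preset position point. The preset position points are the positions at which a speaker may appear.
For example, as shown in FIG. 2(a) and FIG. 2(b), the axes can be defined as follows. The direction of the line where one classroom wall meets the floor is taken as the X axis, the direction of the line where an adjacent wall meets the floor is taken as the Y axis, and their intersection is the origin O. Once the X and Y axes are determined, the coordinate information of each microphone can be determined; for example, the coordinate information of microphone n can be $(X_n, Y_n)$. The coordinate information of the microphones can be stored in the video capture device, for example as shown in Table 1: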
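Table 1
Microphone ID    Coordinate information
n    (1.0, 0.2)
k    (3.0, 0.2)
l    (5.0, 0.2)
t    (7.0, 0.2)
In a classroom, the speaker may be the teacher or a student. The teacher is usually at the podium and the students are usually in their seats, so the positions at which a speaker may appear are the podium and the students' seats. Once the X and Y axes are determined, the coordinate information of the podium and of each student's seat can therefore also be determined. For example, the coordinate information of the i-th preset position point can be $(X_i, Y_i)$. The coordinate information of the podium and of the seats can be stored in the video capture device as the coordinate information of the preset position points, for example as shown in Table 2:
Table 2
Preset position point ID    Coordinate information
1    (4.0, 7.8)
2    (2.0, 6.2)
3    (4.0, 6.2)
4    (6.0, 6.2)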
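In this embodiment, the voice signals collected by the microphones can be transmitted to the video capture device through analog circuits, so the video capture device can acquire the voice signal collected by each microphone. To perform sound source localization, the video capture device also acquires the coordinate information of each microphone and of each preset position point, for example from a predetermined local storage space.
S102: Calculate a voice power value corresponding to each position point according to the voice signals collected by the microphones, the coordinate information of the microphones, and the coordinate information of the preset position points.
In this embodiment, the video capture device first acquires the voice signals, the coordinate information of the microphones, and the coordinate information of the preset position points. From these data it calculates the voice power value corresponding to each position point, in order to determine the position point at which the sound source is located.
The voice power value of a position point characterizes how loud the sound at that point is. The position point with the largest voice power value, that is, the loudest point, is therefore the sound source position point. Specifically, the video capture device can calculate the voice power value separately for each preset position point.
For example, the video capture device can calculate the Fourier transform of each voice signal collected by the microphones. Any existing method can be used for this; this embodiment does not describe it further.
For any preset position point, the video capture device first calculates the delay difference from that point to each pair of adjacent microphones, using the coordinate information of the point and of the microphones. Specifically, the delay difference $\tau_{mkl}$ from any preset position point m to any two adjacent microphones k and l can be calculated according to the following formula:
$$\tau_{mkl} = \frac{D_{mk} - D_{ml}}{c}$$
where $D_{mk}$ is the distance from preset position point m to microphone k, $D_{ml}$ is the distance from preset position point m to microphone l, and c is the speed of sound, c = 340 m/s.
When the coordinate information of preset position point m is $(X_m, Y_m)$, that of microphone k is $(X_k, Y_k)$, and that of microphone l is $(X_l, Y_l)$, $D_{mk}$ and $D_{ml}$ are respectively:
$$D_{mk} = \sqrt{(X_m - X_k)^2 + (Y_m - Y_k)^2}$$
$$D_{ml} = \sqrt{(X_m - X_l)^2 + (Y_m - Y_l)^2}$$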
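The delay-difference step can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation, and it rests on four assumptions:
- It uses the coordinates from Tables 1 and 2.
- It takes c = 340 m/s.
- It applies the sign convention tau_mkl = (D_mk - D_ml) / c.
- It treats adjacent pairs as following the mounting order n, k, l, t.

The names `MICS`, `PRESETS`, `ADJACENT_PAIRS` and `delay_difference` are hypothetical.

```python
import math

SPEED_OF_SOUND = 340.0  # c in m/s, the value given in the description

# Coordinates from Table 1 (microphones) and Table 2 (preset position points)
MICS = {"n": (1.0, 0.2), "k": (3.0, 0.2), "l": (5.0, 0.2), "t": (7.0, 0.2)}
PRESETS = {1: (4.0, 7.8), 2: (2.0, 6.2), 3: (4.0, 6.2), 4: (6.0, 6.2)}

# Assumption: microphones are adjacent in the order they are mounted on the wall
ADJACENT_PAIRS = [("n", "k"), ("k", "l"), ("l", "t")]


def distance(p, q):
    """Planar Euclidean distance D between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def delay_difference(point, mic_k, mic_l, c=SPEED_OF_SOUND):
    """tau_mkl = (D_mk - D_ml) / c in seconds (sign convention assumed)."""
    return (distance(point, mic_k) - distance(point, mic_l)) / c


if __name__ == "__main__":
    # Print the delay differences of every preset point for each adjacent pair
    for pid, xy in PRESETS.items():
        taus = [delay_difference(xy, MICS[a], MICS[b]) for a, b in ADJACENT_PAIRS]
        print(pid, [f"{t * 1e3:+.3f} ms" for t in taus])
```

In the sample layout, point 3 lies midway between microphones k and l, so its delay difference for that pair is zero. Point 2 is closer to k than to l, so under this sign convention its delay difference for the pair is negative.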
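Then, the video capture device calculates the generalized cross-correlation from that preset position point to each pair of adjacent microphones. The calculation uses the Fourier transforms of the voice signals and the delay differences from that point to each pair of adjacent microphones.
Specifically, the video capture device can calculate the generalized cross-correlation $R(\tau_{mkl})$ from preset position point m to adjacent microphones k and l according to the following formula:
$$R(\tau_{mkl}) = \int_{-\infty}^{+\infty} \varphi_{kl}(w)\, M_k(w)\, M_l^{*}(w)\, e^{j w \tau_{mkl}}\, dw$$
where $M_k(w)$ is the Fourier transform of the voice signal received by microphone k; $M_l^{*}(w)$ is the conjugate of the Fourier transform of the voice signal received by microphone l; w is the frequency of the voice signal; and $\varphi_{kl}(w)$ is determined by the following formula:
[Formula image (not reproduced): definition of $\varphi_{kl}(w)$]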
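The generalized cross-correlation for one candidate delay can be sketched as below. The definition of the weighting phi_kl(w) is only available as an image, so this sketch assumes the common PHAT weighting, phi_kl(w) = 1 / |M_k(w) M_l*(w)|. It evaluates a discretized version of the integral over the DFT bins of one frame. The function names, the naive DFT and the sample-rate parameter `fs` are illustrative assumptions.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def gcc_phat(spec_k, spec_l, tau, fs, eps=1e-12):
    """Discretized R(tau) with PHAT weighting (assumed choice of phi_kl).

    spec_k, spec_l: DFTs of the frames from microphones k and l (same length).
    tau: candidate delay difference in seconds; fs: sample rate in Hz.
    """
    n = len(spec_k)
    total = 0j
    for b in range(n):
        cross = spec_k[b] * spec_l[b].conjugate()  # M_k(w) * M_l*(w)
        phi = 1.0 / (abs(cross) + eps)             # PHAT keeps phase only
        freq = b if b <= n // 2 else b - n         # signed frequency index
        omega = 2 * math.pi * freq * fs / n        # angular frequency, rad/s
        total += phi * cross * cmath.exp(1j * omega * tau)
    return total.real / n
```

With this sign convention, the value peaks when `tau` equals the true (D_mk - D_ml) / c. That is the case when the frame from microphone k is a delayed copy of the frame from microphone l. A real system would use an FFT library rather than the naive DFT.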
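Finally, the video capture device calculates the voice power value of the position point from the generalized cross-correlations from that point to each pair of adjacent microphones. Specifically, the voice power value $P(m)$ of preset position point m can be calculated according to the following formula:
[Formula image (not reproduced): $P(m)$ computed from the generalized cross-correlations $R(\tau_{mkl})$ of the adjacent microphone pairs]
where M is the total number of microphones.
S103: Identify the position point with the largest voice power value, determine that position point as the sound source position, and control the device to aim at the sound source position.
After calculating the voice power value of every preset position point, the video capture device identifies the position point with the largest voice power value and determines it as the sound source position.
The sound source position is where the person speaking is located. The sound there should be louder than at any other position, so its voice power value should also be the largest. In this embodiment, the position point with the largest voice power value is therefore determined as the sound source position.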
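The last two steps, accumulating P(m) and choosing the point with the largest value, can be sketched as follows. The original formula for P(m) is only available as an image, so summing R(tau_mkl) over the M - 1 pairs of neighbouring microphones (k, k + 1) is an assumption. `correlate` stands for any routine that returns R(tau_mkl) for a pair of microphone indices and a candidate delay, such as a GCC-PHAT evaluation. All names are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, float]
# correlate(k, l, tau) -> R(tau_mkl) for the microphones with indices k and l
Correlator = Callable[[int, int, float], float]


def voice_power(point: Point, mics: Sequence[Point], correlate: Correlator,
                c: float = 340.0) -> float:
    """P(m): sum of R(tau_mkl) over neighbouring pairs (k, k + 1) (assumed form)."""
    total = 0.0
    for k in range(len(mics) - 1):
        l = k + 1
        tau = (math.dist(point, mics[k]) - math.dist(point, mics[l])) / c
        total += correlate(k, l, tau)
    return total


def locate_source(presets: Dict[int, Point], mics: Sequence[Point],
                  correlate: Correlator) -> int:
    """Return the ID of the preset position point with the largest P(m)."""
    return max(presets, key=lambda pid: voice_power(presets[pid], mics, correlate))
```

The microphones are passed as an ordered list, so consecutive indices are treated as adjacent. Because `max` returns the first of several equal maxima, a tie goes to the preset point listed first. The original text does not say how ties are handled.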
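After determining the sound source position, the video capture device can aim itself at that position to record the person speaking, for example by turning its lens towards the sound source position. Any existing method can be used for this; this embodiment does not describe the process further.
An embodiment of the present application provides a device control method. The video capture device acquires the voice signal collected by each microphone, together with the coordinate information of each microphone and of each preset position point. It then calculates the voice power value corresponding to each position point from the voice signals, the microphone coordinates and the preset-point coordinates. Finally, it identifies the position point with the largest voice power value, determines that point as the sound source position, and controls itself to aim at the sound source position. Because the microphones are disposed outside the video capture device, sound anywhere in the capture scene can be collected by them. This improves the accuracy of sound source localization and thus the accuracy of device control.
During recording, a video capture device usually needs to adjust its focal length for target objects at different distances to achieve a better recording result. For example, a nearby target can be recorded with a shorter focal length so that the whole target is included in the image. A distant target can be recorded with a longer focal length to magnify it and keep it sharp in the image.
In one implementation of this embodiment, the video capture device can also adjust its focal length for different sound source positions, after aiming itself at the sound source position, to achieve a better recording result. As shown in FIG. 3, the device control method provided by this embodiment may further include:
S301: Determine the height of the person at the sound source position.
In this embodiment, the sound source position determined by the video capture device provides only two-dimensional coordinates. To determine the exact distance between the video capture device and the person at the sound source position, the device can also determine that person's height after aiming itself at the sound source position.
In one implementation, the video capture device stores an average human height in advance, such as 1.68 m, 1.70 m or 1.72 m. When determining the height of the person at the sound source position, it retrieves the pre-stored average height and uses it as that person's height.
In another implementation, the video capture device can capture an image containing the person at the sound source position and analyze the image to obtain the person's height, which gives a more accurate result. Any existing methods can be used to capture and analyze the image; this embodiment does not describe them further.
S302: Calculate the target distance from the device to the person according to its own coordinate information, its own height, the coordinate information of the sound source position, and the height of the person at the sound source position.
After determining the height of the person at the sound source position, the video capture device can calculate the target distance from itself to that person. It uses its own coordinate information, its own height, the coordinate information of the sound source position, and the person's height.
Specifically, the video capture device determines its own three-dimensional coordinates from its coordinate information and its height. It determines the person's three-dimensional coordinates from the coordinate information of the sound source position and the person's height. It then determines the distance from itself to the person from the two sets of three-dimensional coordinates.
For example, consider the three-dimensional coordinate system shown in FIG. 4, with the video capture device at $(X_0, Y_0, Z_0)$ and the person at $(X_1, Y_1, Z_1)$. The target distance l from the video capture device to the person's head is then:
$$l = \sqrt{(X_1 - X_0)^2 + (Y_1 - Y_0)^2 + (Z_1 - Z_0)^2}$$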
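S301 and S302 can be sketched as follows. The sketch takes the person's head height Z1 to be the determined body height. When no image-based estimate is available, it falls back to a stored average of 1.70 m, one of the example values above. The function and parameter names, and the choice of 1.70 m as the default, are hypothetical.

```python
import math
from typing import Optional, Tuple

AVERAGE_HEIGHT_M = 1.70  # one of the example pre-stored average heights


def person_height(measured: Optional[float] = None) -> float:
    """Use an image-based estimate when available, else the stored average."""
    return measured if measured is not None else AVERAGE_HEIGHT_M


def target_distance(camera_xy: Tuple[float, float], camera_height: float,
                    source_xy: Tuple[float, float], height: float) -> float:
    """Distance from (X0, Y0, Z0) to (X1, Y1, Z1); Z0 and Z1 are the two heights."""
    x0, y0 = camera_xy
    x1, y1 = source_xy
    return math.sqrt((x1 - x0) ** 2 + (y1 - y0) ** 2 + (height - camera_height) ** 2)
```

For example, a camera at (0, 0) mounted 1.70 m high and a speaker of average height at (3, 4) are 5 m apart.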
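S303: Identify, from the pre-stored correspondence between distances and focal lengths, the target focal length corresponding to the target distance, and adjust the device's own focal length to the target focal length.
In this embodiment, the video capture device stores the correspondence between distances and focal lengths in advance. For example, the user can determine the correspondence from empirical values and save it in the video capture device. The correspondence saved in the video capture device can be as shown in Table 3:
Table 3
Distance    Focal length
1 m    a1
2 m    a2
3 m    a3
4 m    a4
Once the video capture device has determined the target distance from itself to the person, it identifies the target focal length corresponding to the target distance from the pre-stored correspondence. It then adjusts its own focal length to the target focal length.
For example, the video capture device can look up, in its saved correspondence, the distance equal to the target distance and take the focal length corresponding to that distance as the target focal length. Any existing method can be used to adjust the focal length; this embodiment does not describe the process further.
In some cases, the target distance calculated by the video capture device may be a decimal, so no stored distance equals the target distance. In that case, the video capture device can look up the stored distance corresponding to the integer part of the target distance and take its focal length as the target focal length.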
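The focal-length lookup in S303, including the integer-part fallback, can be sketched as below. The table mirrors Table 3 and keeps its placeholder values a1 to a4. Raising an error for distances outside the stored range is an assumption, because the text does not cover that case. The function name is hypothetical.

```python
from typing import Dict

# Distance (m) -> focal length, mirroring Table 3 (a1..a4 are placeholders there)
FOCAL_TABLE: Dict[int, str] = {1: "a1", 2: "a2", 3: "a3", 4: "a4"}


def target_focal_length(distance_m: float, table: Dict[int, str] = FOCAL_TABLE) -> str:
    """Exact match first; otherwise use the integer part of the distance."""
    if distance_m in table:          # e.g. 2 or 2.0 matches the 2 m entry
        return table[distance_m]
    key = int(distance_m)            # integer part (truncation) of a positive distance
    if key in table:
        return table[key]
    # Assumption: the text does not say what happens outside the stored range
    raise KeyError(f"no focal length stored for {distance_m} m")
```

Truncation maps a target distance of 3.6 m to the 3 m entry, not to the nearer 4 m entry. This follows the integer-part rule described above.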
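In this embodiment, the video capture device can adjust its focal length according to its actual distance from the person at the sound source position, which improves the video recording result.
Corresponding to the above method embodiments, the embodiments of the present application further provide apparatus embodiments.
FIG. 5 shows a device control apparatus according to an embodiment of the present application, applied to a video capture device in a sound source localization system. The sound source localization system further includes microphones disposed outside the video capture device. The apparatus comprises:
an obtaining module 510, configured to acquire the voice signal collected by each microphone, and to acquire the coordinate information of each microphone and the coordinate information of each preset position point;
a first calculation module 520, configured to calculate a voice power value corresponding to each position point according to the voice signals collected by the microphones, the coordinate information of the microphones, and the coordinate information of the preset position points;
a control module 530, configured to identify the position point with the largest voice power value, determine that position point as the sound source position, and control the device to aim at the sound source position.
An embodiment of the present application provides a device control apparatus. With this apparatus, the video capture device acquires the voice signal collected by each microphone, together with the coordinate information of each microphone and of each preset position point. It then calculates the voice power value corresponding to each position point from these data. Finally, it identifies the position point with the largest voice power value, determines that point as the sound source position, and controls itself to aim at the sound source position. Because the microphones are disposed outside the video capture device, sound anywhere in the capture scene can be collected by them. This improves the accuracy of sound source localization and thus the accuracy of device control.
In one implementation of this embodiment, the first calculation module 520 comprises:
a first calculation sub-module (not shown), configured to calculate the Fourier transform of each voice signal collected by the microphones;
a second calculation sub-module (not shown), configured to calculate, for each preset position point, the delay difference from that preset position point to each pair of adjacent microphones according to the coordinate information of that preset position point and the coordinate information of the microphones;
a third calculation sub-module (not shown), configured to calculate the generalized cross-correlation from that preset position point to each pair of adjacent microphones, according to the Fourier transforms of the voice signals and the corresponding delay differences;
a fourth calculation sub-module (not shown), configured to calculate the voice power value corresponding to that position point according to the generalized cross-correlations from that preset position point to each pair of adjacent microphones.
In one implementation of this embodiment, the second calculation sub-module is specifically configured to calculate the delay difference $\tau_{mkl}$ from any preset position point m to any two adjacent microphones k and l according to the following formula:
$$\tau_{mkl} = \frac{D_{mk} - D_{ml}}{c}$$
where $D_{mk}$ is the distance from preset position point m to microphone k, $D_{ml}$ is the distance from preset position point m to microphone l, and c is the speed of sound.
In one implementation of this embodiment, the third calculation sub-module is specifically configured to calculate the generalized cross-correlation $R(\tau_{mkl})$ from the preset position point m to the two adjacent microphones k and l according to the following formula:
$$R(\tau_{mkl}) = \int_{-\infty}^{+\infty} \varphi_{kl}(w)\, M_k(w)\, M_l^{*}(w)\, e^{j w \tau_{mkl}}\, dw$$
where $M_k(w)$ is the Fourier transform of the voice signal received by microphone k; $M_l^{*}(w)$ is the conjugate of the Fourier transform of the voice signal received by microphone l; w is the frequency of the voice signal; and $\varphi_{kl}(w)$ is determined by the following formula:
[Formula image (not reproduced): definition of $\varphi_{kl}(w)$]
In one implementation of this embodiment, the fourth calculation sub-module is specifically configured to calculate the voice power value $P(m)$ corresponding to the preset position point m according to the following formula:
[Formula image (not reproduced): $P(m)$ computed from the generalized cross-correlations $R(\tau_{mkl})$ of the adjacent microphone pairs]
where M is the total number of microphones.
In one implementation of this embodiment, the apparatus further comprises:
a determining module (not shown), configured to determine the height of the person at the sound source position;
a second calculation module (not shown), configured to calculate the target distance from the device to the person. It uses the coordinate information of the video capture device, the height of the video capture device, the coordinate information of the sound source position, and the height of the person at the sound source position;
an adjustment module (not shown), configured to identify, from a pre-stored correspondence between distances and focal lengths, the target focal length corresponding to the target distance, and to adjust the device's own focal length to the target focal length.
In one implementation of this embodiment, the determining module is specifically configured to:
retrieve a pre-stored average human height and take it as the height of the person at the sound source position; or
capture an image containing the person at the sound source position and analyze the image to obtain the height of the person at the sound source position.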
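As shown in FIG. 6, an embodiment of the present application further provides a device control system, the system comprising a video capture device 610 and microphones 620 disposed outside the video capture device 610;
the video capture device 610 is configured to perform the following: acquire the voice signal collected by each microphone 620, and acquire the coordinate information of each microphone 620 and the coordinate information of each preset position point; calculate a voice power value corresponding to each position point according to the voice signals collected by the microphones 620, the coordinate information of the microphones 620, and the coordinate information of the preset position points; and identify the position point with the largest voice power value, determine that position point as the sound source position, and control itself to aim at the sound source position;
each microphone 620 is configured to collect voice signals and send the collected voice signals to the video capture device 610.
An embodiment of the present application provides a device control system. In this system, the video capture device acquires the voice signal collected by each microphone, together with the coordinate information of each microphone and of each preset position point. It then calculates the voice power value corresponding to each position point from these data. Finally, it identifies the position point with the largest voice power value, determines that point as the sound source position, and controls itself to aim at the sound source position. Because the microphones are disposed outside the video capture device, sound anywhere in the capture scene can be collected by them. This improves the accuracy of sound source localization and thus the accuracy of device control.
Correspondingly, as shown in FIG. 7, an embodiment of the present application further provides a video capture device, which may include:
a processor 710, a memory 720, a communication interface 730 and a bus 740;
the processor 710, the memory 720 and the communication interface 730 are connected via the bus 740 and communicate with one another;
the memory 720 stores executable program code;
the processor 710 reads the executable program code stored in the memory 720 and runs the corresponding program. At runtime, this program performs the device control method according to the embodiments of the present application, wherein the method comprises:
acquiring the voice signal collected by each microphone, and acquiring the coordinate information of each microphone and the coordinate information of each preset position point;
calculating a voice power value corresponding to each position point according to the voice signals collected by the microphones, the coordinate information of the microphones, and the coordinate information of the preset position points;
identifying the position point with the largest voice power value, determining that position point as the sound source position, and controlling the device to aim at the sound source position.
In this embodiment, the video capture device acquires the voice signal collected by each microphone, together with the coordinate information of each microphone and of each preset position point. It then calculates the voice power value corresponding to each position point from these data. Finally, it identifies the position point with the largest voice power value, determines that point as the sound source position, and controls itself to aim at the sound source position. Because the microphones are disposed outside the video capture device, sound anywhere in the capture scene can be collected by them. This improves the accuracy of sound source localization and thus the accuracy of device control.
Correspondingly, an embodiment of the present application further provides a storage medium for storing executable program code. At runtime, the executable program code performs the device control method according to the embodiments of the present application. The device control method is applied to a video capture device in a sound source localization system, which further includes microphones disposed outside the video capture device. The method comprises:
acquiring the voice signal collected by each microphone, and acquiring the coordinate information of each microphone and the coordinate information of each preset position point;
calculating a voice power value corresponding to each position point according to the voice signals collected by the microphones, the coordinate information of the microphones, and the coordinate information of the preset position points;
identifying the position point with the largest voice power value, determining that position point as the sound source position, and controlling the device to aim at the sound source position.
In this embodiment, the video capture device acquires the voice signal collected by each microphone, together with the coordinate information of each microphone and of each preset position point. It then calculates the voice power value corresponding to each position point from these data. Finally, it identifies the position point with the largest voice power value, determines that point as the sound source position, and controls itself to aim at the sound source position. Because the microphones are disposed outside the video capture device, sound anywhere in the capture scene can be collected by them. This improves the accuracy of sound source localization and thus the accuracy of device control.
Correspondingly, an embodiment of the present application further provides an application program that, at runtime, performs the device control method according to the embodiments of the present application. The device control method is applied to a video capture device in a sound source localization system, which further includes microphones disposed outside the video capture device. The method comprises:
acquiring the voice signal collected by each microphone, and acquiring the coordinate information of each microphone and the coordinate information of each preset position point;
calculating a voice power value corresponding to each position point according to the voice signals collected by the microphones, the coordinate information of the microphones, and the coordinate information of the preset position points;
identifying the position point with the largest voice power value, determining that position point as the sound source position, and controlling the device to aim at the sound source position.
In this embodiment, the video capture device acquires the voice signal collected by each microphone, together with the coordinate information of each microphone and of each preset position point. It then calculates the voice power value corresponding to each position point from these data. Finally, it identifies the position point with the largest voice power value, determines that point as the sound source position, and controls itself to aim at the sound source position. Because the microphones are disposed outside the video capture device, sound anywhere in the capture scene can be collected by them. This improves the accuracy of sound source localization and thus the accuracy of device control.
The apparatus, system, device, storage medium and application program embodiments are basically similar to the method embodiments, so they are described relatively briefly. For relevant details, refer to the description of the method embodiments.
It should be noted that relational terms herein, such as "first" and "second", are used only to distinguish one entity or operation from another. They do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion. A process, method, article or device that includes a list of elements therefore includes not only those elements, but also other elements not explicitly listed or elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner: for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus embodiments are basically similar to the method embodiments and are therefore described relatively briefly. For relevant details, refer to the description of the method embodiments.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc.
The foregoing describes only preferred embodiments of the present application and is not intended to limit its protection scope. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present application falls within its protection scope.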

Claims (18)

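  1. A device control method, characterized in that it is applied to a video capture device in a sound source localization system, the sound source localization system further comprising microphones disposed outside the video capture device, the method comprising:
    acquiring the voice signal collected by each microphone, and acquiring the coordinate information of each microphone and the coordinate information of each preset position point;
    calculating a voice power value corresponding to each position point according to the voice signals collected by the microphones, the coordinate information of the microphones, and the coordinate information of the preset position points;
    identifying the position point with the largest voice power value, determining that position point as the sound source position, and controlling the device itself to aim at the sound source position.
  2. The method according to claim 1, characterized in that the step of calculating a voice power value corresponding to each position point according to the voice signals collected by the microphones, the coordinate information of the microphones, and the coordinate information of the preset position points comprises:
    calculating the Fourier transform of each voice signal collected by the microphones;
    for each preset position point, calculating the delay difference from that preset position point to each pair of adjacent microphones according to the coordinate information of that preset position point and the coordinate information of the microphones;
    calculating the generalized cross-correlation from that preset position point to each pair of adjacent microphones according to the Fourier transforms of the voice signals and the delay differences from that preset position point to each pair of adjacent microphones;
    calculating the voice power value corresponding to that position point according to the generalized cross-correlations from that preset position point to each pair of adjacent microphones.
  3. The method according to claim 2, characterized in that the step of calculating, for each preset position point, the delay difference from that preset position point to each pair of adjacent microphones according to the coordinate information of that preset position point and the coordinate information of the microphones comprises:
    calculating the delay difference $\tau_{mkl}$ from any preset position point m to any two adjacent microphones k and l according to the following formula:
    $$\tau_{mkl} = \frac{D_{mk} - D_{ml}}{c}$$
    where $D_{mk}$ is the distance from preset position point m to microphone k, $D_{ml}$ is the distance from preset position point m to microphone l, and c is the speed of sound.
  4. The method according to claim 3, characterized in that the step of calculating the generalized cross-correlation from that preset position point to each pair of adjacent microphones according to the Fourier transforms of the voice signals and the delay differences from that preset position point to each pair of adjacent microphones comprises:
    calculating the generalized cross-correlation $R(\tau_{mkl})$ from the preset position point m to the two adjacent microphones k and l according to the following formula:
    $$R(\tau_{mkl}) = \int_{-\infty}^{+\infty} \varphi_{kl}(w)\, M_k(w)\, M_l^{*}(w)\, e^{j w \tau_{mkl}}\, dw$$
    where $M_k(w)$ is the Fourier transform of the voice signal received by microphone k; $M_l^{*}(w)$ is the conjugate of the Fourier transform of the voice signal received by microphone l; w is the frequency of the voice signal; and $\varphi_{kl}(w)$ is determined by the following formula:
    [Formula image (not reproduced): definition of $\varphi_{kl}(w)$]
  5. The method according to claim 4, characterized in that the step of calculating the voice power value corresponding to that position point according to the generalized cross-correlations from that preset position point to each pair of adjacent microphones comprises:
    calculating the voice power value $P(m)$ corresponding to the preset position point m according to the following formula:
    [Formula image (not reproduced): $P(m)$ computed from the generalized cross-correlations $R(\tau_{mkl})$ of the adjacent microphone pairs]
    where M is the total number of microphones.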
  6. 根据权利要求1-5任一项所述的方法,其特征在于,所述方法还包括:
    确定所述声源位置处的人体的身高;
    根据自身的坐标信息、自身的高度、所述声源位置的坐标信息、以及所 述声源位置处的人体的身高,计算从自身到所述人体的目标距离;
    根据预先保存的各距离与焦距的对应关系,识别与所述目标距离对应的目标焦距,并调节自身的焦距为所述目标焦距。
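  7. The method according to claim 6, wherein the step of determining the height of the human body at the sound source position comprises:
    acquiring a pre-stored average human height and taking it as the height of the human body at the sound source position; or
    capturing an image containing the human body at the sound source position and analyzing the image to obtain the height of the human body at the sound source position.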
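  8. A device control apparatus, applied to a video capture device in a sound source localization system, the sound source localization system further comprising microphones arranged outside the video capture device, the apparatus comprising:
    an acquisition module, configured to acquire the speech signals captured by the microphones, and to acquire the coordinate information of each microphone and the coordinate information of each preset position point;
    a first calculation module, configured to calculate a speech power value corresponding to each position point according to the speech signals captured by the microphones, the coordinate information of each microphone, and the coordinate information of each preset position point;
    a control module, configured to identify the position point with the largest speech power value, determine that position point as the sound source position, and control the video capture device to aim at the sound source position.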
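  9. The apparatus according to claim 8, wherein the first calculation module comprises:
    a first calculation submodule, configured to calculate a Fourier transform of each speech signal captured by the microphones;
    a second calculation submodule, configured to calculate, for each preset position point, a time delay difference from the preset position point to each pair of adjacent microphones according to the coordinate information of the preset position point and the coordinate information of each microphone;
    a third calculation submodule, configured to calculate a generalized cross-correlation of the preset position point with respect to each pair of adjacent microphones according to the Fourier transforms of the speech signals and the time delay differences from the preset position point to each pair of adjacent microphones;
    a fourth calculation submodule, configured to calculate the speech power value corresponding to the position point according to the generalized cross-correlations of the preset position point with respect to each pair of adjacent microphones.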
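  10. The apparatus according to claim 9, wherein the second calculation submodule is specifically configured to calculate the time delay difference τ_mkl from any preset position point m to any two adjacent microphones k and l according to the following formula:
    $$\tau_{mkl} = \frac{D_{mk} - D_{ml}}{c}$$
    where D_mk is the distance from the preset position point m to microphone k, D_ml is the distance from the preset position point m to microphone l, and c is the speed of sound.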
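  11. The apparatus according to claim 10, wherein the third calculation submodule is specifically configured to calculate the generalized cross-correlation R(τ_mkl) of the preset position point m with respect to the two adjacent microphones k and l according to the following formula:
    $$R(\tau_{mkl}) = \int \varphi_{kl}(w)\, M_k(w)\, M_l^{*}(w)\, e^{jw\tau_{mkl}}\, dw$$
    where M_k(w) is the Fourier transform of the speech signal received by microphone k; M_l^*(w) is the conjugate of the Fourier transform of the speech signal received by microphone l; w is the speech signal frequency; and φ_kl(w) is determined by the following formula:
    $$\varphi_{kl}(w) = \frac{1}{\lvert M_k(w)\, M_l^{*}(w) \rvert}$$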
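  12. The apparatus according to claim 11, wherein the fourth calculation submodule is specifically configured to calculate the speech power value P(m) corresponding to any preset position point m according to the following formula: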
    Figure PCTCN2017106800-appb-100010
    where M is the total number of microphones.
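  13. The apparatus according to any one of claims 8-12, further comprising:
    a determination module, configured to determine the height of the human body at the sound source position;
    a second calculation module, configured to calculate a target distance from the video capture device to the human body according to the coordinate information of the video capture device, the height of the video capture device, the coordinate information of the sound source position, and the height of the human body at the sound source position;
    an adjustment module, configured to identify, according to a pre-stored correspondence between distances and focal lengths, a target focal length corresponding to the target distance, and to adjust the focal length of the video capture device to the target focal length.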
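  14. The apparatus according to claim 13, wherein the determination module is specifically configured to:
    acquire a pre-stored average human height and take it as the height of the human body at the sound source position; or
    capture an image containing the human body at the sound source position and analyze the image to obtain the height of the human body at the sound source position.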
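  15. A device control system, comprising a video capture device and microphones arranged outside the video capture device, wherein:
    the video capture device is configured to acquire the speech signals captured by the microphones, and to acquire the coordinate information of each microphone and the coordinate information of each preset position point; to calculate a speech power value corresponding to each position point according to the speech signals captured by the microphones, the coordinate information of each microphone, and the coordinate information of each preset position point; and to identify the position point with the largest speech power value, determine that position point as the sound source position, and aim itself at the sound source position;
    the microphones are configured to capture speech signals and send the captured speech signals to the video capture device.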
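  16. A video capture device, comprising:
    a processor, a memory, a communication interface, and a bus;
    wherein the processor, the memory, and the communication interface are connected via the bus and communicate with one another through it;
    the memory stores executable program code;
    the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the device control method according to any one of claims 1-7.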
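  17. A storage medium for storing executable program code, wherein the executable program code, when run, performs the device control method according to any one of claims 1-7.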
  18. An application program which, when run, performs the device control method according to any one of claims 1-7.
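The localization and zoom steps recited in claims 1-7 can be sketched in Python as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the applicant's implementation. τ_mkl is taken as (D_mk - D_ml)/c, and R(τ) is a discrete GCC-PHAT evaluated at a single lag with weight 1/|M_k M_l*|. Because the formula for P(m) survives only as an image, P(m) is assumed here to be the sum of R(τ_mkl) over adjacent microphone pairs (k, k+1), with "adjacent" meaning consecutive entries of the microphone list. The pan/tilt aiming, the lens-to-person distance, and the nearest-entry focal-length lookup are hypothetical stand-ins, since the claims do not specify them. All function names are illustrative.

```python
import bisect
import cmath
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]  # coordinates in one shared frame, z pointing up (assumed)


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform; stdlib only, for illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def delay_difference(m: Point, mic_k: Point, mic_l: Point, c: float) -> float:
    """tau_mkl = (D_mk - D_ml) / c for preset point m and microphones k, l."""
    return (math.dist(m, mic_k) - math.dist(m, mic_l)) / c


def gcc_phat(spec_k: Sequence[complex], spec_l: Sequence[complex],
             tau: float, fs: float) -> float:
    """Discrete R(tau): mean over bins of phi_kl * M_k * conj(M_l) * e^{j w tau}."""
    n = len(spec_k)
    total = 0j
    for f in range(n):
        cross = spec_k[f] * spec_l[f].conjugate()  # M_k(w) M_l*(w)
        phi = 1.0 / max(abs(cross), 1e-12)         # PHAT weight, guarded against 0
        signed = f if f <= n // 2 else f - n       # map bin to a signed frequency
        w = 2.0 * math.pi * signed * fs / n        # angular frequency in rad/s
        total += phi * cross * cmath.exp(1j * w * tau)
    return (total / n).real


def speech_power(m: Point, mics: Sequence[Point],
                 spectra: Sequence[Sequence[complex]], fs: float, c: float) -> float:
    """Assumed P(m): sum of R(tau_mkl) over adjacent pairs (k, k + 1)."""
    return sum(gcc_phat(spectra[k], spectra[k + 1],
                        delay_difference(m, mics[k], mics[k + 1], c), fs)
               for k in range(len(mics) - 1))


def locate(points: Sequence[Point], mics: Sequence[Point],
           signals: Sequence[Sequence[float]], fs: float,
           c: float = 343.0) -> Tuple[Point, Dict[Point, float]]:
    """Return the preset point with the largest speech power, plus all powers."""
    spectra = [dft(s) for s in signals]  # one Fourier transform per microphone
    powers = {m: speech_power(m, mics, spectra, fs, c) for m in points}
    return max(powers, key=powers.get), powers


def aim_angles(camera: Point, target: Point) -> Tuple[float, float]:
    """Hypothetical pan (from +x, horizontal) and tilt (elevation) in degrees."""
    dx, dy, dz = (t - o for t, o in zip(target, camera))
    return (math.degrees(math.atan2(dy, dx)),
            math.degrees(math.atan2(dz, math.hypot(dx, dy))))


def target_distance(cam_xy: Point, cam_height: float,
                    src_xy: Point, person_height: float) -> float:
    """Assumed: distance from the lens to a point at person_height above src_xy."""
    return math.hypot(math.dist(cam_xy, src_xy), cam_height - person_height)


def lookup_focal(distance: float, table: Sequence[Tuple[float, float]]) -> float:
    """Hypothetical (distance, focal length) table sorted by distance; nearest wins."""
    keys = [d for d, _ in table]
    i = bisect.bisect_left(keys, distance)
    if i == 0:
        return table[0][1]
    if i == len(table):
        return table[-1][1]
    lower, upper = table[i - 1], table[i]
    return upper[1] if upper[0] - distance < distance - lower[0] else lower[1]


if __name__ == "__main__":
    import random

    random.seed(0)
    n = 64
    x = [random.uniform(-1.0, 1.0) for _ in range(n)]
    mics = [(3.0, 4.0, 0.0), (6.0, 8.0, 0.0), (0.0, 7.0, 0.0)]
    # Toy units with c = fs = 1: the source at the origin is 5, 10 and 7 samples away.
    signals = [[x[(t - d) % n] for t in range(n)] for d in (5, 10, 7)]
    points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, -6.0, 0.0), (4.0, 1.0, 0.0)]
    source, _ = locate(points, mics, signals, fs=1.0, c=1.0)
    print(source, aim_angles((1.0, 1.0, 2.5), source))
    dist = target_distance((1.0, 1.0), 2.5, source[:2], 1.7)
    print(round(dist, 3), lookup_focal(dist, [(1.0, 4.0), (2.0, 8.0), (4.0, 12.0)]))
```

In the toy example the simulated source at the origin is a whole number of samples from each microphone, so both adjacent-pair correlations peak at 1 there and the origin receives the largest power; any other candidate yields a non-integer lag mismatch and a smaller sum. A practical system would use an FFT library rather than the naive DFT, and a full SRP-PHAT sums over every microphone pair rather than only adjacent ones.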
PCT/CN2017/106800 2016-11-23 2017-10-19 Device control method, apparatus and system WO2018095166A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/462,453 US10816633B2 (en) 2016-11-23 2017-10-19 Device control method, apparatus and system
EP17874320.9A EP3546976B1 (en) 2016-11-23 2017-10-19 Device control method, apparatus and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611047345.8A CN108089152B (zh) 2016-11-23 2016-11-23 Device control method, apparatus and system
CN201611047345.8 2016-11-23

Publications (1)

Publication Number Publication Date
WO2018095166A1 (zh) 2018-05-31

Family

ID=62171146

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/106800 WO2018095166A1 (zh) 2016-11-23 2017-10-19 Device control method, apparatus and system

Country Status (4)

Country Link
US (1) US10816633B2 (zh)
EP (1) EP3546976B1 (zh)
CN (1) CN108089152B (zh)
WO (1) WO2018095166A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110221797A (zh) * 2019-05-28 2019-09-10 Shanghai Huanshi Network Technology Co., Ltd. Control method and system for signal source devices in a multi-screen splicing system

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
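CN110874909A (zh) * 2018-08-29 2020-03-10 Hangzhou Hikvision Digital Technology Co., Ltd. Monitoring method and system, and readable storage medium
CN110009770A (zh) * 2018-12-10 2019-07-12 Alibaba Group Holding Ltd. Method and device for identifying a user, and access control apparatus
CN109506568B (zh) * 2018-12-29 2021-06-18 AISpeech Co., Ltd. Sound source localization method and device based on image recognition and speech recognition
CN110534105B (zh) * 2019-07-24 2021-10-15 Gree Electric Appliances, Inc. of Zhuhai Voice control method and device
CN110398727B (zh) * 2019-07-31 2023-08-01 Shenzhen KTC Commercial Technology Co., Ltd. Device control method and device control system
CN111526295B (zh) * 2020-04-30 2023-02-28 PowerVision Tech Inc. Audio/video processing system, acquisition method, apparatus, device and storage medium
CN113625223B (zh) * 2020-05-08 2024-04-30 CICTCI Technology Co., Ltd. Positioning method and terminal device
CN112562664A (zh) * 2020-11-27 2021-03-26 Shanghai Xianta Intelligent Technology Co., Ltd. Audio adjustment method and system, vehicle, and computer storage medium
CN112367473A (zh) * 2021-01-13 2021-02-12 Beijing Telecom Yitong Information Technology Co., Ltd. Rotatable camera device based on voiceprint arrival phase and control method therefor
CN113466793B (zh) * 2021-06-11 2023-10-17 Wuyi University Microphone-array-based sound source localization method, device and storage medium
CN115134499B (zh) * 2022-06-28 2024-02-02 SPON Communications Co., Ltd. Audio and video monitoring method and system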

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6707489B1 (en) * 1995-07-31 2004-03-16 Forgent Networks, Inc. Automatic voice tracking camera system and method of operation
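CN101534413A (zh) * 2009-04-14 2009-09-16 Shenzhen Huawei Communication Technologies Co., Ltd. Telepresence system, apparatus and method
CN103235287A (zh) * 2013-04-17 2013-08-07 North China Electric Power University (Baoding) Sound source localization camera tracking device
CN104142492A (zh) * 2014-07-29 2014-11-12 Foshan University SRP-PHAT multi-source spatial localization method
CN104898091A (zh) * 2015-05-29 2015-09-09 Fudan University Self-calibrating microphone array sound source localization system based on an iterative optimization algorithm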

Family Cites Families (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7012630B2 (en) * 1996-02-08 2006-03-14 Verizon Services Corp. Spatial sound conference system and apparatus
JP2003322790A (ja) 2002-05-02 2003-11-14 Nippon Hoso Kyokai <Nhk> Autofocus camera
US7190775B2 (en) * 2003-10-29 2007-03-13 Broadcom Corporation High quality audio conferencing with adaptive beamforming
US7667728B2 (en) * 2004-10-15 2010-02-23 Lifesize Communications, Inc. Video and audio conferencing system with spatial audio
US7421113B2 (en) * 2005-03-30 2008-09-02 The Trustees Of The University Of Pennsylvania System and method for localizing imaging devices
US20080267423A1 (en) * 2007-04-26 2008-10-30 Kabushiki Kaisha Kobe Seiko Sho Object sound extraction apparatus and object sound extraction method
US8391472B2 (en) * 2007-06-06 2013-03-05 Dreamworks Animation Llc Acoustic echo cancellation solution for video conferencing
WO2010109918A1 (ja) * 2009-03-26 2010-09-30 Panasonic Corporation Decoding device, encoding/decoding device, and decoding method
US8855205B2 (en) * 2010-05-26 2014-10-07 Newratek Inc. Method of predicting motion vectors in video codec in which multiple references are allowed, and motion vector encoding/decoding apparatus using the same
US20120250984A1 (en) * 2010-12-01 2012-10-04 The Trustees Of The University Of Pennsylvania Image segmentation for distributed target tracking and scene analysis
US8761412B2 (en) * 2010-12-16 2014-06-24 Sony Computer Entertainment Inc. Microphone array steering with image-based source location
US8693713B2 (en) * 2010-12-17 2014-04-08 Microsoft Corporation Virtual audio environment for multidimensional conferencing
JP5701142B2 (ja) * 2011-05-09 2015-04-15 Audio-Technica Corporation Microphone
US9226088B2 (en) * 2011-06-11 2015-12-29 Clearone Communications, Inc. Methods and apparatuses for multiple configurations of beamforming microphone arrays
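CN102395030B (zh) * 2011-11-18 2014-05-07 Hangzhou Hikvision Digital Technology Co., Ltd. Motion analysis method based on a compressed video bitstream, bitstream conversion method, and apparatus therefor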
US9246543B2 (en) * 2011-12-12 2016-01-26 Futurewei Technologies, Inc. Smart audio and video capture systems for data processing systems
US9197974B1 (en) * 2012-01-06 2015-11-24 Audience, Inc. Directional audio capture adaptation based on alternative sensory input
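CN103780870B (zh) * 2012-10-17 2017-11-21 Hangzhou Hikvision Digital Technology Co., Ltd. Video image quality diagnosis system and method
CN103841357A (zh) * 2012-11-21 2014-06-04 ZTE Corporation Microphone array sound source localization method, device and system based on video tracking
TWI593294B (zh) * 2013-02-07 2017-07-21 MStar Semiconductor, Inc. Sound collection system and related method
CN105075288B (zh) 2013-02-15 2018-10-19 Panasonic Intellectual Property Management Co., Ltd. Directivity control system, calibration method, horizontal deviation angle calculation method, and directivity control method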
US9312826B2 (en) * 2013-03-13 2016-04-12 Kopin Corporation Apparatuses and methods for acoustic channel auto-balancing during multi-channel signal extraction
US9338544B2 (en) * 2014-06-03 2016-05-10 Cisco Technology, Inc. Determination, display, and adjustment of best sound source placement region relative to microphone
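CN105635635A (zh) * 2014-11-19 2016-06-01 Dolby Laboratories Licensing Corporation Adjusting spatial congruency in a video conferencing system
CN105898185A (zh) * 2014-11-19 2016-08-24 Dolby Laboratories Licensing Corporation Adjusting spatial congruency in a video conferencing system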
US9654868B2 (en) * 2014-12-05 2017-05-16 Stages Llc Multi-channel multi-domain source identification and tracking
CN106210612A (zh) * 2015-04-30 2016-12-07 Hangzhou Hikvision Digital Technology Co., Ltd. Video encoding method, decoding method, and apparatus therefor
US9584758B1 (en) * 2015-11-25 2017-02-28 International Business Machines Corporation Combining installed audio-visual sensors with ad-hoc mobile audio-visual sensors for smart meeting rooms
US9621795B1 (en) * 2016-01-08 2017-04-11 Microsoft Technology Licensing, Llc Active speaker location detection
US9992580B2 (en) * 2016-03-04 2018-06-05 Avaya Inc. Signal to noise ratio using decentralized dynamic laser microphones
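CN205490942U (zh) * 2016-03-16 2016-08-17 Shanghai Jingrui Information Technology Co., Ltd. Automatic camera positioning system based on speech recognition
CN107437420A (zh) * 2016-05-27 2017-12-05 Fu Tai Hua Industry (Shenzhen) Co., Ltd. Method, system and device for receiving voice information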
US9659576B1 (en) * 2016-06-13 2017-05-23 Biamp Systems Corporation Beam forming and acoustic echo cancellation with mutual adaptation control
CN107026934B (zh) * 2016-10-27 2019-09-27 Huawei Technologies Co., Ltd. Sound source localization method and device
US9966059B1 (en) * 2017-09-06 2018-05-08 Amazon Technologies, Inc. Reconfigurale fixed beam former using given microphone array
US10110994B1 (en) * 2017-11-21 2018-10-23 Nokia Technologies Oy Method and apparatus for providing voice communication with spatial audio

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3546976A4

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
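CN110221797A (zh) * 2019-05-28 2019-09-10 Shanghai Huanshi Network Technology Co., Ltd. Control method and system for signal source devices in a multi-screen splicing system
CN110221797B (zh) * 2019-05-28 2022-09-30 Shanghai Huanshi Network Technology Co., Ltd. Control method and system for signal source devices in a multi-screen splicing system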

Also Published As

Publication number Publication date
EP3546976A4 (en) 2019-10-09
CN108089152A (zh) 2018-05-29
CN108089152B (zh) 2020-07-03
US20190317178A1 (en) 2019-10-17
EP3546976A1 (en) 2019-10-02
EP3546976B1 (en) 2021-12-08
US10816633B2 (en) 2020-10-27

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 17874320; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
ENP Entry into the national phase
    Ref document number: 2017874320; Country of ref document: EP; Effective date: 20190624