WO2021037129A1 - Sound collection method and device (一种声音采集方法及装置) - Google Patents

Info

Publication number
WO2021037129A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
collection
location information
microphone array
target sound
Prior art date
Application number
PCT/CN2020/111684
Other languages
English (en)
French (fr)
Inventor
罗大为
Original Assignee
北京搜狗科技发展有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京搜狗科技发展有限公司
Publication of WO2021037129A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/326 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only for microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups

Definitions

  • This application relates to the technical field of data processing, and in particular to a sound collection method and device.
  • A microphone array is generally composed of a certain number of acoustic sensors, which are used to sample and process the spatial characteristics of the sound field. Microphone arrays are of great significance in the field of human-computer interaction: they can greatly extend the interaction distance, so that users can interact naturally by voice without holding or staying close to the sound pickup device. They have been widely used in scenarios such as smart homes.
  • the entire space needs to be scanned to collect sound signals.
  • the use environment of the microphone array is complicated, and the sound emitted by the target sound source may not be accurately collected, which causes the microphone array to fail to achieve the expected use effect.
  • the embodiments of the present application provide a sound collection method and device to solve the technical problem that the microphone array in the prior art may not be able to accurately collect the sound of the target sound source.
  • a sound collection method is provided, the method is applied to a microphone array, and the method includes:
  • the method further includes:
  • the acquiring location information of the interference source includes:
  • the user corresponding to a collection direction other than the target sound source direction is determined as an interfering user, and the location information of the interfering user is acquired as the location information of the interference source.
  • the method further includes:
  • the method further includes:
  • the directional suppression collection of the direction of the interference source includes:
  • the direction of the interference source is subjected to directional suppression collection according to the interference reverberation information.
  • the method further includes:
  • the determining the collection direction corresponding to the user according to the location information of the user includes:
  • the first connecting line is the line between the visual sensor system and the microphone array, determined according to the position information of the visual sensor system and the position information of the microphone array;
  • the second connecting line is the line between the microphone array and the user, determined according to the position information of the microphone array and the position information of the user.
  • the method further includes:
  • when a no-user-activity signal detected by the visual sensor system is acquired, the standby state is entered.
  • a sound collection device is provided, the device is applied to a microphone array, and the device includes:
  • the first acquiring unit is used to acquire the user's location information collected by the vision sensor system in real time;
  • the first determining unit is configured to determine the collection direction corresponding to the user according to the location information of the user;
  • the second determining unit is configured to determine the collection direction of the received target sound signal as the target sound source direction when the target sound signal is received;
  • the first collection unit is used to collect sound in the direction of the target sound source to obtain collected sound signals.
  • the device further includes:
  • the second acquiring unit is used to acquire the location information of the interference source
  • a third determining unit configured to determine the direction of the interference source according to the location information of the interference source
  • the second collection unit is configured to perform directional suppression collection on the direction of the interference source during the process of collecting the sound on the direction of the target sound source.
  • the second acquiring unit is specifically configured to acquire the location information of a fixed interference source marked in advance as the location information of the interference source; and/or, after the collection direction in which the target sound signal is received has been determined as the target sound source direction, to determine the users corresponding to collection directions other than the target sound source direction as interfering users, and to acquire the position information of the interfering users as the position information of the interference source.
  • the device further includes:
  • the first calculation unit is configured to calculate the room impulse response according to the location information of the target user, the size information of the space, and the location information of the microphone array, and the target user is the user corresponding to the target sound source direction;
  • the elimination unit is configured to use the room impulse response as an initial parameter of the de-reverberation algorithm, and perform a de-reverberation operation on the collected sound signal according to the de-reverberation algorithm.
  • the device further includes:
  • the second calculation unit is configured to calculate interference reverberation information according to the location information of the interference source, the size information of the space, and the location information of the microphone array;
  • the second collection unit is specifically configured to perform directional suppression collection on the direction of the interference source according to the interference reverberation information.
  • the device further includes:
  • a receiving unit configured to receive a designated frequency sound signal sent by the visual sensing system
  • the third calculation unit is configured to calculate the first angular difference between the zero-degree orientation of the microphone array and the direction in which the designated frequency sound signal is received.
  • the first determining unit includes:
  • the calculation subunit is used to calculate the second angle difference between the first connecting line and the second connecting line; the first connecting line is the line between the visual sensing system and the microphone array, determined according to the position information of the visual sensor system and the position information of the microphone array, and the second connecting line is the line between the microphone array and the user, determined according to the position information of the microphone array and the position information of the user;
  • the determining subunit is configured to determine a third angle difference between the zero-degree orientation of the microphone array and the second connecting line according to the first angle difference and the second angle difference, and to use the third angle difference as the collection direction corresponding to the user.
  • the device further includes:
  • the control unit is used to control entry into the standby state when a no-user-activity signal detected by the visual sensing system is acquired.
  • a device for sound collection is provided, which includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
  • a computer-readable medium having instructions stored thereon, which when executed by one or more processors, cause the device to execute the sound collection method described in the first aspect.
  • the microphone array first obtains the user's location information, collected in real time, from the visual sensing system, and determines the collection direction corresponding to the user according to that location information. That is, the possible sound source directions are first determined from the user position information collected by the visual sensor system. Directional sound pickup is then carried out in the collection direction corresponding to the user. If the target sound signal is received in a collection direction corresponding to a user, that collection direction is determined as the target sound source direction, and sound collection is then performed in the target sound source direction to obtain the required sound signal.
  • the embodiment of the present application can determine multiple possible collection directions and determine the final target sound source direction with the assistance of the visual sensing system, so as to perform sound collection according to the known sound source direction. It avoids scanning and collecting in all directions in space, and improves the accuracy and efficiency of collecting.
  • the visual sensor system can collect the user's location information in real time, so that the microphone array can obtain the user's real-time location information, and then can determine the user's corresponding collection direction in real time, avoiding the problem of inaccurate directional radio reception due to user movement.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the application
  • FIG. 2 is a flowchart of a sound collection method provided by an embodiment of the application
  • FIG. 3 is a flowchart of a method for suppressing an interference source provided by an embodiment of the application
  • FIG. 4 is an example diagram of determining a user collection direction provided by an embodiment of this application.
  • FIG. 5 is a structural diagram of a sound collection device provided by an embodiment of the application.
  • FIG. 6 is a structural diagram of another sound collection device provided by an embodiment of the application.
  • FIG. 7 is a structural diagram of a server provided by an embodiment of the application.
  • the inventor found that the traditional sound collection method mainly uses the microphone array to perform full blind scanning in the entire space, and then estimates the target sound source according to the sound source localization method.
  • it is difficult to accurately estimate the target sound source, and thus the sound signal of the target sound source cannot be accurately obtained.
  • the embodiment of the present application provides a sound collection method. Specifically, before the microphone array collects sound signals, it first obtains the user location information collected in real time from the visual sensor system, and then determines the collection direction corresponding to the user according to the user's location information. That is, before collecting the sound signal, the microphone array first determines the collection directions of possible sound sources from the users' location information. It then performs directional sound pickup in each possible collection direction. If the target sound signal is picked up in one of these directions, that collection direction is determined as the target sound source direction, and the user corresponding to it is the target user. Finally, sound collection is performed in the target sound source direction to obtain the sound signal of the target user.
  • the microphone array can first pick up sound in the collection directions where the target sound source may exist, and then determine the target sound source direction according to the pickup result, so that the sound signal can be collected in the determined target sound source direction without scanning all directions, which improves the accuracy of collecting the sound signal of the target sound source.
  • FIG. 1 is a schematic diagram of the framework of an exemplary application scenario provided by the embodiments of the present application.
  • the sound collection method provided in the embodiment of the present application can be applied to the microphone array 10.
  • the visual sensor system 20 can be installed in a space, such as a room, and the specific installation location can be determined according to the actual situation to ensure that it can monitor the entire space.
  • the visual sensor system 20 can collect the position information of each user (for example, user 1 and user 2) in the space in real time.
  • the microphone array 10 obtains the position information of each user in the space from the visual sensing system 20 to determine the collection direction of each user. Then, the microphone array 10 performs directional sound pickup in each collection direction to obtain the sound signal of each user. If the target sound signal appears during directional pickup, the collection direction in which it was received is determined as the target sound source direction, and sound is collected from that direction to obtain the sound signal of the target user. For example, the microphone array 10 receives the sound signal of user 1 and the sound signal of user 2 respectively.
  • If the sound signal of user 1 is the target sound signal, the collection direction corresponding to user 1 is the target sound source direction and user 1 is the target user; the microphone array then collects the sound of user 1 in that collection direction to obtain the sound signal of the target user.
  • the vision sensor system in this embodiment may include an infrared camera device, a color camera device, a high-frequency sounding unit, and a transmission unit.
  • the role of the visual sensing system is to locate and track the location of indoor sound-producing equipment and people, and transmit it to the microphone array.
  • the infrared camera device and/or the color camera device can be used to collect the user's location information in real time;
  • the high-frequency sounding unit can be used to emit the designated frequency sound signal;
  • the transmission unit can be used to send the collected user location information to the microphone array.
  • The microphone array can include multiple microphones and acquisition boards, speakers, and signal processing units. The function of the microphone array is to process the array signal according to the position information transmitted by the visual sensor system, perform far-field sound pickup, and realize far-field voice interaction with the user through its own speakers.
  • the microphone array can directly communicate with the visual sensor system through wireless means such as Bluetooth, or can communicate with the visual sensor system through a router or network transmission protocol, which is not limited in this embodiment.
  • the schematic diagram of the framework shown in FIG. 1 is only one example in which the embodiments of the present application can be implemented. The scope of application of the embodiments is not limited by any aspect of the framework.
  • FIG. 2 is a flowchart of a sound collection method provided by an embodiment of the application.
  • the method is applied to a microphone array. As shown in FIG. 2, the method may include:
  • S201 Acquire location information of the user collected in real time by the vision sensor system.
  • the visual sensor system can collect the position information of each user in the space in real time.
  • the microphone array can obtain the position information of each user from the visual sensing system, so that the possible sound source position can be known.
  • the location information of the user may be location information in a space coordinate system, and the location information is the location coordinates of the user in space.
  • the visual sensor system collects the user's position information in real time, so that the microphone array can obtain the latest position information. This ensures that the microphone array can determine the latest collection direction corresponding to the user when S202 is executed.
  • S202 Determine the collection direction corresponding to the user according to the location information of the user.
  • After the microphone array obtains the location information of each user in the space, it can determine the collection direction corresponding to each user according to its own location information and the user's location information. In a specific implementation, since the position coordinates of the microphone array in the space are known, once the user's position coordinates are obtained, the direction of the user relative to the microphone array, that is, the collection direction corresponding to the user, can be calculated from the two sets of coordinates.
  • the visual sensor system first obtains the position information of the users present in the current space, so that the microphone array knows in advance the positions of the users who may be sound sources. The microphone array can then determine the collection directions of the possible sound sources through S202, without scanning all directions in the space to estimate the sound source position.
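  • The calculation in S202 can be sketched as follows. This is a minimal illustration that assumes 2-D room coordinates and measures the angle counter-clockwise from the +x axis of the room frame; the function name `collection_direction` and the sample coordinates are hypothetical and do not come from the application.

```python
import math

def collection_direction(array_xy, user_xy):
    """Bearing of the user as seen from the array, in degrees in [0, 360),
    measured counter-clockwise from the +x axis of the (assumed) room frame."""
    dx = user_xy[0] - array_xy[0]
    dy = user_xy[1] - array_xy[1]
    if dx == 0 and dy == 0:
        raise ValueError("user and microphone array are at the same position")
    return math.degrees(math.atan2(dy, dx)) % 360.0

# Hypothetical positions reported by the visual sensor system (metres).
array_position = (1.0, 1.0)
users = {"user 1": (3.0, 3.0), "user 2": (0.0, 4.0)}

# One candidate collection direction per user.
directions = {name: collection_direction(array_position, xy)
              for name, xy in users.items()}
```

  • Because the positions are refreshed in real time, recomputing `directions` on every update is what lets the collection direction follow a moving user.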
  • S203 Perform directional sound pickup in the collection direction corresponding to the user.
  • when the microphone array has determined the collection direction corresponding to each user, it performs directional sound pickup in each of these directions to obtain the sound signal of each user.
  • the microphone array can not only pick up sound directionally in the collection direction corresponding to the user, but also suppress sound interference from other directions, which improves the accuracy of the subsequent determination of the sound source direction.
  • a beamforming method can be used for directional sound pickup: the spatial spectrum characteristics of the sound signal are obtained through the microphone array, and the sound signal is then spatially filtered to achieve directional pickup.
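  • The application does not name a particular beamformer. As one possible illustration, the sketch below uses delay-and-sum beamforming, a common choice swapped in here for concreteness, under far-field and integer-sample-delay assumptions. The names `steering_delays` and `delay_and_sum`, the speed of sound, and the two-microphone geometry are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def steering_delays(mic_positions, angle_deg, c=SPEED_OF_SOUND):
    """Relative arrival delay (seconds) at each microphone for a far-field
    plane wave arriving from angle_deg (measured from the +x axis)."""
    ux = math.cos(math.radians(angle_deg))
    uy = math.sin(math.radians(angle_deg))
    # A microphone further along the arrival direction hears the wave earlier.
    raw = [-(x * ux + y * uy) / c for x, y in mic_positions]
    earliest = min(raw)
    return [d - earliest for d in raw]

def delay_and_sum(channels, delays, sample_rate):
    """Advance each channel by its rounded delay and average the aligned samples."""
    shifts = [round(d * sample_rate) for d in delays]
    length = min(len(ch) - s for ch, s in zip(channels, shifts))
    return [sum(ch[n + s] for ch, s in zip(channels, shifts)) / len(channels)
            for n in range(length)]

# Two microphones 0.343 m apart; an impulse arrives from 0 degrees, so the
# microphone at the origin hears it one sample later at a 1 kHz sample rate.
mics = [(0.0, 0.0), (0.343, 0.0)]
channels = [[0, 0, 0, 1, 0, 0],   # microphone at the origin (delayed copy)
            [0, 0, 1, 0, 0, 0]]   # microphone nearer the source
delays = steering_delays(mics, 0.0)
output = delay_and_sum(channels, delays, sample_rate=1000)
```

  • Signals arriving from the steered direction add coherently after alignment, while signals from other directions are misaligned and partly cancel, which is the spatial filtering effect described above.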
  • when the microphone array obtains the sound signal in each collection direction, if a target sound signal is present among the received sound signals, the collection direction in which the target sound signal was received is determined as the target sound source direction.
  • the target sound signal may be that a specific wake-up word exists in the sound signal and/or the voiceprint feature of the voice signal meets the preset voiceprint feature.
  • the preset wake-up words can be pre-stored in the microphone array, and when directional sound pickup is performed in the collection direction corresponding to the user, it is determined whether a preset wake-up word appears in the received sound signal. If it does, the sound signal is determined as the target sound signal, the collection direction corresponding to the target sound signal is determined as the target sound source direction, and the user corresponding to the target sound signal is the target user.
  • the voiceprint feature of the target user is pre-stored in the microphone array, and when directional collection is performed from the collection direction corresponding to the user, it is determined whether the voiceprint feature of the received voice signal is the same as the pre-defined voiceprint feature. If they are the same, the sound signal is determined as the target sound signal, and the collection direction corresponding to the target sound signal is determined as the target sound source direction, and the user corresponding to the target sound signal is the target user.
  • S205 Perform sound collection on the direction of the target sound source to obtain the collected sound signal.
  • the microphone array can collect the sound signal in the direction of the target sound source, thereby obtaining the sound signal of the target sound source, and then performing operations such as voice recognition.
  • this implementation also provides a de-reverberation method, which may specifically include:
  • the position information of the target user can be obtained through the visual sensing system, and then the room impulse response is calculated according to the position information of the target user, the size information of the space, and the position information of the microphone array.
  • the target user is a user corresponding to the direction of the target sound source.
  • the image method can be used to estimate the room impulse response.
  • When the room impulse response is obtained, it is used as the initial parameter of the de-reverberation algorithm to improve the performance of the de-reverberation algorithm.
  • the de-reverberation algorithm is then used to de-reverberate the collected sound signal of the target user to obtain a de-reverberated sound signal, thereby avoiding the effect of reverberation on the user's hearing. That is, to address the degradation of recognition performance caused by reverberation, this embodiment uses the position information of the target sound source, combined with the size of the space and the position of the microphone array, to obtain relatively accurate initial parameters for the de-reverberation filter, and thereby a better de-reverberation effect.
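  • A simplified image-source estimate of the room impulse response could look like the sketch below. It assumes a rectangular (shoebox) room, a single frequency-independent wall reflection coefficient `beta`, and nearest-sample rounding of each arrival time; these parameters, the function names, and the room dimensions are illustrative assumptions rather than values from the application.

```python
import math
from itertools import product

def axis_images(src, length, max_order):
    """Image coordinates of the source along one axis, with reflection counts."""
    images = []
    for n in range(-max_order, max_order + 1):
        for coord, bounces in ((2 * n * length + src, abs(2 * n)),
                               (2 * n * length - src, abs(2 * n - 1))):
            if bounces <= max_order:
                images.append((coord, bounces))
    return images

def room_impulse_response(source, mic, room, fs=16000, c=343.0, beta=0.8,
                          max_order=2, length=2048):
    """Image-source estimate of the source-to-microphone impulse response."""
    rir = [0.0] * length
    per_axis = [axis_images(s, size, max_order) for s, size in zip(source, room)]
    for combo in product(*per_axis):
        bounces = sum(b for _, b in combo)
        if bounces > max_order:
            continue
        dist = math.dist([coord for coord, _ in combo], mic)
        tap = round(dist / c * fs)  # arrival time in samples
        if tap < length:
            # Each wall bounce scales by beta; amplitude decays as 1/(4*pi*r).
            rir[tap] += beta ** bounces / (4 * math.pi * max(dist, 1e-3))
    return rir

# Hypothetical 5 m x 4 m x 3 m room, target user 1 m away from the array.
rir = room_impulse_response(source=(1.0, 1.0, 1.5), mic=(2.0, 1.0, 1.5),
                            room=(5.0, 4.0, 3.0))
direct_tap = next(i for i, value in enumerate(rir) if value != 0.0)
```

  • The first non-zero tap is the direct path and the later taps are wall reflections; a de-reverberation filter initialised from such an estimate starts closer to the true room response than one initialised with zeros.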
  • the microphone array in the embodiment of the present application first obtains the user's location information, collected in real time, from the visual sensing system, and determines the collection direction corresponding to the user according to that location information. That is, the directions of the possible sound sources are first determined from the user location information collected by the visual sensor system. Directional sound pickup is then carried out in the collection direction corresponding to the user. If the target sound signal is received in a collection direction corresponding to a user, that collection direction is determined as the target sound source direction, and sound collection is then performed in the target sound source direction to obtain the required sound signal.
  • the embodiment of the present application can determine multiple possible collection directions and determine the final target sound source direction with the assistance of the visual sensing system, so as to perform sound collection according to the known sound source direction. It avoids scanning and collecting in all directions in space, and improves the accuracy and efficiency of collecting.
  • the visual sensor system can collect the user's location information in real time, so that the microphone array can obtain the user's real-time location information, and then can determine the user's corresponding collection direction in real time, avoiding the problem of inaccurate directional radio reception due to user movement.
  • the microphone array can suppress the sound signal in the direction of the interference source when collecting the sound signal in the direction of the target sound source.
  • Fig. 3 is a flowchart of a method for suppressing an interference source provided by an embodiment of the application, and the method may include:
  • S301 Acquire location information of the interference source.
  • S302 Determine the direction of the interference source according to the location information of the interference source.
  • the microphone array first obtains the position information of each interference source in the space, so as to determine the direction of the interference source according to the position information of the interference source, that is, determine the direction of the interference source relative to the microphone array.
  • the interference source can be a fixed sound-producing device in the space, such as a television, a stereo, an air conditioner, etc., or it can be other users in the space except the target user.
  • when the interference source is a fixed sounding device and the microphone array acquires the location information of the interference source, the location information of the fixed interference source marked in advance may be obtained as the location information of the interference source. That is, because the position of a fixed sounding device in the space is usually fixed, its position information can be marked in advance, so that the microphone array can directly obtain the position information of the fixed interference source.
  • When the interference source is a user in the space other than the target user, the microphone array, in acquiring the location information of the interference source, can first determine the collection direction in which the target sound signal was received as the target sound source direction, then determine the users corresponding to the collection directions other than the target sound source direction as interfering users, and use the location information of the interfering users as the location information of the interference source. That is, after the microphone array has acquired the collection direction corresponding to each user in the space, when S203 is executed, the user corresponding to the collection direction in which the target sound signal is received is determined as the target user, the users corresponding to the other collection directions are determined as interfering users, and the location information of the interfering users is the location information of the interference source.
  • the microphone array collects the sound signal in the direction of the target sound source while performing directional suppression collection on the direction of the interference source to reduce the collection of the interference sound signal.
  • the microphone array can adopt a fixed null-steering beamforming method, which has low complexity and strong suppression, to form a beam in the target sound source direction for collecting sound signals while suppressing the interference source by placing a null in its direction.
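  • The idea of forming a beam toward the target while placing a null on the interference source can be illustrated with the narrowband two-microphone sketch below. It assumes far-field sources, a single frequency, and angles measured from the array axis; the function names, spacing, and frequency are hypothetical, and a practical array would use more microphones and process each frequency band separately.

```python
import cmath
import math

def phase_term(angle_deg, spacing, freq, c=343.0):
    """Phase factor of the second microphone relative to the first for a
    far-field narrowband source at angle_deg (measured from the array axis)."""
    delay = spacing * math.cos(math.radians(angle_deg)) / c
    return cmath.exp(-2j * math.pi * freq * delay)

def null_steering_weights(target_deg, interferer_deg, spacing, freq):
    """Two complex weights with unit gain toward the target and a null
    toward the interferer (solution of a 2x2 linear system)."""
    p_t = phase_term(target_deg, spacing, freq)
    p_i = phase_term(interferer_deg, spacing, freq)
    if abs(p_t - p_i) < 1e-9:
        raise ValueError("target and interferer are indistinguishable at this frequency")
    g1 = 1.0 / (p_t - p_i)
    return (-p_i * g1, g1)

def response(weights, angle_deg, spacing, freq):
    """Magnitude of the array output for a unit source at angle_deg."""
    g0, g1 = weights
    return abs(g0 + g1 * phase_term(angle_deg, spacing, freq))

# Hypothetical setup: 5 cm spacing, 1 kHz, target at 45 degrees, interferer at 135.
weights = null_steering_weights(45.0, 135.0, spacing=0.05, freq=1000.0)
target_gain = response(weights, 45.0, spacing=0.05, freq=1000.0)
interferer_gain = response(weights, 135.0, spacing=0.05, freq=1000.0)
```

  • The weights depend only on the two known directions, so they can be computed once from the positions supplied by the visual sensor system instead of being adapted from the audio, which is what keeps the approach low in complexity.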
  • this embodiment provides an implementation manner for calculating the interference source reverberation information.
  • the interference source reverberation information is calculated according to the location information of the interference source, the size information of the space, and the location information of the microphone array; then the direction of the interference source is collected and suppressed, including: the direction of the interference source according to the interference reverberation information Perform directional acquisition suppression. That is, the microphone array can calculate the interference reverberation information generated by the interference source in the space according to the location information of the interference source, the size information of the space, and its own location information. When performing directional collection suppression on the direction of the interference source, directional collection suppression is performed according to the interference reverberation information.
  • the direction of the interference source can be collected and suppressed according to the generalized sidelobe cancellation (Generalized Sidelobe Canceller, GSC) method and the interference reverberation information.
  • the interference reverberation information is used as the reference initial value of the adaptive filter in the method, and the interference suppression capability of the microphone array is enhanced by accelerating the convergence speed.
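  • The benefit of a good initial value can be illustrated with the adaptive-filter stage alone. The sketch below is a plain normalized LMS (NLMS) canceller of the kind commonly used in the adaptive branch of a GSC, not the application's own implementation. The leakage path `true_path`, the prior estimate `prior`, and the step size are all made-up values standing in for quantities that would come from the interference reverberation information.

```python
import random

def nlms(reference, desired, taps, init=None, mu=0.5, eps=1e-8):
    """Normalized LMS: adapt w so the filtered reference tracks `desired`.
    Returns the final weights and the per-sample error (the cleaned output)."""
    w = list(init) if init is not None else [0.0] * taps
    errors = []
    for n in range(len(reference)):
        # Most recent `taps` reference samples, zero-padded before the start.
        window = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        estimate = sum(wk * xk for wk, xk in zip(w, window))
        err = desired[n] - estimate
        norm = sum(xk * xk for xk in window) + eps
        w = [wk + mu * err * xk / norm for wk, xk in zip(w, window)]
        errors.append(err)
    return w, errors

rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(200)]   # interference reference
true_path = [0.5, -0.3, 0.2]                               # assumed leakage path
leakage = [sum(h * (reference[n - k] if n - k >= 0 else 0.0)
               for k, h in enumerate(true_path))
           for n in range(len(reference))]
prior = [0.9 * h for h in true_path]   # imperfect estimate from room geometry

_, cold_err = nlms(reference, leakage, taps=3)               # zero initial weights
_, warm_err = nlms(reference, leakage, taps=3, init=prior)   # geometry-based start
cold_energy = sum(e * e for e in cold_err[:20])
warm_energy = sum(e * e for e in warm_err[:20])
```

  • In this toy setup the residual interference energy over the first 20 samples is much lower with the warm start, because the filter begins close to the true path instead of having to learn it from zero.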
  • the microphone array can obtain the position information of the interference sources to accurately determine the directions of all of them, and then suppress the interference in those directions when collecting the sound signal in the target sound source direction, thereby achieving stable and efficient sound pickup and suppression.
  • this application combines the size information of the space and the position information of the microphone array to obtain more accurate interference reverberation information, and uses it in the interference suppression filter to further suppress the interference and improve the signal-to-noise ratio of the microphone array output.
  • the microphone array can also calibrate its own array orientation according to the calibration sound emitted by the vision sensor system to obtain the orientation of the vision sensor system relative to the microphone array. Specifically, the microphone array receives a sound signal of a designated frequency sent by the visual sensing system, and calculates a first angle difference between the zero-degree orientation of the microphone array and the direction from which the designated frequency sound signal is received. Here, the zero-degree orientation of the microphone array is the zero-degree orientation defined by the microphone array itself; when performing directional sound collection, the collection direction is determined relative to the zero-degree orientation.
  • by measuring the direction from which the designated frequency sound signal arrives, the microphone array can obtain the direction of the visual sensor system that emitted the signal relative to the zero-degree orientation of the microphone array, that is, determine the angle between the line connecting the visual sensor system and the microphone array and the zero-degree orientation, as shown in FIG. 4.
  • the microphone array can determine the first angle difference of the visual sensor system relative to the zero-degree orientation according to the direction of arrival (Direction Of Arrival, DOA) estimation algorithm when receiving a sound signal of a specified frequency.
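  • The application does not say which DOA algorithm is used. One simple possibility, sketched here under far-field and two-microphone assumptions, estimates the time difference of arrival by cross-correlation and converts it to an angle. The function names, the impulse-like test signals, and the choice of the array broadside as the zero-degree orientation are all assumptions; a pure tone would make the correlation peak ambiguous, so a real calibration signal would need some modulation or onset.

```python
import math

def estimate_lag(a, b, max_lag):
    """Integer lag (in samples) by which b trails a, chosen to maximise
    the cross-correlation between the two channels."""
    def score(lag):
        return sum(a[n] * b[n + lag]
                   for n in range(len(a)) if 0 <= n + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=score)

def doa_degrees(lag_samples, sample_rate, spacing, c=343.0):
    """Far-field arrival angle relative to the array broadside, which is
    taken here as the (assumed) zero-degree orientation."""
    sin_theta = c * lag_samples / (sample_rate * spacing)
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_theta))))

# The calibration sound reaches microphone b two samples after microphone a.
mic_a = [0, 0, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 1, 0, 0, 0]
lag = estimate_lag(mic_a, mic_b, max_lag=3)
first_angle = doa_degrees(lag, sample_rate=16000, spacing=0.1)
```

  • The resulting `first_angle` plays the role of the first angle difference: the bearing of the visual sensor system relative to the array's own zero-degree orientation.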
  • Because the microphone array performs directional sound pickup relative to its zero-degree orientation, the collection direction that it determines from the user's position information should be the user's direction relative to the zero-degree orientation of the microphone array, so that the sound signal of the target sound source can be accurately collected.
  • this embodiment adopts an implementation manner for determining the collection direction corresponding to the user, which is specifically as follows:
  • The microphone array can determine the line connecting the visual sensing system and the microphone array, that is, the first connecting line, according to the position information of the visual sensing system and of the microphone array. It then determines the line connecting the microphone array and the user, that is, the second connecting line, according to the position information of the microphone array and of the user, and calculates the angle between the two lines, that is, the second angle difference.
  • Trigonometric functions can be used to calculate the angle between the first connecting line and the second connecting line to obtain the second angle difference.
  • The microphone array, the visual sensing system and the user form a triangle; the length of each side can be calculated from the position information of the three, and the second angle difference can then be obtained with trigonometric functions.
  • According to the first angle difference between the first connecting line and the zero-degree orientation and the second angle difference between the first connecting line and the second connecting line, the microphone array determines the angle of the user relative to the zero-degree orientation, that is, the third angle difference between the zero-degree orientation and the second connecting line, and takes the third angle difference as the collection direction corresponding to the user. The first angle difference and the second angle difference are added to obtain the third angle difference, so that the microphone array knows by what angle to deflect from the zero-degree orientation when picking up sound.
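The angle bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: positions are 2D coordinates in a common room frame, angles are in degrees, and the function names (`triangle_angle`, `collection_direction`) are hypothetical. The text simply adds the first and second angle differences; the sketch uses signed angles so that the sum stays valid whichever side of the first connecting line the user is on.

```python
import math

def triangle_angle(mic_pos, sensor_pos, user_pos):
    """Unsigned angle at the array (degrees) via the law of cosines."""
    a = math.dist(mic_pos, sensor_pos)   # first connecting line
    b = math.dist(mic_pos, user_pos)     # second connecting line
    c = math.dist(sensor_pos, user_pos)  # third side of the triangle
    cos_angle = (a * a + b * b - c * c) / (2.0 * a * b)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def bearing(origin, target):
    """Angle (degrees, counter-clockwise from +x) of the line origin -> target."""
    return math.degrees(math.atan2(target[1] - origin[1], target[0] - origin[0]))

def wrap(angle):
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def collection_direction(mic_pos, sensor_pos, user_pos, first_angle):
    """Third angle difference: user direction relative to the zero-degree orientation.

    first_angle is the signed angle from the zero-degree orientation to the
    first connecting line, e.g. as measured by DOA calibration (assumed input).
    """
    second_angle = wrap(bearing(mic_pos, user_pos) - bearing(mic_pos, sensor_pos))
    return wrap(first_angle + second_angle)
```

For example, with the array at (0, 0), the visual sensing system at (1, 0), the user at (0, 1) and a calibrated first angle difference of 30 degrees, the second angle difference is 90 degrees and the collection direction is 120 degrees.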
  • In order to reduce power consumption and extend service life, the microphone array can also put itself into a standby state according to information sent by the visual sensing system. Specifically, when it obtains a no-user-activity signal detected by the visual sensing system, it enters the standby state.
  • Because the visual sensing system collects users' position information in the space in real time, it can monitor whether there is human activity in the space. If no human activity is detected, it informs the microphone array that there is no user activity in the current space, so that the microphone array stays in the standby state and performs no signal processing or response.
  • When the microphone array learns that the visual sensing system has detected user activity, it enters a state of waiting to be awakened and obtains the user's position information, so as to perform directional sound pickup and subsequent operations in the possible directions.
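The standby behaviour can be pictured as a two-state controller driven by messages from the visual sensing system. The sketch below is a hypothetical illustration: the message format (a dictionary with a `"users"` list of positions) and all names are assumptions, not something the application specifies.

```python
from enum import Enum

class ArrayState(Enum):
    STANDBY = "standby"              # no signal processing or response
    AWAITING_WAKE = "awaiting_wake"  # directional pickup, waiting for the target sound

def handle_sensor_message(message):
    """Return (new_state, user_positions_to_listen_to) for one sensor message.

    message is assumed to look like {"users": [(x, y), ...]}; an empty or
    missing list is treated as the no-user-activity signal.
    """
    users = list(message.get("users", []))
    if not users:
        return ArrayState.STANDBY, []
    return ArrayState.AWAITING_WAKE, users
```

Treating an empty user list as the no-activity signal keeps the controller stateless; a real system might add a timeout before entering standby to avoid toggling when a user is briefly occluded.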
  • a full-angle camera system can be installed on the microphone array to assist in locating and tracking the target sound source, and collect the sound signal of the target sound source in real time.
  • Multiple microphone arrays can be deployed to form a distributed microphone array system that jointly receives the user's position information sent by the visual sensing system, which can further increase the accuracy of determining the target sound source and achieve far-field sound pickup and interference suppression.
  • the present application provides a sound collection device, which will be described below with reference to the accompanying drawings.
  • FIG. 5 is a structural diagram of a sound collection device provided by an embodiment of the application.
  • the device is applied to a microphone array.
  • the device may include:
  • the first obtaining unit 501 is configured to obtain the user's location information collected by the vision sensor system in real time;
  • the first determining unit 502 is configured to determine the collection direction corresponding to the user according to the location information of the user;
  • the sound pickup unit 503 is configured to perform directional sound pickup in the collection direction corresponding to the user;
  • the second determining unit 504 is configured to determine the collection direction of the received target sound signal as the target sound source direction when the target sound signal is received;
  • the first collection unit 505 is configured to collect sound in the direction of the target sound source to obtain collected sound signals.
  • the device further includes:
  • the second acquiring unit is used to acquire the location information of the interference source
  • a third determining unit configured to determine the direction of the interference source according to the location information of the interference source
  • the second collection unit is configured to perform directional suppression collection on the direction of the interference source during the process of collecting the sound on the direction of the target sound source.
  • the second acquiring unit is specifically configured to acquire the position information of a fixed interference source marked in advance as the position information of the interference source; and/or, after the collection direction in which the target sound signal is received is determined as the target sound source direction, to determine the users corresponding to the collection directions other than the target sound source direction as interfering users, and to acquire the position information of the interfering users as the position information of the interference source.
  • the device further includes:
  • the first calculation unit is configured to calculate the room impulse response according to the location information of the target user, the size information of the space, and the location information of the microphone array, the target user being the user corresponding to the target sound source direction;
  • the elimination unit is configured to use the room impulse response as an initial parameter of the de-reverberation algorithm, and perform a de-reverberation operation on the collected sound signal according to the de-reverberation algorithm.
  • the device further includes:
  • the second calculation unit is configured to calculate interference reverberation information according to the location information of the interference source, the size information of the space, and the location information of the microphone array;
  • the second collection unit is specifically configured to perform directional suppression collection on the direction of the interference source according to the interference reverberation information.
  • the device further includes:
  • a receiving unit configured to receive a designated frequency sound signal sent by the visual sensing system
  • the third calculation unit is configured to calculate the first angular difference between the zero-degree orientation of the microphone array and the direction in which the designated frequency sound signal is received.
  • the first determining unit includes:
  • the calculation subunit is configured to calculate the second angle difference between the first connecting line and the second connecting line; the first connecting line is the line between the visual sensing system and the microphone array, determined according to the position information of the visual sensing system and the position information of the microphone array, and the second connecting line is the line between the microphone array and the user, determined according to the position information of the microphone array and the position information of the user;
  • the determining subunit is configured to determine a third angle difference between the zero-degree orientation of the microphone array and the second connecting line according to the first angle difference and the second angle difference, and to take the third angle difference as the collection direction corresponding to the user.
  • the device further includes:
  • the control unit is configured to enter the standby state when the no-user-activity signal detected by the visual sensing system is acquired.
  • The microphone array first obtains the user's position information collected in real time from the visual sensing system, and determines the user's corresponding collection direction according to that position information. That is, the possible sound source directions are first determined according to the user position information collected by the visual sensing system. Directional sound pickup is then carried out in the collection direction corresponding to the user. If the target sound signal is received in the collection direction corresponding to the user, the collection direction in which the target sound signal is received is determined as the target sound source direction, and sound is then collected in the target sound source direction so as to obtain the required sound signal.
  • That is, with the assistance of the visual sensing system, the embodiment of the present application can determine multiple possible collection directions and determine the final target sound source direction, so as to perform sound collection according to the known sound source direction. This avoids scanning and collecting in all directions in the space and improves the accuracy and efficiency of collection.
  • The visual sensing system can collect the user's position information in real time, so that the microphone array can obtain the user's real-time position information and determine the user's corresponding collection direction in real time, avoiding inaccurate directional sound pickup caused by user movement.
  • Fig. 6 shows a block diagram of a device 600 for realizing sound collection.
  • the apparatus 600 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
  • the device 600 may include one or more of the following components: a processing component 602, a memory 604, a power supply component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, And communication component 616.
  • the processing component 602 generally controls the overall operations of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 602 may include one or more processors 620 to execute instructions to complete all or part of the steps of the foregoing method.
  • the processing component 602 may include one or more modules to facilitate the interaction between the processing component 602 and other components.
  • the processing component 602 may include a multimedia module to facilitate the interaction between the multimedia component 608 and the processing component 602.
  • the memory 604 is configured to store various types of data to support the operation of the device 600. Examples of these data include instructions for any application or method operating on the device 600, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 604 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • the power supply component 606 provides power to various components of the device 600.
  • the power supply component 606 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the device 600.
  • the multimedia component 608 includes a screen that provides an output interface between the device 600 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
  • the multimedia component 608 includes a front camera and/or a rear camera. When the device 600 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 610 is configured to output and/or input audio signals.
  • the audio component 610 includes a microphone (MIC); when the device 600 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signal can be further stored in the memory 604 or sent via the communication component 616.
  • the audio component 610 further includes a speaker for outputting audio signals.
  • the I/O interface 612 provides an interface between the processing component 602 and a peripheral interface module.
  • the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
  • the sensor component 614 includes one or more sensors for providing the device 600 with various aspects of status assessment.
  • the sensor component 614 can detect the on/off status of the device 600 and the relative positioning of components, for example the display and the keypad of the device 600. The sensor component 614 can also detect a position change of the device 600 or of a component of the device 600, the presence or absence of contact between the user and the device 600, the orientation or acceleration/deceleration of the device 600, and a temperature change of the device 600.
  • the sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects when there is no physical contact.
  • the sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • the communication component 616 is configured to facilitate wired or wireless communication between the apparatus 600 and other devices.
  • the device 600 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof.
  • the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel.
  • the communication component 616 further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the apparatus 600 may be implemented by one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, to perform the following method:
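  • acquiring the user's position information collected in real time by the visual sensing system;
  • determining the collection direction corresponding to the user according to the user's position information;
  • performing directional sound pickup in the collection direction corresponding to the user;
  • when a target sound signal is received, determining the collection direction in which the target sound signal is received as the target sound source direction;
  • collecting sound in the target sound source direction to obtain a collected sound signal.
  • the method further includes:
  • acquiring position information of an interference source; determining the direction of the interference source according to the position information of the interference source; and, while collecting sound in the target sound source direction, performing directional suppression collection on the direction of the interference source.
  • the acquiring position information of the interference source includes:
  • acquiring the position information of a fixed interference source marked in advance as the position information of the interference source; and/or, after the collection direction in which the target sound signal is received is determined as the target sound source direction, determining the users corresponding to the collection directions other than the target sound source direction as interfering users, and acquiring the position information of the interfering users as the position information of the interference source.
  • the method further includes:
  • calculating a room impulse response according to the position information of the target user, the size information of the space, and the position information of the microphone array, the target user being the user corresponding to the target sound source direction; and using the room impulse response as an initial parameter of a de-reverberation algorithm and performing a de-reverberation operation on the collected sound signal according to that algorithm.
  • the method further includes:
  • calculating interference reverberation information according to the position information of the interference source, the size information of the space, and the position information of the microphone array.
  • the directional suppression collection of the direction of the interference source includes:
  • performing directional suppression collection on the direction of the interference source according to the interference reverberation information.
  • the method further includes:
  • receiving a designated-frequency sound signal sent by the visual sensing system; and calculating a first angle difference between the zero-degree orientation of the microphone array and the direction in which the designated-frequency sound signal is received.
  • the determining the collection direction corresponding to the user according to the position information of the user includes:
  • calculating a second angle difference between a first connecting line and a second connecting line, where the first connecting line is the line between the visual sensing system and the microphone array, determined according to the position information of the visual sensing system and the position information of the microphone array, and the second connecting line is the line between the microphone array and the user, determined according to the position information of the microphone array and the position information of the user;
  • determining a third angle difference between the zero-degree orientation of the microphone array and the second connecting line according to the first angle difference and the second angle difference, and taking the third angle difference as the collection direction corresponding to the user.
  • the method further includes:
  • when the no-user-activity signal detected by the visual sensing system is acquired, entering the standby state.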
  • A non-transitory computer-readable storage medium including instructions is also provided, such as the memory 604 including instructions, which may be executed by the processor 620 of the device 600 to complete the foregoing method.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
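  • A non-transitory computer-readable storage medium: when instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal can execute a sound collection method, the method comprising:
  • acquiring the user's position information collected in real time by the visual sensing system;
  • determining the collection direction corresponding to the user according to the user's position information;
  • performing directional sound pickup in the collection direction corresponding to the user;
  • when a target sound signal is received, determining the collection direction in which the target sound signal is received as the target sound source direction;
  • collecting sound in the target sound source direction to obtain a collected sound signal.
  • the method further includes:
  • acquiring position information of an interference source; determining the direction of the interference source according to the position information of the interference source; and, while collecting sound in the target sound source direction, performing directional suppression collection on the direction of the interference source.
  • the acquiring position information of the interference source includes:
  • acquiring the position information of a fixed interference source marked in advance as the position information of the interference source; and/or, after the collection direction in which the target sound signal is received is determined as the target sound source direction, determining the users corresponding to the collection directions other than the target sound source direction as interfering users, and acquiring the position information of the interfering users as the position information of the interference source.
  • the method further includes:
  • calculating a room impulse response according to the position information of the target user, the size information of the space, and the position information of the microphone array, the target user being the user corresponding to the target sound source direction; and using the room impulse response as an initial parameter of a de-reverberation algorithm and performing a de-reverberation operation on the collected sound signal according to that algorithm.
  • the method further includes:
  • calculating interference reverberation information according to the position information of the interference source, the size information of the space, and the position information of the microphone array.
  • the directional suppression collection of the direction of the interference source includes:
  • performing directional suppression collection on the direction of the interference source according to the interference reverberation information.
  • the method further includes:
  • receiving a designated-frequency sound signal sent by the visual sensing system; and calculating a first angle difference between the zero-degree orientation of the microphone array and the direction in which the designated-frequency sound signal is received.
  • the determining the collection direction corresponding to the user according to the position information of the user includes:
  • calculating a second angle difference between a first connecting line and a second connecting line, where the first connecting line is the line between the visual sensing system and the microphone array, determined according to the position information of the visual sensing system and the position information of the microphone array, and the second connecting line is the line between the microphone array and the user, determined according to the position information of the microphone array and the position information of the user;
  • determining a third angle difference between the zero-degree orientation of the microphone array and the second connecting line according to the first angle difference and the second angle difference, and taking the third angle difference as the collection direction corresponding to the user.
  • the method further includes:
  • when the no-user-activity signal detected by the visual sensing system is acquired, entering the standby state.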
  • The microphone array first obtains the user's position information collected in real time from the visual sensing system, and determines the user's corresponding collection direction according to that position information. That is, the possible sound source directions are first determined according to the user position information collected by the visual sensing system. Directional sound pickup is then carried out in the collection direction corresponding to the user. If the target sound signal is received in the collection direction corresponding to the user, the collection direction in which the target sound signal is received is determined as the target sound source direction, and sound is then collected in the target sound source direction so as to obtain the required sound signal.
  • That is, with the assistance of the visual sensing system, the embodiment of the present application can determine multiple possible collection directions and determine the final target sound source direction, so as to perform sound collection according to the known sound source direction. This avoids scanning and collecting in all directions in the space and improves the accuracy and efficiency of collection.
  • The visual sensing system can collect the user's position information in real time, so that the microphone array can obtain the user's real-time position information and determine the user's corresponding collection direction in real time, avoiding inaccurate directional sound pickup caused by user movement.
  • Fig. 7 is a schematic structural diagram of a server in an embodiment of the present invention.
  • The server 700 may vary considerably with configuration or performance, and may include one or more central processing units (CPU) 722 (for example, one or more processors), a memory 732, and one or more storage media 730 (for example, one or more mass storage devices) storing the application program 742 or the data 744.
  • the memory 732 and the storage medium 730 may be transient storage or persistent storage.
  • the program stored in the storage medium 730 may include one or more modules (not shown in the figure), and each module may include a series of command operations on the server.
  • the central processing unit 722 may be configured to communicate with the storage medium 730, and execute a series of instruction operations in the storage medium 730 on the server 700.
  • the server 700 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input and output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ and so on.
  • "At least one (item)" refers to one or more, and "multiple" refers to two or more.
  • "And/or" is used to describe the association relationship of associated objects, indicating that there can be three types of relationships; for example, "A and/or B" can mean: only A, only B, or both A and B, where A and B can be singular or plural.
  • the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
  • "At least one of the following items" or similar expressions refers to any combination of these items, including a single item or any combination of plural items.
  • For example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.
  • the steps of the method or algorithm described in combination with the embodiments disclosed in this document can be directly implemented by hardware, a software module executed by a processor, or a combination of the two.
  • the software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

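The embodiments of this application disclose a sound collection method and device. Specifically, a microphone array first obtains, from a visual sensing system, user position information collected in real time, and determines the collection direction corresponding to the user according to that position information. It then performs directional sound pickup in the collection direction corresponding to the user; if a target sound signal is received in that collection direction, the collection direction in which the target sound signal is received is determined as the target sound source direction, and sound is then collected in the target sound source direction to obtain the required sound signal. That is, with the assistance of the visual sensing system, the embodiments of this application can determine multiple possible collection directions and then the final target sound source direction, so that sound is collected according to a known sound source direction, which avoids omnidirectional scanning of the space and improves the accuracy and efficiency of collection.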

Description

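A sound collection method and device
This application claims priority to Chinese patent application No. 2019108090704, entitled "A sound collection method and device", filed with the China National Intellectual Property Administration on August 29, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of data processing, and in particular to a sound collection method and device.
Background
A microphone array generally consists of a certain number of acoustic sensors and is used to sample and process the spatial characteristics of a sound field. Microphone arrays are of great significance in the field of human-computer interaction: they can greatly extend the interaction distance, so that users can interact naturally by voice without holding or standing close to the sound pickup device, and they are already widely used in scenarios such as smart homes.
A traditional microphone array needs to scan the entire space during operation in order to collect sound signals. In practical application scenarios, however, the environment in which the microphone array is used is complex, and the array may be unable to accurately collect the sound emitted by the target sound source, so that it fails to achieve the expected effect.
Summary of the Invention
In view of this, the embodiments of this application provide a sound collection method and device, to solve the technical problem in the prior art that a microphone array may be unable to accurately collect the sound of a target sound source.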
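To solve the above problem, the technical solutions provided by the embodiments of this application are as follows:
In a first aspect of the embodiments of this application, a sound collection method is provided. The method is applied to a microphone array and includes:
acquiring the user's position information collected in real time by a visual sensing system;
determining the collection direction corresponding to the user according to the user's position information;
performing directional sound pickup in the collection direction corresponding to the user;
when a target sound signal is received, determining the collection direction in which the target sound signal is received as the target sound source direction;
collecting sound in the target sound source direction to obtain a collected sound signal.
In a possible implementation, the method further includes:
acquiring position information of an interference source;
determining the direction of the interference source according to the position information of the interference source;
while collecting sound in the target sound source direction, performing directional suppression collection on the direction of the interference source.
In a possible implementation, the acquiring position information of an interference source includes:
acquiring the position information of a fixed interference source marked in advance as the position information of the interference source;
and/or, after the collection direction in which the target sound signal is received is determined as the target sound source direction, determining the users corresponding to the collection directions other than the target sound source direction as interfering users, and acquiring the position information of the interfering users as the position information of the interference source.
In a possible implementation, the method further includes:
calculating a room impulse response according to the position information of the target user, the size information of the space, and the position information of the microphone array, the target user being the user corresponding to the target sound source direction;
using the room impulse response as an initial parameter of a de-reverberation algorithm, and performing a de-reverberation operation on the collected sound signal according to the de-reverberation algorithm.
In a possible implementation, the method further includes:
calculating interference reverberation information according to the position information of the interference source, the size information of the space, and the position information of the microphone array;
the performing directional suppression collection on the direction of the interference source includes:
performing directional suppression collection on the direction of the interference source according to the interference reverberation information.
In a possible implementation, the method further includes:
receiving a designated-frequency sound signal sent by the visual sensing system;
calculating a first angle difference between the zero-degree orientation of the microphone array and the direction in which the designated-frequency sound signal is received.
In a possible implementation, the determining the collection direction corresponding to the user according to the user's position information includes:
calculating a second angle difference between a first connecting line and a second connecting line, where the first connecting line is the line between the visual sensing system and the microphone array, determined according to the position information of the visual sensing system and the position information of the microphone array, and the second connecting line is the line between the microphone array and the user, determined according to the position information of the microphone array and the position information of the user;
determining a third angle difference between the zero-degree orientation of the microphone array and the second connecting line according to the first angle difference and the second angle difference, and taking the third angle difference as the collection direction corresponding to the user.
In a possible implementation, the method further includes:
when a no-user-activity signal detected by the visual sensing system is acquired, entering a standby state.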
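In a second aspect of the embodiments of this application, a sound collection device is provided. The device is applied to a microphone array and includes:
a first acquiring unit, configured to acquire the user's position information collected in real time by a visual sensing system;
a first determining unit, configured to determine the collection direction corresponding to the user according to the user's position information;
a sound pickup unit, configured to perform directional sound pickup in the collection direction corresponding to the user;
a second determining unit, configured to determine, when a target sound signal is received, the collection direction in which the target sound signal is received as the target sound source direction;
a first collection unit, configured to collect sound in the target sound source direction to obtain a collected sound signal.
In a possible implementation, the device further includes:
a second acquiring unit, configured to acquire position information of an interference source;
a third determining unit, configured to determine the direction of the interference source according to the position information of the interference source;
a second collection unit, configured to perform directional suppression collection on the direction of the interference source while sound is collected in the target sound source direction.
In a possible implementation, the second acquiring unit is specifically configured to acquire the position information of a fixed interference source marked in advance as the position information of the interference source; and/or, after the collection direction in which the target sound signal is received is determined as the target sound source direction, to determine the users corresponding to the collection directions other than the target sound source direction as interfering users, and to acquire the position information of the interfering users as the position information of the interference source.
In a possible implementation, the device further includes:
a first calculation unit, configured to calculate a room impulse response according to the position information of the target user, the size information of the space, and the position information of the microphone array, the target user being the user corresponding to the target sound source direction;
an elimination unit, configured to use the room impulse response as an initial parameter of a de-reverberation algorithm and to perform a de-reverberation operation on the collected sound signal according to the de-reverberation algorithm.
In a possible implementation, the device further includes:
a second calculation unit, configured to calculate interference reverberation information according to the position information of the interference source, the size information of the space, and the position information of the microphone array;
the second collection unit is specifically configured to perform directional suppression collection on the direction of the interference source according to the interference reverberation information.
In a possible implementation, the device further includes:
a receiving unit, configured to receive a designated-frequency sound signal sent by the visual sensing system;
a third calculation unit, configured to calculate a first angle difference between the zero-degree orientation of the microphone array and the direction in which the designated-frequency sound signal is received.
In a possible implementation, the first determining unit includes:
a calculation subunit, configured to calculate a second angle difference between a first connecting line and a second connecting line, where the first connecting line is the line between the visual sensing system and the microphone array, determined according to the position information of the visual sensing system and the position information of the microphone array, and the second connecting line is the line between the microphone array and the user, determined according to the position information of the microphone array and the position information of the user;
a determining subunit, configured to determine a third angle difference between the zero-degree orientation of the microphone array and the second connecting line according to the first angle difference and the second angle difference, and to take the third angle difference as the collection direction corresponding to the user.
In a possible implementation, the device further includes:
a control unit, configured to enter a standby state when a no-user-activity signal detected by the visual sensing system is acquired.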
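In a third aspect of the embodiments of this application, a device for sound collection is provided, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs containing instructions for performing the following operations:
acquiring the user's position information collected in real time by a visual sensing system;
determining the collection direction corresponding to the user according to the user's position information;
performing directional sound pickup in the collection direction corresponding to the user;
when a target sound signal is received, determining the collection direction in which the target sound signal is received as the target sound source direction;
collecting sound in the target sound source direction to obtain a collected sound signal.
In a fourth aspect of the embodiments of this application, a computer-readable medium is provided, storing instructions that, when executed by one or more processors, cause a device to perform the sound collection method of the first aspect.
It can be seen that the embodiments of this application have the following beneficial effects:
In the embodiments of this application, the microphone array first obtains, from the visual sensing system, the user's position information collected in real time, and determines the collection direction corresponding to the user according to that position information. That is, the possible sound source directions are first determined from the user position information collected by the visual sensing system. Directional sound pickup is then performed in the collection direction corresponding to the user; if a target sound signal is received in that collection direction, the collection direction in which the target sound signal is received is determined as the target sound source direction, and sound is then collected in the target sound source direction to obtain the required sound signal. That is, with the assistance of the visual sensing system, the embodiments of this application can determine multiple possible collection directions and then the final target sound source direction, so that sound is collected according to a known sound source direction. This avoids omnidirectional scanning of the space and improves the accuracy and efficiency of collection. In addition, the visual sensing system can collect the user's position information in real time, so that the microphone array can obtain the user's real-time position information and determine the corresponding collection direction in real time, avoiding inaccurate directional sound pickup caused by user movement.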
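Brief Description of the Drawings
Figure 1 is a schematic diagram of an application scenario provided by an embodiment of this application;
Figure 2 is a flowchart of a sound collection method provided by an embodiment of this application;
Figure 3 is a flowchart of a method for suppressing interference sources provided by an embodiment of this application;
Figure 4 is an example diagram of determining a user's collection direction provided by an embodiment of this application;
Figure 5 is a structural diagram of a sound collection device provided by an embodiment of this application;
Figure 6 is a structural diagram of another sound collection device provided by an embodiment of this application;
Figure 7 is a structural diagram of a server provided by an embodiment of this application.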
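Detailed Description
To make the above objectives, features and advantages of this application clearer and easier to understand, the embodiments of this application are described in further detail below with reference to the drawings and specific implementations.
In studying traditional methods of collecting sound with a microphone array, the inventor found that they mainly rely on the microphone array performing a fully blind scan of the entire space and then estimating the target sound source with a sound source localization method. In practical environments, however, the complexity of the environment makes it difficult to estimate the target sound source accurately, so the sound signal of the target sound source cannot be accurately obtained.
Based on this, the embodiments of this application provide a sound collection method. Specifically, before collecting sound signals, the microphone array first obtains, from the visual sensing system, the user's position information collected in real time, and then determines the collection direction corresponding to the user according to the user's position. In other words, before collecting sound signals, the microphone array first determines the collection directions of possible sound sources from the users' position information. It then performs directional sound pickup in the possible collection directions; if a target sound signal is collected in one of them, that collection direction is determined as the target sound source direction, and the user corresponding to it is the target user. Finally, sound is collected in the target sound source direction to obtain the target user's sound signal. That is, with the assistance of the visual sensing system, the microphone array can first pick up sound in the collection directions where a target sound source may exist, determine the target sound source direction from the pickup result, and then collect the sound signal in that direction without omnidirectional scanning, which improves the accuracy of collecting the target sound source's sound signal.
To facilitate understanding of the sound collection method provided by the embodiments of this application, refer to Figure 1, which is a schematic framework diagram of an exemplary application scenario. The sound collection method provided by the embodiments of this application can be applied to the microphone array 10. In practice, the visual sensing system 20 can be installed in a space, such as a room; the specific installation position can be chosen according to the actual situation, provided that the system can monitor the entire space.
In a specific implementation, the visual sensing system 20 can collect the position information of every user in the space (for example, user 1 and user 2) in real time. The microphone array 10 obtains the position information of every user in the space from the visual sensing system 20 to determine the collection direction corresponding to each user. The microphone array 10 then performs directional sound pickup in each collection direction to obtain each user's sound signal. If a target sound signal appears during directional pickup, the collection direction in which the target sound signal is received is determined as the target sound source direction, and sound is collected from that direction to obtain the target user's sound signal. For example, the microphone array 10 receives the sound signal of user 1 and the sound signal of user 2 separately; when the sound signal of user 1 is the target sound signal, the collection direction corresponding to user 1 is determined as the target sound source direction and user 1 is the target user, and the microphone array then collects sound in the collection direction of user 1 to obtain the target user's sound signal.
Based on the above, in practical applications the visual sensing system in this embodiment may include an infrared camera device, a color camera device, a high-frequency sound-emitting unit and a transmission unit. The role of the visual sensing system is to locate and track the positions of sound-emitting devices, people and the like in the room and to transmit those positions to the microphone array. Specifically, the infrared camera device and/or the color camera device can be used to collect the users' position information in real time, the high-frequency sound-emitting unit can be used to emit the designated-frequency sound signal, and the transmission unit can be used to send the collected user position information to the microphone array. The microphone array may include multiple microphones with an acquisition board, a loudspeaker and a signal processing unit. The role of the microphone array is to perform array signal processing according to the position information transmitted by the visual assistance device, to pick up sound in the far field, and to carry out far-field voice interaction with the user through its own loudspeaker.
In practical applications, the microphone array can communicate with the visual sensing system directly by wireless means such as Bluetooth, or through relayed communication by means of a router or a network transmission protocol; this embodiment imposes no limitation here.
Those skilled in the art will understand that the framework diagram shown in Figure 1 is only one example in which the implementations of this application can be realized. The scope of application of the implementations of this application is not limited by any aspect of that framework.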
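To facilitate understanding of the specific implementation of the technical solution of this application, the sound collection method provided by this application is described below with reference to the drawings.
Refer to Figure 2, which is a flowchart of a sound collection method provided by an embodiment of this application. The method is applied to a microphone array and, as shown in Figure 2, may include:
S201: Acquire the user's position information collected in real time by the visual sensing system.
In this embodiment, the visual sensing system can collect the position information of every user in the space in real time. The microphone array can obtain each user's position information from the visual sensing system and thereby learn the possible sound source positions. The user's position information can be position information in a spatial coordinate system, that is, the user's position coordinates in the space.
It can be understood that a user in the space may move. To ensure that the microphone array obtains the user's latest position information, the visual sensing system collects the user's position information in real time, so that when the microphone array performs S202 it can determine the latest collection direction corresponding to the user.
S202: Determine the collection direction corresponding to the user according to the user's position information.
After obtaining the position information of every user in the space, the microphone array can determine the collection direction corresponding to each user according to its own position information and the user's position information. In a specific implementation, because the position coordinates of the microphone array in the space are known, once the user's position coordinates are obtained, the direction of the user relative to the microphone array, that is, the collection direction corresponding to the user, can be calculated from the two sets of coordinates.
That is, in this embodiment the visual sensing system first obtains the position information of the users currently present in the space, so that the microphone array knows in advance the positions of the users who may be sound sources; through S202 the microphone array can then determine the collection directions corresponding to the possible sound sources, without scanning the space in all directions to estimate the sound source position.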
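As a minimal sketch of S202, the direction of each user can be derived from the two sets of coordinates with `atan2`. The assumptions here are not from the application: positions are 3D coordinates in one shared room frame, the array's zero-degree orientation coincides with the +x axis of that frame, and the names `direction_from_coordinates`, `users` and `array_pos` are purely illustrative.

```python
import math

def direction_from_coordinates(array_pos, user_pos):
    """Return (azimuth, elevation) in degrees of the user as seen from the array."""
    dx, dy, dz = (u - a for u, a in zip(user_pos, array_pos))
    azimuth = math.degrees(math.atan2(dy, dx))                    # in the horizontal plane
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))  # above the horizontal plane
    return azimuth, elevation

# Hypothetical positions reported by the visual sensing system (metres).
array_pos = (1.0, 1.0, 1.0)
users = {"user1": (3.0, 1.0, 1.6), "user2": (1.0, 4.0, 1.2)}
directions = {name: direction_from_coordinates(array_pos, pos)
              for name, pos in users.items()}
```

Recomputing `directions` each time the visual sensing system reports new positions gives the real-time update of collection directions that the text describes.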
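S203: Perform directional sound pickup in the collection direction corresponding to the user.
In this embodiment, once the microphone array has determined the collection direction corresponding to each user, it performs directional sound pickup in each of those directions to obtain each user's sound signal. In practice, while picking up sound in a user's collection direction, the microphone array can also suppress sound interference from other directions, to improve the accuracy of the subsequent determination of the sound source direction.
In a specific implementation, a beamforming method can be used for directional sound pickup: the spatial spectrum characteristics of the sound signal are obtained through the microphone array, and the sound signal is then spatially filtered to achieve directional pickup.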
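The application does not name a particular beamformer. As an illustration only, the sketch below uses the simplest spatial filter, a delay-and-sum beamformer with integer-sample delays, for a linear array on the x axis and a far-field source. The angle convention (measured from the array axis), the constant `SPEED_OF_SOUND` and the function name are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def delay_and_sum(channels, mic_xs, angle_deg, fs):
    """Steer a linear array (mic x-coordinates in metres) toward angle_deg.

    channels: one list of samples per microphone, all the same length.
    Returns the averaged, time-aligned output signal.
    """
    cos_t = math.cos(math.radians(angle_deg))
    # Far-field arrival time of the wavefront at each microphone.
    arrival = [-x * cos_t / SPEED_OF_SOUND for x in mic_xs]
    earliest = min(arrival)
    # How many samples each channel lags behind the earliest one.
    lags = [round(fs * (t - earliest)) for t in arrival]
    usable = min(len(ch) - lag for ch, lag in zip(channels, lags))
    return [sum(ch[n + lag] for ch, lag in zip(channels, lags)) / len(channels)
            for n in range(usable)]
```

Signals arriving from the steered direction add coherently, while signals from other directions are misaligned and partly cancel, which is the directional pickup effect the text refers to. Practical systems usually apply fractional delays or frequency-domain weights instead of whole-sample shifts.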
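S204: When a target sound signal is received, determine the collection direction in which the target sound signal is received as the target sound source direction.
In this embodiment, once the microphone array has obtained the sound signal in each collection direction, if a target sound signal is present among the received sound signals, the collection direction in which that target sound signal is received is determined as the target sound source direction. A target sound signal can be a sound signal that contains a specific wake-up word and/or whose voiceprint features match preset voiceprint features.
In a specific implementation, a preset wake-up word can be stored in the microphone array in advance; during directional sound pickup in a user's collection direction, it is determined whether the preset wake-up word appears in the received sound signal. If it does, the sound signal is determined to be the target sound signal, the collection direction corresponding to it is determined as the target sound source direction, and the user corresponding to it is the target user.
And/or, the voiceprint features of the target user are stored in the microphone array in advance; during directional sound pickup in a user's collection direction, it is determined whether the voiceprint features of the received sound signal are the same as the pre-stored voiceprint features. If they are, the sound signal is determined to be the target sound signal, the collection direction corresponding to it is determined as the target sound source direction, and the user corresponding to it is the target user.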
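The selection logic of S204 might look roughly like the following sketch. Everything specific here is an assumption for illustration: each directional pickup result is represented as a dictionary with a recognised `transcript` and a `voiceprint` feature vector (both would come from separate speech and speaker models), and "the same voiceprint" is approximated by a cosine-similarity threshold.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_target_direction(observations, wake_word=None,
                          enrolled_voiceprint=None, threshold=0.8):
    """Return the collection direction of the first target sound signal, or None.

    observations: list of dicts with keys "direction", "transcript", "voiceprint".
    A signal counts as the target if it contains the wake-up word and/or its
    voiceprint is close enough to the enrolled one (threshold is an assumption).
    """
    for obs in observations:
        has_wake_word = wake_word is not None and wake_word in obs["transcript"]
        same_voice = (enrolled_voiceprint is not None and
                      cosine_similarity(obs["voiceprint"], enrolled_voiceprint) >= threshold)
        if has_wake_word or same_voice:
            return obs["direction"]
    return None
```

The users in the remaining directions are then natural candidates for the interfering users discussed later in the description.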
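S205: Collect sound in the target sound source direction to obtain the collected sound signal.
Once the target sound source direction has been determined, the microphone array may collect the sound signal in that direction to obtain the sound signal of the target sound source, on which operations such as sound recognition can then be performed.
It can be understood that, in a real environment, a sound signal propagating in the space is reflected by obstacles and produces reverberation, which degrades the listening effect. To remove this reverberation, this embodiment also provides a dereverberation method, which may include:
1) Calculate the room impulse response from the target user's position information, the size information of the space, and the position information of the microphone array.
In this embodiment, the target user's position information is obtained through the visual sensing system, and the room impulse response is then calculated from the target user's position information, the size information of the space, and the position information of the microphone array. The target user is the user corresponding to the target sound source direction. In a specific implementation, the room impulse response may be estimated using the IMAGE method.
2) Use the room impulse response as the initial parameter of a dereverberation algorithm, and perform dereverberation on the collected sound signal according to that algorithm.
Once the room impulse response has been obtained, it is used as the initial parameter of the dereverberation algorithm to improve the algorithm's performance. The algorithm is then applied to the collected sound signal of the target user to obtain a dereverberated sound signal, avoiding the effect of reverberation on what the user hears. That is, to address the drop in recognition performance caused by reverberation, this embodiment combines the position of the target sound source with the size of the space and the position of the microphone array to obtain reasonably accurate initial parameters for the dereverberation filter, and therefore a better dereverberation result.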
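To make step 1) concrete, here is a heavily simplified sketch of the image method. It assumes a rectangular room with one corner at the origin, a single frequency-independent wall reflection coefficient, and only the direct path plus the six first-order reflections; a practical estimate would include higher-order images. The names and numeric values are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def first_order_images(source, room):
    """Mirror the source across each of the six walls of a box room with a corner at the origin."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            images.append(tuple(image))
    return images

def sparse_rir(source, mic, room, reflection=0.7):
    """Return (delay_seconds, amplitude) taps: direct path plus first-order reflections."""
    paths = [(source, 1.0)] + [(p, reflection) for p in first_order_images(source, room)]
    taps = []
    for pos, gain in paths:
        dist = math.dist(pos, mic)  # assumes the source is not located exactly at the microphone
        taps.append((dist / SPEED_OF_SOUND, gain / (4.0 * math.pi * dist)))
    return sorted(taps)

room = (5.0, 4.0, 3.0)      # room size in metres
source = (1.0, 1.0, 1.0)    # target user position reported by the visual sensing system
mic = (4.0, 3.0, 1.5)       # microphone array position
taps = sparse_rir(source, mic, room)
```

The resulting taps could seed the initial parameters mentioned in step 2); how they map onto a particular dereverberation algorithm is not specified in the application.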
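As described above, in the embodiments of this application the microphone array first obtains, from the visual sensing system, the users' position information collected in real time, and determines the collection direction corresponding to each user from that position information. That is, the directions of the possible sound sources are first determined from the user position information collected by the visual sensing system. Directional pickup is then performed in the users' collection directions; if a target sound signal is received in a user's collection direction, that collection direction is determined as the target sound source direction, and sound is then collected in the target sound source direction to obtain the required sound signal. With the assistance of the visual sensing system, the embodiments of this application can identify multiple possible collection directions and determine the final target sound source direction, so that sound is collected according to a known sound source direction. This avoids scanning the whole space in all directions and improves the accuracy and efficiency of collection. In addition, because the visual sensing system collects the users' position information in real time, the microphone array can obtain the users' real-time positions and determine their collection directions in real time, avoiding inaccurate directional pickup caused by user movement.
It can be understood that, in complex application scenarios, interference sources may affect the microphone array's collection of the sound source's signal. To reduce the interference in the collected sound signal, the microphone array may suppress the sound signals from the interference source directions while collecting the sound signal in the target sound source direction.
On this basis, the embodiments of this application also provide a method for suppressing interference sources, which is described below with reference to the drawings. Refer to FIG. 3, which is a flowchart of an interference source suppression method provided by an embodiment of this application. The method may include: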
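S301: Obtain the position information of the interference source.
S302: Determine the direction of the interference source according to its position information.
In this embodiment, the microphone array first obtains the position information of each interference source in the space and uses it to determine the direction of the interference source, that is, the direction of the interference source relative to the microphone array.
An interference source may be a fixed sound-emitting device in the space, such as a television, a speaker system, or an air conditioner, or it may be a user in the space other than the target user. When the interference source is a fixed sound-emitting device, the microphone array may obtain the pre-marked position information of the fixed interference source as the interference source's position information. That is, because the position of a fixed sound-emitting device in the space usually does not change, its position can be marked in advance so that the microphone array can obtain it directly.
When the interference source is a user in the space other than the target user, the microphone array may, after determining the collection direction in which the target sound signal was received as the target sound source direction, determine the users corresponding to the other collection directions as interfering users and take their position information as the interference source's position information. That is, when the microphone array performs S203 after obtaining the collection direction of each user in the space, the user corresponding to the collection direction in which the target sound signal is received is determined as the target user, and the users corresponding to the other collection directions are determined as interfering users, whose position information is the position information of the interference sources.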
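The two ways of obtaining interference source positions can be combined in a few lines. The sketch below is illustrative only; `classify_sources` and the sample data are hypothetical names and values.

```python
def classify_sources(user_positions, target_user, fixed_interferers=None):
    """Return (target_position, interferer_positions).

    Interferers are the pre-marked fixed devices plus every user other than the target user.
    """
    interferers = dict(fixed_interferers or {})
    for name, pos in user_positions.items():
        if name != target_user:
            interferers[name] = pos
    return user_positions[target_user], interferers

user_positions = {"user1": (2.0, 2.0), "user2": (-1.0, 0.0)}
fixed = {"television": (0.0, 3.0)}  # marked in advance because it does not move
target_pos, interferers = classify_sources(user_positions, "user1", fixed)
```

Here `interferers` contains the television and user 2, whose positions can then be converted to directions in the same way as in S202.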
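S303: While collecting sound in the target sound source direction, apply directional suppression in the direction of the interference source.
Once the direction of the interference source has been determined, the microphone array applies directional suppression in that direction while collecting the sound signal in the target sound source direction, reducing the amount of interfering sound that is collected. In a specific implementation, the microphone array may use a fixed null-steering beamforming method, which has low complexity and strong suppression, to form a beam in the target sound source direction for collecting the sound signal and to place a null in the interference source direction for suppression.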
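A minimal narrowband sketch of fixed null steering is given below. It assumes only two microphones, a single frequency, far-field plane waves, and angles measured from broadside; the weights are chosen so that the response is 1 toward the target and 0 toward the interferer. All names and values are hypothetical, and a practical array would use more microphones and process many frequency bins.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def phase_term(angle_deg, spacing, freq):
    """Relative phase at the second microphone for a plane wave from angle_deg (0 = broadside)."""
    delay = spacing * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    return cmath.exp(-2j * math.pi * freq * delay)

def null_steering_weights(target_deg, interferer_deg, spacing, freq):
    """Solve v0 + v1*a_t = 1 and v0 + v1*a_i = 0 for the two channel weights."""
    a_t = phase_term(target_deg, spacing, freq)
    a_i = phase_term(interferer_deg, spacing, freq)
    if abs(a_t - a_i) < 1e-9:
        raise ValueError("target and interferer are indistinguishable for this array")
    v1 = 1.0 / (a_t - a_i)
    v0 = -a_i * v1
    return v0, v1

def response(weights, angle_deg, spacing, freq):
    """Magnitude of the array output for a unit plane wave from angle_deg."""
    v0, v1 = weights
    return abs(v0 + v1 * phase_term(angle_deg, spacing, freq))

weights = null_steering_weights(target_deg=0.0, interferer_deg=60.0, spacing=0.05, freq=1000.0)
```

The `ValueError` branch reflects a real limitation: when the target and the interferer lie in nearly the same direction, a single array cannot pass one and null the other.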
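It can be understood that the sound signal of an interference source also produces reverberation as it propagates in the space. On this basis, this embodiment provides a way of calculating the interference source's reverberation information. Specifically, interference reverberation information is calculated from the interference source's position information, the size information of the space, and the position information of the microphone array; applying directional suppression in the direction of the interference source then includes applying that suppression according to the interference reverberation information. That is, the microphone array may calculate the interference reverberation information produced by the interference source in the space from the interference source's position information, the size information of the space, and its own position information, and use that information when applying directional suppression in the direction of the interference source.
In a specific implementation, directional suppression in the direction of the interference source may be based on the Generalized Sidelobe Canceller (GSC) method together with the interference reverberation information. Specifically, the interference reverberation information is used as the initial reference value of the adaptive filter in that method; this speeds up convergence and strengthens the interference suppression capability of the microphone array.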
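The benefit of a good initial value can be illustrated with the adaptive part of a GSC in isolation. The sketch below uses a normalized LMS (NLMS) filter, which is a swapped-in choice, since the application does not state which adaptation rule is used or how the reverberation information is converted into filter coefficients. It assumes the primary channel contains only interference, and compares a cold start with a warm start whose weights are closer to the true interference path. All names and values are hypothetical.

```python
import random

def nlms_cancel(reference, primary, init_weights, mu=0.5, eps=1e-8):
    """Adaptively subtract a filtered copy of `reference` from `primary`; return (residual, weights)."""
    w = list(init_weights)
    taps = len(w)
    residual = []
    for n in range(len(primary)):
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))
        e = primary[n] - y                      # what remains after cancellation
        norm = sum(xk * xk for xk in x) + eps
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        residual.append(e)
    return residual, w

rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(50)]   # interference reference signal
true_path = [0.6, 0.3]                                    # hypothetical interference path
primary = [true_path[0] * reference[n] + (true_path[1] * reference[n - 1] if n else 0.0)
           for n in range(50)]

def energy(signal):
    return sum(s * s for s in signal)

cold, _ = nlms_cancel(reference, primary, [0.0, 0.0])     # no prior knowledge
warm, _ = nlms_cancel(reference, primary, [0.5, 0.25])    # seeded near the true path
```

In this setup the warm-started filter leaves less residual interference energy than the cold-started one over the same samples, which is the faster-convergence effect the paragraph above describes.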
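As can be seen from the above, the microphone array can obtain the position information of the interference sources and so determine the directions of all interference sources accurately. While collecting the sound signal in the target sound source direction, it can then suppress interference from those directions, achieving stable and efficient pickup and suppression. In addition, starting from accurate interference source positions, this application combines the size information of the space and the position information of the microphone array to obtain reasonably accurate interference reverberation information, and uses it in the interference suppression filter to suppress interference further and raise the signal-to-noise ratio of the microphone array's output.
It should be noted that, before use, the microphone array may also calibrate its own orientation using a calibration sound emitted by the visual sensing system, so as to obtain the direction of the visual sensing system relative to the microphone array. Specifically, the array receives the specified-frequency sound signal sent by the visual sensing system and calculates a first angle difference between the zero-degree orientation of the microphone array and the direction from which the specified-frequency sound signal was received. The zero-degree orientation is the zero-degree direction defined by the microphone array itself; when performing directional pickup, the array determines collection directions relative to this orientation.
That is, by finding the direction of the specified-frequency sound signal, the microphone array obtains the direction of the visual sensing system, which emits that signal, relative to the array's zero-degree orientation. In other words, it determines the angle between the zero-degree orientation and the line connecting the visual sensing system and the microphone array, as shown in FIG. 4.
In a specific implementation, on receiving the specified-frequency sound signal, the microphone array may use a Direction Of Arrival (DOA) estimation algorithm to determine the first angle difference of the visual sensing system relative to the zero-degree orientation.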
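The application does not say which DOA estimation algorithm is used. One of the simplest options, shown here as a sketch, converts the time difference of arrival (TDOA) between two microphones into an angle. It assumes a far-field source, a speed of sound of 343 m/s, and an angle measured from the broadside of the microphone pair; the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def doa_from_tdoa(tdoa_seconds, mic_spacing):
    """Angle of arrival in degrees, relative to broadside, for a two-microphone pair."""
    s = SPEED_OF_SOUND * tdoa_seconds / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp against measurement noise
    return math.degrees(math.asin(s))

# A calibration tone that reaches one microphone 0.05 / 343 s later, with 0.1 m spacing.
first_angle = doa_from_tdoa(0.05 / SPEED_OF_SOUND, 0.1)
```

For a pure tone, the microphone spacing should stay below half the wavelength of the calibration frequency, otherwise the phase difference between the microphones becomes ambiguous.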
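Based on the above, because the microphone array performs directional pickup relative to its zero-degree orientation, the collection direction that it determines from the user's position information should be the direction of the user relative to that zero-degree orientation, so that the sound signal of the target sound source can be collected accurately. On this basis, this embodiment uses the following way of determining the collection direction corresponding to the user:
1) Calculate the second angle difference between the first line and the second line.
In this embodiment, the microphone array may determine the line between the visual sensing system and the microphone array, that is, the first line, from the position information of the visual sensing system and of the microphone array. It then determines the line between the microphone array and the user, that is, the second line, from the position information of the microphone array and of the user, and calculates the angle between the two lines, that is, the second angle difference.
In a specific implementation, because the positions of the microphone array, the visual sensing system, and the user are all known, trigonometric functions can be used to calculate the angle between the first line and the second line, giving the second angle difference. As shown in FIG. 4, the microphone array, the visual sensing system, and the user form a triangle; the length of each side can be calculated from the three positions, and the second angle difference is then obtained using trigonometric functions.
2) Determine the third angle difference between the zero-degree orientation of the microphone array and the second line from the first angle difference and the second angle difference, and take the third angle difference as the collection direction corresponding to the user.
In this embodiment, the microphone array determines the angle of the user relative to the zero-degree orientation from the first angle difference (between the first line and the zero-degree orientation) and the second angle difference (between the first line and the second line). This angle is the third angle difference between the zero-degree orientation and the second line, and it is taken as the collection direction corresponding to the user. The third angle difference is obtained by adding the first angle difference and the second angle difference, so the microphone array knows at what offset from its zero-degree orientation to pick up sound.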
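The two steps above can be sketched as follows. One assumption goes beyond the text: the second angle is computed as a signed angle with `atan2` rather than from side lengths, so that users on either side of the first line are handled, and the array's angles are assumed to increase counter-clockwise like the room coordinates. The function names and positions are hypothetical.

```python
import math

def bearing(origin, point):
    """Direction from origin to point in the room coordinate system, in radians."""
    return math.atan2(point[1] - origin[1], point[0] - origin[0])

def user_collection_angle(first_angle_deg, array_pos, vision_pos, user_pos):
    """Third angle = first angle + signed second angle, wrapped to [0, 360)."""
    second = math.degrees(bearing(array_pos, user_pos) - bearing(array_pos, vision_pos))
    return (first_angle_deg + second) % 360.0

array_pos = (0.0, 0.0)
vision_pos = (1.0, 0.0)
# The calibration step found the visual sensing system 30 degrees from the zero-degree orientation.
angle_left = user_collection_angle(30.0, array_pos, vision_pos, (0.0, 1.0))
angle_right = user_collection_angle(30.0, array_pos, vision_pos, (0.0, -1.0))
```

A user 90 degrees counter-clockwise from the first line ends up at 120 degrees from the zero-degree orientation, and a user 90 degrees clockwise ends up at 300 degrees.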
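In a possible implementation, to reduce the power consumption of the microphone array and extend its service life, the microphone array may also put itself into a standby state according to information sent by the visual sensing system. Specifically, when it obtains a no-user-activity signal detected by the visual sensing system, it enters the standby state.
Because the visual sensing system collects the position information of users in the space in real time, it can monitor whether anyone is active in the space. If it detects no activity, it informs the microphone array that no user is currently active in the space, so that the microphone array stays in standby and performs no signal processing or responses. When the microphone array obtains a user-activity signal detected by the visual sensing system, it enters a waiting-for-wake-up state and obtains the users' position information so that it can perform directional pickup in the possible directions and carry out the subsequent operations.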
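The standby behaviour amounts to a two-state controller driven by the activity signal. The sketch below is illustrative only; the state names and the `next_state` function are hypothetical.

```python
from enum import Enum

class ArrayState(Enum):
    STANDBY = "standby"              # no signal processing, no responses
    AWAITING_WAKE = "awaiting_wake"  # positions are fetched and directional pickup runs

def next_state(user_activity_detected):
    """Map the visual sensing system's activity signal to the array's state."""
    return ArrayState.AWAITING_WAKE if user_activity_detected else ArrayState.STANDBY

# Activity signals received over time from the visual sensing system.
states = [next_state(signal) for signal in (False, True, True, False)]
```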
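In practical applications, to improve the user experience, LED indicator lights may also be installed on the microphone array. Once the target sound source has been determined, the LED pointing toward the target sound source direction lights up brightly, so that the user can see directly that the microphone array is collecting his or her sound signal. In addition, an omnidirectional camera system may be installed on the microphone array to assist in locating and tracking the target sound source, so that the sound signal of the target sound source is collected in real time.
Furthermore, when the angular separation between the interference source and the target sound source is small, or when they lie in the same direction, multiple microphone arrays may be deployed to form a distributed microphone array system that jointly receives the users' position information sent by the visual sensing system. This can increase the accuracy with which the target sound source is determined and achieve stable and efficient far-field pickup and interference suppression.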
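Based on the above method embodiments, this application provides a sound collection device, which is described below with reference to the drawings.
Refer to FIG. 5, which is a structural diagram of a sound collection device provided by an embodiment of this application. The device is applied to a microphone array and, as shown in FIG. 5, may include:
a first obtaining unit 501, configured to obtain the position information of a user collected in real time by a visual sensing system;
a first determining unit 502, configured to determine the collection direction corresponding to the user according to the user's position information;
a sound pickup unit 503, configured to perform directional sound pickup in the collection direction corresponding to the user;
a second determining unit 504, configured to, when a target sound signal is received, determine the collection direction in which the target sound signal was received as the target sound source direction;
a first collection unit 505, configured to collect sound in the target sound source direction to obtain a collected sound signal.
In a possible implementation, the device further includes:
a second obtaining unit, configured to obtain the position information of an interference source;
a third determining unit, configured to determine the direction of the interference source according to the interference source's position information;
a second collection unit, configured to apply directional suppression in the direction of the interference source while sound is being collected in the target sound source direction.
In a possible implementation, the second obtaining unit is specifically configured to obtain pre-marked position information of a fixed interference source as the position information of the interference source; and/or, after the collection direction in which the target sound signal was received has been determined as the target sound source direction, to determine the users corresponding to the collection directions other than the target sound source direction as interfering users and obtain the interfering users' position information as the position information of the interference source.
In a possible implementation, the device further includes:
a first calculation unit, configured to calculate a room impulse response from the target user's position information, the size information of the space, and the position information of the microphone array, where the target user is the user corresponding to the target sound source direction;
an elimination unit, configured to use the room impulse response as the initial parameter of a dereverberation algorithm and perform dereverberation on the collected sound signal according to the dereverberation algorithm.
In a possible implementation, the device further includes:
a second calculation unit, configured to calculate interference reverberation information from the interference source's position information, the size information of the space, and the position information of the microphone array;
the second collection unit is specifically configured to apply directional suppression in the direction of the interference source according to the interference reverberation information.
In a possible implementation, the device further includes:
a receiving unit, configured to receive a sound signal of a specified frequency sent by the visual sensing system;
a third calculation unit, configured to calculate a first angle difference between the zero-degree orientation of the microphone array and the direction from which the specified-frequency sound signal was received.
In a possible implementation, the first determining unit includes:
a calculation subunit, configured to calculate a second angle difference between a first line and a second line, where the first line is the line between the visual sensing system and the microphone array, determined from the position information of the visual sensing system and of the microphone array, and the second line is the line between the microphone array and the user, determined from the position information of the microphone array and of the user;
a determining subunit, configured to determine, from the first angle difference and the second angle difference, a third angle difference between the zero-degree orientation of the microphone array and the second line, and to take the third angle difference as the collection direction corresponding to the user.
In a possible implementation, the device further includes:
a control unit, configured to control entry into a standby state when a no-user-activity signal detected by the visual sensing system is obtained.
It should be noted that, for the implementation of each unit in this embodiment, reference may be made to the method embodiments above; details are not repeated here.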
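In the embodiments of this application, the microphone array first obtains, from the visual sensing system, the users' position information collected in real time, and determines the collection direction corresponding to each user from that position information. That is, the possible sound source directions are first determined from the user position information collected by the visual sensing system. Directional pickup is then performed in the users' collection directions; if a target sound signal is received in a user's collection direction, that collection direction is determined as the target sound source direction, and sound is then collected in the target sound source direction to obtain the required sound signal. That is, with the assistance of the visual sensing system, the embodiments of this application can identify multiple possible collection directions and determine the final target sound source direction, so that sound is collected according to a known sound source direction. This avoids scanning the whole space in all directions and improves the accuracy and efficiency of collection. In addition, because the visual sensing system collects the users' position information in real time, the microphone array can obtain the users' real-time positions and determine their collection directions in real time, avoiding inaccurate directional pickup caused by user movement.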
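FIG. 6 shows a block diagram of a device 600 for implementing sound collection. For example, the device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to FIG. 6, the device 600 may include one or more of the following components: a processing component 602, a memory 604, a power supply component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls the overall operation of the device 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 602 may include one or more processors 620 that execute instructions to complete all or part of the steps of the method described above. In addition, the processing component 602 may include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation of the device 600. Examples of such data include instructions for any application or method operated on the device 600, contact data, phone book data, messages, pictures, videos, and so on. The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power supply component 606 provides power to the various components of the device 600. The power supply component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 600.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the device 600 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC) that is configured to receive external audio signals when the device 600 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 604 or sent via the communication component 616. In some embodiments, the audio component 610 also includes a loudspeaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the device 600. For example, the sensor component 614 can detect the open/closed state of the device 600 and the relative positioning of components, such as the display and keypad of the device 600. The sensor component 614 can also detect a change in position of the device 600 or of one of its components, the presence or absence of user contact with the device 600, the orientation or acceleration/deceleration of the device 600, and changes in the temperature of the device 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the device 600 and other devices. The device 600 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 600 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the following method: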
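obtaining the position information of a user collected in real time by a visual sensing system;
determining the collection direction corresponding to the user according to the user's position information;
performing directional sound pickup in the collection direction corresponding to the user;
when a target sound signal is received, determining the collection direction in which the target sound signal was received as the target sound source direction;
collecting sound in the target sound source direction to obtain a collected sound signal.
Optionally, the method further includes:
obtaining the position information of an interference source;
determining the direction of the interference source according to the interference source's position information;
while collecting sound in the target sound source direction, applying directional suppression in the direction of the interference source.
Optionally, obtaining the position information of the interference source includes:
obtaining pre-marked position information of a fixed interference source as the position information of the interference source;
and/or, after the collection direction in which the target sound signal was received has been determined as the target sound source direction, determining the users corresponding to the collection directions other than the target sound source direction as interfering users, and obtaining the interfering users' position information as the position information of the interference source.
Optionally, the method further includes:
calculating a room impulse response from the target user's position information, the size information of the space, and the position information of the microphone array, where the target user is the user corresponding to the target sound source direction;
using the room impulse response as the initial parameter of a dereverberation algorithm, and performing dereverberation on the collected sound signal according to the dereverberation algorithm.
Optionally, the method further includes:
calculating interference reverberation information from the interference source's position information, the size information of the space, and the position information of the microphone array;
applying directional suppression in the direction of the interference source includes:
applying directional suppression in the direction of the interference source according to the interference reverberation information.
Optionally, the method further includes:
receiving a sound signal of a specified frequency sent by the visual sensing system;
calculating a first angle difference between the zero-degree orientation of the microphone array and the direction from which the specified-frequency sound signal was received.
Optionally, determining the collection direction corresponding to the user according to the user's position information includes:
calculating a second angle difference between a first line and a second line, where the first line is the line between the visual sensing system and the microphone array, determined from the position information of the visual sensing system and of the microphone array, and the second line is the line between the microphone array and the user, determined from the position information of the microphone array and of the user;
determining, from the first angle difference and the second angle difference, a third angle difference between the zero-degree orientation of the microphone array and the second line, and taking the third angle difference as the collection direction corresponding to the user.
Optionally, the method further includes:
when a no-user-activity signal detected by the visual sensing system is obtained, controlling entry into a standby state.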
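In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 604 including instructions, where the instructions can be executed by the processor 620 of the device 600 to complete the method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium, where the instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a sound collection method, the method including: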
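obtaining the position information of a user collected in real time by a visual sensing system;
determining the collection direction corresponding to the user according to the user's position information;
performing directional sound pickup in the collection direction corresponding to the user;
when a target sound signal is received, determining the collection direction in which the target sound signal was received as the target sound source direction;
collecting sound in the target sound source direction to obtain a collected sound signal.
Optionally, the method further includes:
obtaining the position information of an interference source;
determining the direction of the interference source according to the interference source's position information;
while collecting sound in the target sound source direction, applying directional suppression in the direction of the interference source.
Optionally, obtaining the position information of the interference source includes:
obtaining pre-marked position information of a fixed interference source as the position information of the interference source;
and/or, after the collection direction in which the target sound signal was received has been determined as the target sound source direction, determining the users corresponding to the collection directions other than the target sound source direction as interfering users, and obtaining the interfering users' position information as the position information of the interference source.
Optionally, the method further includes:
calculating a room impulse response from the target user's position information, the size information of the space, and the position information of the microphone array, where the target user is the user corresponding to the target sound source direction;
using the room impulse response as the initial parameter of a dereverberation algorithm, and performing dereverberation on the collected sound signal according to the dereverberation algorithm.
Optionally, the method further includes:
calculating interference reverberation information from the interference source's position information, the size information of the space, and the position information of the microphone array;
applying directional suppression in the direction of the interference source includes:
applying directional suppression in the direction of the interference source according to the interference reverberation information.
Optionally, the method further includes:
receiving a sound signal of a specified frequency sent by the visual sensing system;
calculating a first angle difference between the zero-degree orientation of the microphone array and the direction from which the specified-frequency sound signal was received.
Optionally, determining the collection direction corresponding to the user according to the user's position information includes:
calculating a second angle difference between a first line and a second line, where the first line is the line between the visual sensing system and the microphone array, determined from the position information of the visual sensing system and of the microphone array, and the second line is the line between the microphone array and the user, determined from the position information of the microphone array and of the user;
determining, from the first angle difference and the second angle difference, a third angle difference between the zero-degree orientation of the microphone array and the second line, and taking the third angle difference as the collection direction corresponding to the user.
Optionally, the method further includes:
when a no-user-activity signal detected by the visual sensing system is obtained, controlling entry into a standby state.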
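In the embodiments of this application, the microphone array first obtains, from the visual sensing system, the users' position information collected in real time, and determines the collection direction corresponding to each user from that position information. That is, the possible sound source directions are first determined from the user position information collected by the visual sensing system. Directional pickup is then performed in the users' collection directions; if a target sound signal is received in a user's collection direction, that collection direction is determined as the target sound source direction, and sound is then collected in the target sound source direction to obtain the required sound signal. That is, with the assistance of the visual sensing system, the embodiments of this application can identify multiple possible collection directions and determine the final target sound source direction, so that sound is collected according to a known sound source direction. This avoids scanning the whole space in all directions and improves the accuracy and efficiency of collection. In addition, because the visual sensing system collects the users' position information in real time, the microphone array can obtain the users' real-time positions and determine their collection directions in real time, avoiding inaccurate directional pickup caused by user movement.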
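FIG. 7 is a schematic structural diagram of a server in an embodiment of this application. The server 700 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 722 (for example, one or more processors), a memory 732, and one or more storage media 730 (for example, one or more mass storage devices) storing application programs 742 or data 744. The memory 732 and the storage medium 730 may provide transient or persistent storage. The programs stored in the storage medium 730 may include one or more modules (not shown in the figure), each of which may include a series of instruction operations on the server. Furthermore, the central processing unit 722 may be configured to communicate with the storage medium 730 and execute, on the server 700, the series of instruction operations in the storage medium 730.
The server 700 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the system or device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method.
It should be understood that, in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: only A exists, only B exists, or both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following (items)" or a similar expression refers to any combination of those items, including any combination of single items or plural items. For example, at least one of a, b, or c may mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c may be single or multiple.
It should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to implement or use this application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.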

Claims (18)

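  1. A sound collection method, characterized in that the method is applied to a microphone array and comprises:
    obtaining the position information of a user collected in real time by a visual sensing system;
    determining the collection direction corresponding to the user according to the user's position information;
    performing directional sound pickup in the collection direction corresponding to the user;
    when a target sound signal is received, determining the collection direction in which the target sound signal was received as the target sound source direction;
    collecting sound in the target sound source direction to obtain a collected sound signal.
  2. The method according to claim 1, characterized in that the method further comprises:
    obtaining the position information of an interference source;
    determining the direction of the interference source according to the interference source's position information;
    while collecting sound in the target sound source direction, applying directional suppression in the direction of the interference source.
  3. The method according to claim 2, characterized in that obtaining the position information of the interference source comprises:
    obtaining pre-marked position information of a fixed interference source as the position information of the interference source;
    and/or, after the collection direction in which the target sound signal was received has been determined as the target sound source direction, determining the users corresponding to the collection directions other than the target sound source direction as interfering users, and obtaining the interfering users' position information as the position information of the interference source.
  4. The method according to claim 1, characterized in that the method further comprises:
    calculating a room impulse response from the target user's position information, the size information of the space, and the position information of the microphone array, wherein the target user is the user corresponding to the target sound source direction;
    using the room impulse response as the initial parameter of a dereverberation algorithm, and performing dereverberation on the collected sound signal according to the dereverberation algorithm.
  5. The method according to claim 2, characterized in that the method further comprises:
    calculating interference reverberation information from the interference source's position information, the size information of the space, and the position information of the microphone array;
    wherein applying directional suppression in the direction of the interference source comprises:
    applying directional suppression in the direction of the interference source according to the interference reverberation information.
  6. The method according to any one of claims 1-5, characterized in that the method further comprises:
    receiving a sound signal of a specified frequency sent by the visual sensing system;
    calculating a first angle difference between the zero-degree orientation of the microphone array and the direction from which the specified-frequency sound signal was received.
  7. The method according to claim 6, characterized in that determining the collection direction corresponding to the user according to the user's position information comprises:
    calculating a second angle difference between a first line and a second line, wherein the first line is the line between the visual sensing system and the microphone array, determined from the position information of the visual sensing system and of the microphone array, and the second line is the line between the microphone array and the user, determined from the position information of the microphone array and of the user;
    determining, from the first angle difference and the second angle difference, a third angle difference between the zero-degree orientation of the microphone array and the second line, and taking the third angle difference as the collection direction corresponding to the user.
  8. The method according to claim 1, characterized in that the method further comprises:
    when a no-user-activity signal detected by the visual sensing system is obtained, controlling entry into a standby state.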
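  9. A sound collection device, characterized in that the device is applied to a microphone array and comprises:
    a first obtaining unit, configured to obtain the position information of a user collected in real time by a visual sensing system;
    a first determining unit, configured to determine the collection direction corresponding to the user according to the user's position information;
    a sound pickup unit, configured to perform directional sound pickup in the collection direction corresponding to the user;
    a second determining unit, configured to, when a target sound signal is received, determine the collection direction in which the target sound signal was received as the target sound source direction;
    a first collection unit, configured to collect sound in the target sound source direction to obtain a collected sound signal.
  10. The device according to claim 9, characterized in that the device further comprises:
    a second obtaining unit, configured to obtain the position information of an interference source;
    a third determining unit, configured to determine the direction of the interference source according to the interference source's position information;
    a second collection unit, configured to apply directional suppression in the direction of the interference source while sound is being collected in the target sound source direction.
  11. The device according to claim 10, characterized in that the second obtaining unit is specifically configured to obtain pre-marked position information of a fixed interference source as the position information of the interference source; and/or, after the collection direction in which the target sound signal was received has been determined as the target sound source direction, to determine the users corresponding to the collection directions other than the target sound source direction as interfering users and obtain the interfering users' position information as the position information of the interference source.
  12. The device according to claim 9, characterized in that the device further comprises:
    a first calculation unit, configured to calculate a room impulse response from the target user's position information, the size information of the space, and the position information of the microphone array, wherein the target user is the user corresponding to the target sound source direction;
    an elimination unit, configured to use the room impulse response as the initial parameter of a dereverberation algorithm and perform dereverberation on the collected sound signal according to the dereverberation algorithm.
  13. The device according to claim 10, characterized in that the device further comprises:
    a second calculation unit, configured to calculate interference reverberation information from the interference source's position information, the size information of the space, and the position information of the microphone array;
    wherein the second collection unit is specifically configured to apply directional suppression in the direction of the interference source according to the interference reverberation information.
  14. The device according to any one of claims 9-13, characterized in that the device further comprises:
    a receiving unit, configured to receive a sound signal of a specified frequency sent by the visual sensing system;
    a third calculation unit, configured to calculate a first angle difference between the zero-degree orientation of the microphone array and the direction from which the specified-frequency sound signal was received.
  15. The device according to claim 14, characterized in that the first determining unit comprises:
    a calculation subunit, configured to calculate a second angle difference between a first line and a second line, wherein the first line is the line between the visual sensing system and the microphone array, determined from the position information of the visual sensing system and of the microphone array, and the second line is the line between the microphone array and the user, determined from the position information of the microphone array and of the user;
    a determining subunit, configured to determine, from the first angle difference and the second angle difference, a third angle difference between the zero-degree orientation of the microphone array and the second line, and to take the third angle difference as the collection direction corresponding to the user.
  16. The device according to claim 9, characterized in that the device further comprises:
    a control unit, configured to control entry into a standby state when a no-user-activity signal detected by the visual sensing system is obtained.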
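  17. A device for sound collection, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs containing instructions for performing the following operations:
    obtaining the position information of a user collected in real time by a visual sensing system;
    determining the collection direction corresponding to the user according to the user's position information;
    performing directional sound pickup in the collection direction corresponding to the user;
    when a target sound signal is received, determining the collection direction in which the target sound signal was received as the target sound source direction;
    collecting sound in the target sound source direction to obtain a collected sound signal.
  18. A computer-readable medium storing instructions that, when executed by one or more processors, cause a device to perform the sound collection method according to any one of claims 1 to 8.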
PCT/CN2020/111684 2019-08-29 2020-08-27 一种声音采集方法及装置 WO2021037129A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910809070.4 2019-08-29
CN201910809070.4A CN110493690B (zh) 2019-08-29 2019-08-29 一种声音采集方法及装置

Publications (1)

Publication Number Publication Date
WO2021037129A1 true WO2021037129A1 (zh) 2021-03-04

Family

ID=68555164

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111684 WO2021037129A1 (zh) 2019-08-29 2020-08-27 一种声音采集方法及装置

Country Status (2)

Country Link
CN (1) CN110493690B (zh)
WO (1) WO2021037129A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110493690B (zh) * 2019-08-29 2021-08-13 北京搜狗科技发展有限公司 一种声音采集方法及装置
CN111277931A (zh) * 2020-01-20 2020-06-12 东风汽车集团有限公司 可实现汽车隐私通话功能的装置
CN111343411B (zh) * 2020-03-20 2021-07-06 青岛海信智慧家居系统股份有限公司 一种智能远程视频会议系统
CN112185373A (zh) * 2020-09-07 2021-01-05 珠海格力电器股份有限公司 一种控制智能家居设备的方法、装置和音响
CN114374903B (zh) * 2020-10-16 2023-04-07 华为技术有限公司 拾音方法和拾音装置
CN112565973B (zh) * 2020-12-21 2023-08-01 Oppo广东移动通信有限公司 终端、终端控制方法、装置及存储介质
CN113766368B (zh) * 2021-08-20 2022-10-18 歌尔科技有限公司 音频设备的控制方法及音频设备
CN114268883A (zh) * 2021-11-29 2022-04-01 苏州君林智能科技有限公司 一种选择麦克风布放位置的方法与系统
CN114255557A (zh) * 2021-11-30 2022-03-29 歌尔科技有限公司 智能安防控制方法、智能安防设备及控制器
CN116417006A (zh) * 2021-12-31 2023-07-11 华为技术有限公司 声音信号处理方法、装置、设备及存储介质
CN115604643B (zh) * 2022-12-12 2023-03-17 杭州兆华电子股份有限公司 一种手机充电器生产不良自动检测定位方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012186551A (ja) * 2011-03-03 2012-09-27 Hitachi Ltd 制御装置、制御システムと制御方法
CN105679328A (zh) * 2016-01-28 2016-06-15 苏州科达科技股份有限公司 一种语音信号处理方法、装置及系统
CN108322855A (zh) * 2018-02-11 2018-07-24 北京百度网讯科技有限公司 用于获取音频信息的方法及装置
CN108957392A (zh) * 2018-04-16 2018-12-07 深圳市沃特沃德股份有限公司 声源方向估计方法和装置
CN109754814A (zh) * 2017-11-08 2019-05-14 阿里巴巴集团控股有限公司 一种声音处理方法、交互设备
CN110493690A (zh) * 2019-08-29 2019-11-22 北京搜狗科技发展有限公司 一种声音采集方法及装置

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2958339B1 (en) * 2013-02-15 2019-09-18 Panasonic Intellectual Property Management Co., Ltd. Directionality control system and directionality control method
WO2015170368A1 (ja) * 2014-05-09 2015-11-12 パナソニックIpマネジメント株式会社 指向性制御装置、指向性制御方法、記憶媒体及び指向性制御システム
JP6202277B2 (ja) * 2014-08-05 2017-09-27 パナソニックIpマネジメント株式会社 音声処理システム及び音声処理方法
KR102339798B1 (ko) * 2015-08-21 2021-12-15 삼성전자주식회사 전자 장치의 음향 처리 방법 및 그 전자 장치
JP2018107603A (ja) * 2016-12-26 2018-07-05 オリンパス株式会社 センサ情報取得装置、センサ情報取得方法、センサ情報取得プログラム及び医療器具
CN107680593A (zh) * 2017-10-13 2018-02-09 歌尔股份有限公司 一种智能设备的语音增强方法及装置
CN108694957B (zh) * 2018-04-08 2021-08-31 湖北工业大学 基于圆形麦克风阵列波束形成的回声抵消设计方法

Also Published As

Publication number Publication date
CN110493690A (zh) 2019-11-22
CN110493690B (zh) 2021-08-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20857399; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20857399; Country of ref document: EP; Kind code of ref document: A1)