WO2021114953A1 - Voice signal acquisition method and apparatus, electronic device, and storage medium

Voice signal acquisition method and apparatus, electronic device, and storage medium

Info

Publication number
WO2021114953A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice signal
signal
voice
directional microphone
electronic device
Prior art date
Application number
PCT/CN2020/125590
Other languages
French (fr)
Chinese (zh)
Inventor
尹明婕
缪海波
陶凯
张时
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2021114953A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision

Definitions

  • This application belongs to the technical field of signal processing, and in particular relates to methods, devices, electronic equipment, and storage media for collecting voice signals.
  • in order to pick up a sound signal in a certain direction, a directional microphone can be used for voice collection, so as to improve the collection quality of the voice signal.
  • although the directional microphone has better voice signal collection ability in a specific direction, its collection in the high-frequency range is weak, and it suffers from defects such as sound coloration and an insufficiently smooth frequency response curve.
  • as a result, a high-quality voice signal still cannot be obtained, which affects the signal output effect of the microphone.
  • the embodiments of the present application provide a voice signal collection method, device, wearable device, and storage medium, which can improve the quality of voice collection and enhance the signal output effect of the microphone.
  • an embodiment of the present application provides a voice signal collection method, including:
  • the calibration voice signal is output.
  • the sound pickup direction of the directional microphone is determined according to the mutual positional relationship between the smart earphone and the user's mouth in the use state.
  • the acquiring the first voice signal through the directional microphone of the electronic device includes:
  • the signal compensation curve is generated according to the first voice signal and the second voice signal.
  • the first distance between the directional microphone and the human sound source is smaller than the second distance between the omnidirectional microphone and the human sound source.
  • the generating the signal compensation curve according to the first voice signal and the second voice signal includes:
  • the signal compensation curve is generated by the filtered signal.
  • the performing signal compensation on the first voice signal according to a signal compensation curve to generate a calibration voice signal includes:
  • the calibration voice signal is generated based on the frequency response reference curve.
  • the gain ratio between each frequency value is determined according to the signal compensation curve, and the original frequency response curve is fitted based on the above gain ratio to obtain the frequency response calibration curve.
  • the method before the acquiring the first voice signal through the directional microphone of the electronic device, the method further includes:
  • if the environmental sound intensity is less than the preset sound intensity threshold, acquiring a third voice signal through an omnidirectional microphone, and outputting the third voice signal;
  • if the environmental sound intensity is greater than or equal to the preset sound intensity threshold, performing the acquisition of the first voice signal through the directional microphone of the electronic device.
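
The threshold decision in the two branches above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Mic` enum, the function name `select_microphone`, and the use of decibel values for the sound intensity are assumptions introduced here.

```python
from enum import Enum


class Mic(Enum):
    """Hypothetical labels for the two microphones named in the text."""
    OMNIDIRECTIONAL = "omnidirectional"
    DIRECTIONAL = "directional"


def select_microphone(ambient_level_db: float, threshold_db: float) -> Mic:
    """Pick the microphone for the current environment.

    Below the threshold the environment is treated as quiet, so the
    omnidirectional microphone is used and its signal is output directly.
    At or above the threshold the directional microphone is used, and its
    signal goes on to the compensation step.
    """
    if ambient_level_db < threshold_db:
        return Mic.OMNIDIRECTIONAL
    return Mic.DIRECTIONAL
```

Note that the boundary case (intensity equal to the threshold) selects the directional microphone, matching the "greater than or equal to" wording of the second branch.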
  • the acquiring the first voice signal through the directional microphone of the electronic device includes:
  • the transmission component is controlled to adjust the directional microphone to the target angle, and the first voice signal is acquired at the target angle.
  • an embodiment of the present application provides a voice signal collection device, including:
  • the first voice signal acquiring unit is configured to acquire the first voice signal through the directional microphone of the electronic device;
  • a voice signal compensation unit configured to perform signal compensation on the first voice signal according to a signal compensation curve to generate a calibrated voice signal
  • the calibration voice signal output unit is used to output the calibration voice signal.
  • an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the voice signal collection method described in any one of the above-mentioned first aspects.
  • an embodiment of the present application provides a computer-readable storage medium that stores a computer program, characterized in that, when the computer program is executed by a processor, the voice signal collection method described in any one of the above-mentioned first aspects is implemented.
  • the embodiments of the present application provide a computer program product, which when the computer program product runs on a terminal device, causes the terminal device to execute the voice signal collection method described in any one of the above-mentioned first aspects.
  • the signal compensation is performed on the first voice signal based on the signal compensation curve to obtain the calibrated voice signal, which improves the collection of high-frequency voice signals and reduces the impact of sound coloration; the adjusted calibrated voice signal is then output, which improves the quality of voice collection while ensuring that environmental noise in the collected voice signal is reduced, thereby enhancing the signal output effect of the microphone.
  • FIG. 1 is a block diagram of a part of the structure of a smart headset provided by an embodiment of the present application
  • FIG. 2 is an implementation flowchart of a voice signal collection method provided by the first embodiment of the present application
  • Fig. 3 is a sound pickup area of a directional microphone provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of sound pickup in the use state of the smart headset provided by the first embodiment of the present application.
  • FIG. 5 is a specific implementation flowchart of a voice signal collection method S201 provided by the second embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an existing smart headset provided by an embodiment of the present application.
  • FIG. 7 is a structural entity diagram of a smart headset provided by an embodiment of the present application.
  • FIG. 8 is a specific implementation flowchart of a voice signal collection method S2012 provided by the third embodiment of the present application.
  • FIG. 9 is a specific implementation flowchart of a voice signal collection method S202 provided by the fourth embodiment of the present application.
  • FIG. 10 is a specific implementation flowchart of a voice signal collection method provided by the fifth embodiment of the present application.
  • FIG. 11 is a specific implementation flowchart of a voice signal collection method S201 provided by the sixth embodiment of the present application.
  • FIG. 12 is a structural block diagram of a voice signal collection device provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of an electronic device provided by another embodiment of the present application.
  • the term “if” can be construed as “when”, “once”, “in response to determination”, or “in response to detecting”.
  • the phrase “if determined” or “if detected [described condition or event]” can be interpreted, depending on the context, as meaning “once determined”, “in response to determination”, “once detected [described condition or event]”, or “in response to detection of [described condition or event]”.
  • the voice signal collection method provided in the embodiments of the present application can be applied to electronic devices.
  • the electronic device can be any device with a voice collection function, including but not limited to mobile phones, tablets, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDAs) with voice collection functions.
  • the method can also be applied to databases, servers, and systems that can be controlled by voice.
  • the embodiments of the present application do not impose any restrictions on the specific types of electronic devices.
  • the voice signal collection method provided in this application can be specifically stored in an electronic device in the form of an application program or software, and the electronic device implements the voice signal collection method provided in this application by executing the application program or software.
  • the electronic device may also be a wearable device.
  • “wearable device” may be a general term for devices developed by applying wearable technology to the intelligent design of everyday wear, such as wired earphones, wireless earphones, glasses, gloves, watches, clothing, and shoes with a microphone function.
  • a wearable device is a portable device that is directly worn on the body or integrated into the user's clothes or accessories, and is attached to the user's body to collect the user's voice signal. Wearable devices are not only a kind of hardware device, but also realize powerful functions through software support, data interaction, and cloud interaction.
  • wearable smart devices include full-featured, large-sized devices that can realize complete or partial functions without relying on a smartphone, such as smart headphones or smart microphones, as well as devices that focus on only one type of application function and need to be used in conjunction with other devices such as smartphones, such as all kinds of smart bracelets and smart jewelry that can collect user voice signals.
  • FIG. 1 shows a block diagram of a part of the structure of a smart headset provided by an embodiment of the present application.
  • the smart headset includes: a near field communication module 110, a memory 120, an input unit 130, a sensor 140, an audio circuit 150, a processor 160, a power supply 170 and other components.
  • the structure of the smart headset shown in FIG. 1 does not constitute a limitation on the smart headset, which may include more or fewer components than those shown in the figure, combine some components, or arrange the components differently.
  • the smart headset can establish a communication connection with the associated user terminal through the near field communication module 110. For example, it can establish communication with the user’s mobile phone, receive the audio signal sent by the mobile phone through the near field communication module 110, and play the audio signal through the speaker 151 in the audio circuit 150. It can also send the collected voice signal to the associated user terminal through the near field communication module 110.
  • the voice signal can be collected through the smart headset. In this case, the smart headset can establish a real-time communication link with the user terminal through the near field communication module 110, collect the user’s voice signal through the microphone 152 in the audio circuit 150, and feed the voice signal back to the user terminal through the real-time communication link, and the user terminal then sends it to the call peer.
  • the near field communication module 110 may be a Bluetooth communication module, a Bluetooth low energy communication module, or a WiFi communication module.
  • the memory 120 may be used to store software programs and modules.
  • the processor 160 executes various functional applications and data processing of the smart headset by running the software programs and modules stored in the memory 120; for example, the received voice signals, the pre-configured signal compensation curve, and the like are stored in the memory 120.
  • the memory 120 may mainly include a storage program area and a storage data area.
  • the storage program area may store an operating system and an application program required by at least one function (such as a sound playback function or a sound collection function); the storage data area may store data created through use of the smart headset (such as audio data).
  • the memory 120 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • the input unit 130 may be used to receive input digital or character information, and generate key signal input related to user settings and function control of the smart headset 100.
  • the input unit 130 may include a control button 131 and other input devices 132.
  • the control button 131 may be a touch button.
  • the key signal input corresponding to the touched position can be generated without the user pressing the button, which allows the outer surface of the smart headset to remain integrated.
  • the touch button can collect the user's touch operations on or near it (for example, the user operating on or near the control button 131 with a finger or an accessory) and drive the corresponding connected device according to a preset program.
  • the user may generate a collection instruction for controlling the voice signal of the terminal through the input unit 130, and send the voice signal to the user terminal through the near field communication module 110.
  • the smart headset 100 may also include at least one sensor 140, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor.
  • the ambient light sensor can determine whether there is a touch operation according to the brightness of the ambient light.
  • the proximity sensor can be greater than the distance between the two ear parts of the smart headset.
  • the accelerometer sensor can detect the magnitude of acceleration in various directions (usually three axes), and can detect the magnitude and direction of gravity when it is stationary. It can be used in applications that recognize the posture of the smart headset and in vibration-recognition functions (such as tap detection). Other sensors that can be configured in the smart headset, such as an infrared sensor, are not described here.
  • the audio circuit 150, the speaker 151, and the microphone 152 can provide an audio interface between the user and the smart headset.
  • the audio circuit 150 can transmit the electric signal converted from the received audio data to the speaker 151, and the speaker 151 converts it into a sound signal for output; on the other hand, the microphone 152 converts the collected sound signal into an electric signal, which is then output by the audio circuit 150.
  • the processor 160 is the control center of the smart headset. It uses various interfaces and lines to connect the various parts of the entire smart headset, and performs the various functions of the smart headset and processes data by running or executing software programs and/or modules stored in the memory 120 and calling data stored in the memory 120, so as to monitor the smart headset as a whole.
  • the processor 160 may include one or more processing units; preferably, the processor 160 may integrate an application processor and a modem processor, where the application processor mainly processes the operating system, user interface, application programs, and so on, and the modem processor mainly deals with wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 160.
  • the smart headset 100 also includes a power source 170 (such as a battery) for supplying power to the various components.
  • the power source can be logically connected to the processor 160 through a power management system, so that functions such as charging, discharging, and power consumption management are handled through the power management system.
  • the voice signal collection device may be a smart microphone.
  • the smart microphone can process the acquired voice signal through the voice signal collection method provided in the embodiments of this application, and perform subsequent operations on the processed voice signal, such as playing it through a speaker or storing the collected voice information in the memory.
  • the voice signal collection device may be an immersive VR headset.
  • the VR headgear can collect the user's voice signal through the built-in directional microphone, play the voice signal through the built-in speaker module, analyze the collected voice signal, convert it into character data, and adjust the VR headgear according to the character data The output content of the internal display unit.
  • the execution subject of the process is an electronic device.
  • the electronic device may be a smart headset; in particular, the smart headset may be a True Wireless Stereo (TWS) headset.
  • Fig. 2 shows an implementation flowchart of the voice signal collection method provided by the first embodiment of the present application, and the details are as follows:
  • the first voice signal is acquired through the directional microphone of the electronic device.
  • the smart headset may be equipped with a directional microphone, which can effectively acquire sound signals in a preset direction, that is, the pickup range of the directional microphone extends in the preset sound pickup direction.
  • the pickup range is a cardioid region that extends along the pickup direction.
  • when the target signal to be collected lies in the direction in which the cardioid region extends, the signal quality of the collected sound signal is best; in the direction opposite to the pickup direction, the signal quality of the collected sound signal is poor.
  • Fig. 3 shows the sound pickup area of a directional microphone provided by the present application. As shown in Fig. 3, the pickup direction of the directional microphone is the direction corresponding to 0°.
  • the voice signal in the pickup direction has a better collection effect;
  • the direction opposite to the pickup direction of the directional microphone is the direction corresponding to 180°, that is, voice signals arriving from 180° are collected poorly.
  • each ring represents a collection sensitivity
  • the collection sensitivity at the center origin is -30 decibels (dB)
  • the amplitude of the sound signal only needs to be greater than 30 dB for the directional microphone to be activated and the corresponding sound signal to be recorded.
  • in the direction opposite to the sound pickup direction of the directional microphone, that is, at 180°, the voice signal still cannot be recorded even at a larger sound amplitude, for example, -2 dB.
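
The shape of a cardioid pickup region can be illustrated with the textbook ideal cardioid, whose relative sensitivity is 0.5·(1 + cos θ). This is only a sketch of the general pattern: the ideal formula, the function names, and the `floor_db` clamp are assumptions made here, and the sensitivity values in Fig. 3 belong to the specific microphone and are not reproduced by this model.

```python
import math


def cardioid_gain(angle_deg: float) -> float:
    """Relative sensitivity of an ideal cardioid: 1.0 on axis (0 degrees), 0.0 at 180 degrees."""
    return 0.5 * (1.0 + math.cos(math.radians(angle_deg)))


def cardioid_gain_db(angle_deg: float, floor_db: float = -60.0) -> float:
    """Same gain in decibels, clamped to floor_db so the rear null stays finite."""
    gain = cardioid_gain(angle_deg)
    if gain <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(gain))
```

In this idealized model a source at 90° is picked up about 6 dB below an on-axis source, and a source directly behind the microphone falls into the null, which is consistent with the poor collection at 180° described above.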
  • Fig. 4 shows a schematic diagram of sound pickup in a use state of a smart headset provided by an embodiment of the present application. As shown in Figure 4, when the smart headset is in use, its plane is parallel to the plane of the user's mouth, and the user's sound direction is perpendicular to the plane of the user's mouth.
  • when the sound pickup direction of the directional microphone of the smart headset is designed, the pickup direction can likewise be kept perpendicular to the corresponding plane in the use state, thereby ensuring that the sound pickup direction is consistent with the sound direction of the user's mouth, so that the user's voice signal can be collected more clearly.
  • the smart headset may be configured with a signal collection trigger threshold and a collection end judgment threshold.
  • the built-in directional microphone of the smart headset detects the sound amplitude of the sound signal in the current scene. If it detects that the sound amplitude is greater than the above-mentioned collection trigger threshold, it starts to record the sound signal of the current environment to generate the first voice signal. During recording, if it detects that the sound amplitude of the first voice signal stays below the above-mentioned collection end judgment threshold for a certain period of time, it can stop collecting the sound signal of the current environment. It should be noted that the value of the collection trigger threshold is greater than that of the collection end judgment threshold.
  • the value of the collection trigger threshold can be appropriately increased to reduce the occurrence of false acquisitions. During recording, because of soft tones and pauses in a person's speech, the sound amplitude is sometimes small even though the user's speech has not been interrupted. Therefore, the collection end judgment threshold can be appropriately lowered to ensure that continuous speech is recorded in the same voice signal, thereby improving the continuity of the voice signal content.
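
The two-threshold behaviour described above is a form of hysteresis and can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a list of per-frame amplitude values, the "certain period of time" is modelled as a count of consecutive quiet frames (`hangover_frames`), and all names are hypothetical.

```python
from typing import List, Tuple


def segment_by_thresholds(
    frame_levels: List[float],
    trigger_threshold: float,
    end_threshold: float,
    hangover_frames: int,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive, for each recording."""
    if end_threshold >= trigger_threshold:
        raise ValueError("trigger_threshold must be greater than end_threshold")

    segments = []
    start = None      # index where the current recording began, if any
    quiet_run = 0     # consecutive frames below end_threshold while recording

    for i, level in enumerate(frame_levels):
        if start is None:
            # Idle: only a level above the higher threshold starts a recording.
            if level > trigger_threshold:
                start = i
                quiet_run = 0
        elif level < end_threshold:
            quiet_run += 1
            if quiet_run >= hangover_frames:
                # Close the segment at the first frame of the quiet run.
                segments.append((start, i - quiet_run + 1))
                start = None
        else:
            # A short dip has ended; keep recording in the same segment.
            quiet_run = 0

    if start is not None:
        segments.append((start, len(frame_levels)))
    return segments
```

With a trigger of 0.8, an end threshold of 0.3, and a hangover of 3 frames, the levels `[0.1, 0.9, 0.4, 0.2, 0.5, 0.1, 0.1, 0.1, 0.9]` yield the segments `[(1, 5), (8, 9)]`: the single quiet frame at index 3 does not split the first recording, while the three quiet frames at indices 5 to 7 end it.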
  • the smart headset may be configured with a collection trigger condition, and a list of applications currently used by the user terminal can be obtained in a use state in which the smart headset is connected to the user terminal. If the application list contains any preset sound collection application, such as a video call application, a voice call application, or another recording-related application, it can be recognized that the preset collection trigger condition is currently met, and a voice collection instruction is sent to the smart headset to obtain the first voice signal through the smart headset.
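
The application-list trigger condition can be sketched as a simple membership check. The category names in `SOUND_COLLECTION_APPS` and the function name are hypothetical placeholders; how the user terminal actually reports its running applications is not specified in the text.

```python
from typing import Iterable, Set

# Hypothetical identifiers for the preset sound collection applications.
SOUND_COLLECTION_APPS: Set[str] = {"video_call", "voice_call", "recorder"}


def collection_triggered(
    running_apps: Iterable[str],
    collection_apps: Set[str] = SOUND_COLLECTION_APPS,
) -> bool:
    """True if any application in use is one of the preset sound collection applications."""
    return any(app in collection_apps for app in running_apps)
```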
  • the first voice signal in the current scene can be collected through the directional microphone.
  • the voice signal may be noise-filtered through a preset filter.
  • the filter is a high-frequency filter that removes high-frequency noise signals. Because the directional microphone has low sensitivity at low frequencies, low-frequency noise is hard for it to pick up; this acts as a built-in low cut when the directional microphone collects sound signals and naturally reduces the low-frequency noise that is collected. Based on this, in order to further suppress high-frequency noise, a high-frequency filter can be connected in series after signal acquisition to filter out high-frequency noise signals.
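
One simple way to attenuate high-frequency noise after acquisition is a first-order low-pass filter. The sketch below is an assumption chosen for illustration (the text does not say which filter design is used), and the function name and parameters are hypothetical.

```python
import math
from typing import List


def low_pass(samples: List[float], sample_rate_hz: float, cutoff_hz: float) -> List[float]:
    """First-order IIR low-pass: attenuates content above cutoff_hz."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor in (0, 1)

    out = []
    previous = samples[0] if samples else 0.0
    for x in samples:
        # Move the output a fraction alpha of the way toward the new sample.
        previous = previous + alpha * (x - previous)
        out.append(previous)
    return out
```

A constant input passes through unchanged, while a signal alternating at the Nyquist rate is strongly attenuated, which is the behaviour wanted from a stage placed in series after the microphone.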
  • signal compensation is performed on the first voice signal according to the signal compensation curve to generate a calibration voice signal.
  • the smart headset may perform signal compensation on the first voice signal through a preset signal compensation curve. Because the directional microphone performs poorly for off-axis incidence and its frequency response curve is not smooth enough, the first voice signal has a low acquisition quality in the high frequency band; in the waveform of the voice signal, this shows up as lower energy at mid and high frequencies. Based on this, after the first voice signal is collected, the smart headset can compensate for the collection defect of the directional microphone through the signal compensation curve to obtain a compensated calibrated voice signal, thereby optimizing the signal collection effect of the directional microphone.
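
Applying a compensation curve can be sketched as a frequency-dependent gain. The representation below is an assumption made for illustration: the curve is a list of (frequency in Hz, gain in dB) points that is linearly interpolated, and the signal is given as per-bin spectral magnitudes. The text does not specify how the curve is stored or applied.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# (frequency in Hz, gain in dB), sorted by ascending frequency.
Curve = Sequence[Tuple[float, float]]


def gain_db_at(curve: Curve, freq_hz: float) -> float:
    """Linearly interpolate the curve; hold the end values outside its range."""
    freqs = [f for f, _ in curve]
    if freq_hz <= freqs[0]:
        return curve[0][1]
    if freq_hz >= freqs[-1]:
        return curve[-1][1]
    hi = bisect_right(freqs, freq_hz)
    (f0, g0), (f1, g1) = curve[hi - 1], curve[hi]
    return g0 + (g1 - g0) * (freq_hz - f0) / (f1 - f0)


def compensate(
    bin_freqs_hz: Sequence[float],
    magnitudes: Sequence[float],
    curve: Curve,
) -> List[float]:
    """Scale each spectral magnitude by the curve's gain at that bin's frequency."""
    if len(bin_freqs_hz) != len(magnitudes):
        raise ValueError("bin_freqs_hz and magnitudes must have the same length")
    return [
        m * 10.0 ** (gain_db_at(curve, f) / 20.0)
        for f, m in zip(bin_freqs_hz, magnitudes)
    ]
```

For example, a hypothetical curve `[(1000.0, 0.0), (4000.0, 6.0), (8000.0, 12.0)]` leaves content at or below 1 kHz untouched and boosts the band around 8 kHz by 12 dB, which is the kind of mid and high frequency lift the paragraph describes.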
  • the manner of configuring the signal compensation curve may be as follows: before the smart headset is shipped from the factory, training waveforms of multiple training voices at multiple different angles are obtained through the directional microphone, and the compensation coefficient corresponding to the high frequency band is determined based on the multiple training waveforms, so that the signal compensation curve can be generated. In particular, if the collection effect of the directional microphone differs considerably between sound collection angles, a separate signal compensation curve can be configured for each sound collection angle according to the training waveforms obtained at that angle.
  • the current posture of the smart headset can be determined by reading the parameters fed back by the motion sensor, and the included angle between the user's sound direction and the sound pickup direction can be estimated based on the deviation angle of the posture from the normal use state; the corresponding signal compensation curve is then determined based on the included angle, and signal compensation is performed on the first voice signal through that signal compensation curve.
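
Choosing a curve from the estimated included angle can be sketched as a nearest-angle lookup. This selection rule is an assumption (the text only says the curve is determined based on the included angle), and the table of curves keyed by angle in degrees is hypothetical.

```python
from typing import Dict, TypeVar

CurveT = TypeVar("CurveT")


def select_curve(curves_by_angle: Dict[float, CurveT], included_angle_deg: float) -> CurveT:
    """Return the curve configured for the angle closest to the estimated one."""
    if not curves_by_angle:
        raise ValueError("no compensation curves configured")
    nearest = min(curves_by_angle, key=lambda angle: abs(angle - included_angle_deg))
    return curves_by_angle[nearest]
```

An alternative design would interpolate between the two neighbouring curves instead of snapping to the nearest one; the lookup above is simply the smallest version of the idea.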
  • the smart headset can establish a communication connection with the user terminal.
  • the reference voice signal can be synchronously acquired through the microphone module built into the user terminal, the amplitude ratio between the low frequency band, the middle frequency band, and the high frequency band can be obtained from the reference voice signal, and a signal compensation curve is generated based on the amplitude ratio.
  • when the smart headset receives the reference voice signal fed back by the user terminal, it can detect the average amplitude of the reference voice signal. If the average amplitude of the reference voice signal is less than the preset reference threshold, the reference voice signal is identified as invalid voice, and signal compensation is performed on the first voice signal through the built-in signal compensation curve. Because the user terminal may be placed in a pocket or schoolbag while the smart headset is in use, its signal collection effect is poor in that situation; even if the reference voice signal is collected through the user terminal, it cannot calibrate the first voice signal well in the high frequency band. In this case, signal compensation for the first voice signal can be performed through the preset signal compensation curve.
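
The reference-based curve and its fallback can be sketched as follows. Everything here is an illustrative assumption: band amplitudes are passed in as dictionaries keyed by "low", "mid", and "high", the curve is expressed as a per-band gain in dB that makes the first signal's band ratios (relative to the low band) match the reference's, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence

BANDS = ("low", "mid", "high")


def curve_from_reference(
    first_bands: Dict[str, float],
    reference_bands: Dict[str, float],
) -> Dict[str, float]:
    """Per-band gain in dB so the first signal's band ratios match the reference's."""
    gains = {}
    for band in BANDS:
        target_ratio = reference_bands[band] / reference_bands["low"]
        actual_ratio = first_bands[band] / first_bands["low"]
        gains[band] = 20.0 * math.log10(target_ratio / actual_ratio)
    return gains


def choose_curve(
    reference_samples: Sequence[float],
    reference_threshold: float,
    first_bands: Dict[str, float],
    reference_bands: Dict[str, float],
    builtin_curve: Dict[str, float],
) -> Dict[str, float]:
    """Use the reference-derived curve unless the reference is too quiet to trust."""
    if reference_samples:
        mean_amplitude = sum(abs(s) for s in reference_samples) / len(reference_samples)
    else:
        mean_amplitude = 0.0
    if mean_amplitude < reference_threshold:
        # Reference treated as invalid voice, e.g. the terminal is in a pocket or bag.
        return builtin_curve
    return curve_from_reference(first_bands, reference_bands)
```

If the reference has equal amplitude in all three bands while the first signal's mid and high bands are at one half and one quarter of its low band, the derived gains are about 0 dB, 6 dB, and 12 dB respectively.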
  • the calibration voice signal is output.
  • the smart headset can output the calibration voice signal after signal compensation is performed on the first voice signal and the calibration voice signal is obtained.
  • the specific output mode of the calibration voice signal may be determined according to the trigger mode of the collection operation. For example, if the user terminal is performing a video call operation and collects the user's voice signal during the call through the smart headset, then after the smart headset obtains the calibration voice signal, it can send the calibration voice signal to the user terminal through the near field communication module, so that the user terminal sends the calibration voice signal to the call peer to provide the voice signal during the video call. If the user uses the voice signal to perform voice control on the smart headset or the user terminal, then after acquiring the voice signal the smart headset can analyze the calibration voice signal, convert it into character data, determine the control instructions associated with the calibration voice signal based on the keywords contained in the character data, and control the smart headset or the user terminal based on those control instructions.
  • the smart headset can be used to respond to the recording request.
  • the voice signal can be stored in a local memory, or sent to the user terminal or uploaded to a cloud server through a communication module, such as a near field communication module or a radio frequency communication module, so that the calibration voice signal is saved by another terminal.
  • the smart headset may be connected to a speaker device. The smart headset sends the calibration voice signal to the speaker device through a wired link or a wireless communication link, and the calibration voice signal is output through the speaker device, which improves the playback effect of the voice signal, reduces the impact of noise, and makes the voice signal closer to the user's original voice.
  • the voice signal collection method collects the original voice signal, that is, the first voice signal, through the directional microphone, and performs signal compensation on the first voice signal based on the signal compensation curve to obtain the calibration voice signal. This improves the collection of high-frequency voice signals and reduces the impact of sound coloration. Outputting the adjusted calibrated voice signal improves the quality of voice collection while ensuring that environmental noise in the collected voice signal is reduced, thereby enhancing the signal output effect of the microphone.
  • FIG. 5 shows a specific implementation flowchart of step S201 of the voice signal collection method provided by the second embodiment of the present application.
  • In this embodiment, S201 of the voice signal collection method includes S2011 to S2012, which are detailed as follows:
  • the smart headset is configured with two microphone modules, which are an omnidirectional microphone and a directional microphone, respectively, and the above two microphone modules together form the microphone array of the smart headset.
  • When the smart headset collects voice signals, it can turn on the omnidirectional microphone and the directional microphone at the same time. While collecting the first voice signal through the directional microphone, it also collects the second voice signal through the omnidirectional microphone; that is, the initial recording time of the first voice signal is the same as that of the second voice signal.
  • Both signals capture the sound in the current environment over the same period; they differ only in the microphone module used to collect them.
  • the second voice signal is mainly used as a reference signal to adjust the first voice signal.
  • both the left earphone part and the right earphone part of the smart earphone can be configured with a directional microphone and an omnidirectional microphone.
  • Because the second voice signal picked up by the omnidirectional microphone contains far more environmental noise than the first voice signal picked up by the directional microphone, the signal-to-noise ratio of the second voice signal is much lower than that of the first voice signal. Therefore, when the voice signal is output, the first voice signal can be used as the base signal for output, thereby improving the signal-to-noise ratio of the output voice signal.
  • However, the directional microphone performs poorly on sound arriving from outside its pickup direction, and its frequency response curve is not smooth enough. This causes a loss of mid- and high-frequency content, so that the mid- and high-frequency energy of the first voice signal falls below that of the second voice signal output by the omnidirectional microphone, and its fidelity is poor. Therefore, the omnidirectional microphone can be used to collect the second voice signal at the same time to make up for the collection defect of the directional microphone in the middle and high frequency bands, thereby improving the collection quality.
  • The two earpieces jointly output audio to provide stereo sound. Therefore, during use, the user wears the left earpiece and the right earpiece at the same time.
  • The directional microphone can be installed on the earpiece on one side, and the omnidirectional microphone can be installed on the earpiece on the other side, so that both microphones can be fitted while keeping the smart headset small.
  • For TWS headsets, portability directly affects the user experience. Therefore, placing the directional microphone and the omnidirectional microphone on opposite sides of the headset improves the space utilization of the headset.
  • When the directional microphone is installed on the left earpiece, the omnidirectional microphone can be installed on the right earpiece; conversely, when the directional microphone is installed on the right earpiece, the omnidirectional microphone can be installed on the left earpiece.
  • The first distance between the directional microphone and the human voice source is smaller than the second distance between the omnidirectional microphone and the human voice source. The first voice signal collected by the directional microphone is used as the base signal for output, while the second voice signal collected by the omnidirectional microphone is used only as the reference signal. Therefore, to improve the human voice collection quality of the base signal, the directional microphone can be set close to the user's mouth, while the omnidirectional microphone can be set relatively far from the user's mouth.
  • FIG. 6 shows a schematic structural diagram of an existing smart headset provided by this embodiment.
  • the smart headset is specifically a true wireless headset.
  • The true wireless headset is equipped with two microphone modules, both of which are omnidirectional microphone modules, namely omnidirectional microphone 1 and omnidirectional microphone 2. Compared with wired earphones, TWS earphones free the user from wire constraints and improve the wearing experience.
  • In existing smart device technology, headsets often need to provide audio collection in addition to audio output.
  • When TWS headsets are in use, they hang on the user's ears to pick up sound. As a result, unlike with wired earphones, the user cannot point the microphone on the earphone directly at the mouth, and the TWS earphone is worn far from the user's mouth, so its pickup quality is far inferior to that of wired earphones, which pick up sound at close range. As shown in the figure, existing TWS headsets are often equipped with two omnidirectional microphones. Because an omnidirectional microphone senses changes in sound pressure in all directions, it receives sound from all directions with equal sensitivity.
  • Its sound pickup pattern is spherical, and the two-dimensional plan view of the pickup range of the omnidirectional microphone is a 360° circle, as shown in the figure. Therefore, in addition to acquiring the human voice signal that the user emits in a specific direction, the omnidirectional microphone also receives interference signals from all other directions. Because the interference signals and the human voice signal are picked up with equal sensitivity, the signal-to-noise ratio of the human voice signal is low and the sound pickup quality is poor.
  • The high-frequency component of the voice signal (the frequency band above 4 kHz) has strong directivity, that is, its direction is consistent with the direction the user's mouth faces.
  • the omnidirectional microphone needs to be hung on the ear when it is in use.
  • FIG. 7 shows a structural entity diagram of a smart headset provided by this embodiment.
  • the smart headset is specifically a true wireless headset, and two microphone modules are configured on the true wireless headset, namely, an omnidirectional microphone and a directional microphone.
  • Compared with the existing headset, the omnidirectional microphone at the bottom of the headset is replaced with a directional microphone.
  • the pickup range of a directional microphone is a cardioid area; and the pickup range of an omnidirectional microphone is a spherical area, as shown in Figure 7.
  • The lower end of the smart headset is closer to the user's mouth when the headset is in use.
  • The pickup direction of the directional microphone can be determined according to the relative position of the installation point of the directional microphone and the user's mouth when the earphone is in use.
  • For example, a direction vector may be generated with the installation position of the directional microphone as the starting point and the user's mouth as the end point, and the pickup direction of the directional microphone is then determined according to a preset compensation angle and the direction vector.
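  • The direction-vector step can be illustrated with a minimal two-dimensional sketch. The text does not say how the preset compensation angle is combined with the direction vector; the sketch assumes a plain rotation of the unit vector, and the function name `pickup_direction` and the coordinate convention are hypothetical.

```python
import math

def pickup_direction(mic_pos, mouth_pos, compensation_deg):
    """Unit vector from the microphone to the mouth, rotated by a compensation angle.

    Assumption: positions are 2-D (x, y) tuples and the compensation angle is
    applied as a counter-clockwise rotation; the source does not specify this.
    """
    dx = mouth_pos[0] - mic_pos[0]
    dy = mouth_pos[1] - mic_pos[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("microphone and mouth positions coincide")
    ux, uy = dx / norm, dy / norm          # direction vector, normalised
    a = math.radians(compensation_deg)     # preset compensation angle
    return (ux * math.cos(a) - uy * math.sin(a),
            ux * math.sin(a) + uy * math.cos(a))
```

  • With a compensation angle of 0 the result is simply the normalised microphone-to-mouth vector.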
  • The microphone port of the directional microphone faces downwards, which provides effective physical wind protection and reduces the impact of ambient airflow on sound pickup. Since the smart headset is upright in use, airflow strikes the sides of the smart headset in this state. If the microphone port were opened on any side of the cylinder, it would be affected by the airflow and pick up airflow-induced vibration, and this airflow noise would reduce the signal-to-noise ratio of the human voice signal. Therefore, the microphone port can be directed downward, thereby improving the signal-to-noise ratio of the human voice signal. It should be noted that the opening direction of the microphone port is different from the pickup direction of the directional microphone.
  • The microphone port, that is, the sound guide hole of the microphone, is the physical pickup position, whereas the pickup direction of the directional microphone describes the pickup effect: it is the direction in which the lobe of the cardioid pickup pattern points.
  • the signal compensation curve is generated according to the first voice signal and the second voice signal.
  • After the smart headset obtains the second voice signal collected by the omnidirectional microphone, it can use the second voice signal as a reference signal and generate a signal compensation curve based on the first voice signal and the second voice signal. This ensures that the signal compensation curve matches the first voice signal currently collected, realizes real-time generation and dynamic adjustment of the signal compensation curve, and improves its accuracy.
  • One way of generating the signal compensation curve is as follows: the smart headset fits a real-time frequency response function according to the frequency response amplitude relationship of each frequency band in the second voice signal. Then, according to the deviation between each frequency band in the first voice signal and this real-time frequency response function, it obtains the adjustment ratio corresponding to each frequency band and generates the signal compensation curve from the adjustment ratios of all frequency bands.
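  • As a rough sketch of the per-band adjustment ratio, the example below skips the curve-fitting step and uses the band gains of the second voice signal directly as the reference response. The function name `band_adjustment_ratios` and the guard value `eps` are assumptions made for illustration only.

```python
def band_adjustment_ratios(first_band_gain, second_band_gain, eps=1e-12):
    """Per-band ratio that would lift the first signal's gain to the reference.

    Assumption: both inputs are lists of linear (not dB) gains, one entry per
    frequency band, in the same band order.
    """
    if len(first_band_gain) != len(second_band_gain):
        raise ValueError("both signals must use the same frequency bands")
    # ratio > 1 means the band is attenuated in the first signal and needs a boost
    return [ref / max(gain, eps)
            for gain, ref in zip(first_band_gain, second_band_gain)]
```

  • For example, a band whose gain in the first voice signal is half of the reference gain receives an adjustment ratio of 2.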
  • Alternatively, the signal compensation curve can be generated as follows: the smart headset has a built-in reference compensation curve and a neural learning model. Each time the first voice signal and the second voice signal are collected, a first pooling matrix corresponding to the first voice signal and a second pooling matrix corresponding to the second voice signal are generated through pooling-based dimensionality reduction and convolution, and the frequency response characteristic values of the two pooling matrices are extracted. Adjustment parameters are generated from the difference between the two frequency response characteristic values and imported into the neural learning model, which performs a learned parameter adjustment of the reference compensation curve. The adjusted reference compensation curve is used as the signal compensation curve, thereby realizing dynamic learning and adjustment and improving the accuracy of the signal compensation curve.
  • The second voice signal corresponding to the first voice signal is collected simultaneously by the omnidirectional microphone and used as the reference signal to generate the signal compensation curve, so that the signal compensation curve can be generated in real time, the accuracy of the signal compensation curve is improved, and the pickup effect of the directional microphone is optimized.
  • FIG. 8 shows a specific implementation flowchart of step S2012 of the voice signal collection method provided by the third embodiment of the present application.
  • In this embodiment, S2012 of the voice signal collection method includes S801 to S804, which are detailed as follows:
  • Generating the signal compensation curve according to the first voice signal and the second voice signal includes:
  • The first voice signal is converted into a first frequency domain signal, and the second voice signal is converted into a second frequency domain signal.
  • The first voice signal and the second voice signal are time-domain signals. Therefore, they can be converted from time-domain signals into frequency-domain signals through the Fourier transform, yielding the first frequency domain signal corresponding to the first voice signal and the second frequency domain signal corresponding to the second voice signal.
  • Each frequency domain signal expresses the correspondence between frequency and gain.
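  • The time-domain to frequency-domain conversion can be sketched with a direct discrete Fourier transform. This is an illustration only: a real device would likely use an FFT, and the function name `gain_spectrum` and the amplitude normalisation by the frame length are assumptions.

```python
import cmath
import math

def gain_spectrum(samples, sample_rate):
    """Return (frequency_hz, gain) pairs for DFT bins 0..N/2 of a real signal.

    Direct O(N^2) DFT, adequate for short illustrative frames.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        # bin k corresponds to frequency k * sample_rate / n
        spectrum.append((k * sample_rate / n, abs(acc) / n))
    return spectrum
```

  • Applying the same function to the first and the second voice signal gives two frequency-to-gain tables on identical frequency bins, which makes them directly comparable.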
  • the human voice speech frequency band is extracted according to the first frequency domain signal.
  • Because the directional microphone collects human voice signals well and suppresses environmental noise from multiple directions, the signal-to-noise ratio of the human voice in its output is high. The first frequency domain signal corresponding to the first voice signal is therefore used to identify the human voice spectrum.
  • The smart headset can determine the signal segment of the human voice range in the first frequency domain signal, recognize the human voice frequencies according to the gain corresponding to each frequency value in the signal segment, and combine all the recognized human voice frequencies into the human voice frequency band.
  • the smart headset may be set with an effective gain threshold.
  • Within the intercepted signal segment, the smart headset determines whether the gain of each frequency value is greater than the effective gain threshold. If it is, the frequency value is recognized as a human voice frequency. Conversely, if the gain of a frequency value in the signal segment is less than or equal to the effective gain threshold, the human voice signal collected this time is considered to carry no effective information at that frequency value, and the frequency value is recognized as a noise frequency.
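  • The threshold test can be sketched as a simple filter over frequency and gain pairs. The function name `voice_frequencies`, the voice range passed in by the caller, and the threshold value used in the usage note are all hypothetical; the text does not give concrete numbers.

```python
def voice_frequencies(spectrum, voice_range_hz, gain_threshold):
    """Select frequencies in the voice range whose gain exceeds the threshold.

    spectrum: list of (frequency_hz, gain) pairs from the first (directional) signal.
    voice_range_hz: (low, high) bounds of the human voice range (assumed by caller).
    """
    low, high = voice_range_hz
    # gain <= threshold is treated as a noise frequency, as described above
    return [f for f, g in spectrum
            if low <= f <= high and g > gain_threshold]
```

  • For instance, with an assumed voice range of 80 Hz to 8 kHz and a threshold of 0.1, a bin at 1 kHz with gain 0.05 is rejected as noise while a bin at 3 kHz with gain 0.3 is kept.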
  • The smart headset uses the first frequency domain signal as the reference to determine the human voice frequency band contained in the voice signal collected this time, so that gain compensation can subsequently be performed for the directional microphone.
  • The omnidirectional microphone has the same collection sensitivity in all directions, so it picks up noise in every frequency band, which makes it difficult to identify the human voice frequency band from the second voice signal. The human voice frequency band can therefore be extracted from the first voice signal, which makes up for the low accuracy of human voice recognition in the second voice signal.
  • After the smart headset determines the human voice frequency band from the first frequency domain signal, it can mark each frequency value of the human voice frequency band in the second frequency domain signal, thereby identifying the human voice frequencies contained in the second frequency domain signal. All frequencies other than the human voice frequencies are recognized as noise frequencies and filtered out, generating a filtered signal from which the noise has been removed.
  • The filtered signal contains only the human voice frequencies and records the gain value of each human voice frequency.
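  • A minimal sketch of this filtering step follows. It assumes both frequency domain signals share the same frequency bins, so that frequencies marked in the first signal can be looked up exactly in the second; the function name `filter_to_voice_band` is hypothetical.

```python
def filter_to_voice_band(second_spectrum, voice_freqs):
    """Zero every bin of the second spectrum that is not a marked voice frequency.

    second_spectrum: list of (frequency_hz, gain) pairs from the omnidirectional signal.
    voice_freqs: frequencies identified as human voice in the directional signal.
    """
    keep = set(voice_freqs)
    # noise frequencies are removed by setting their gain to zero
    return [(f, g if f in keep else 0.0) for f, g in second_spectrum]
```

  • The returned list keeps the original bin layout, which simplifies the later comparison against the first frequency domain signal.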
  • the signal compensation curve is generated through the filtered signal.
  • The filtered signal generated from the second frequency domain signal can be used to determine the relative proportions between the gains at different frequency values. Based on these relative proportions, the gain compensation coefficients for the middle and high frequency bands can be determined and the signal compensation curve generated, which corrects the inherent acquisition distortion of the directional microphone.
  • Signal compensation is performed on the first voice signal through the signal compensation curve, which increases the high-frequency gain so as to compensate for the attenuation of high-frequency sound energy caused by the directional microphone being positioned off the axis of the user's speaking direction. For example, if the high-frequency radiation direction of the voice signal is straight ahead of the user's mouth, high-frequency content is lost during pickup, so the signal compensation curve applies a gain compensation of 8 to 10 dB in the 2-7 kHz band, with the peak at 5 kHz.
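  • The figures in this example (8 to 10 dB across 2-7 kHz, peaking at 5 kHz) can be turned into a toy compensation curve. The piecewise-linear shape, the zero gain outside the band, and the reading of dB as an amplitude ratio are assumptions; the text only gives the band, the range and the peak position.

```python
def compensation_gain_db(freq_hz, edge_db=8.0, peak_db=10.0,
                         low_hz=2000.0, peak_hz=5000.0, high_hz=7000.0):
    """Piecewise-linear boost: edge_db at the band edges, peak_db at peak_hz."""
    if freq_hz < low_hz or freq_hz > high_hz:
        return 0.0                          # no compensation outside the band
    if freq_hz <= peak_hz:
        return edge_db + (peak_db - edge_db) * (freq_hz - low_hz) / (peak_hz - low_hz)
    return peak_db - (peak_db - edge_db) * (freq_hz - peak_hz) / (high_hz - peak_hz)

def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor (20*log10 convention)."""
    return 10.0 ** (gain_db / 20.0)
```

  • Multiplying each frequency bin of the first voice signal by `db_to_linear(compensation_gain_db(f))` would apply this illustrative curve.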
  • the gain compensation through the signal compensation curve can also reduce the high frequency loss caused by the masking of the face to a certain extent, and further improve the signal acquisition quality.
  • The signal compensation curve is generated with the respective collection defects of the directional microphone and the omnidirectional microphone taken into account, which improves the accuracy of the signal compensation curve.
  • FIG. 9 shows a specific implementation flowchart of step S202 of the voice signal collection method provided by the fourth embodiment of the present application.
  • In this embodiment, S202 of the voice signal collection method includes S2021 to S2023, which are detailed as follows:
  • Performing signal compensation on the first voice signal according to the signal compensation curve to generate the calibration voice signal includes:
  • the smart headset can convert the first voice signal into a frequency domain signal through Fourier transform, so that the gain value corresponding to each frequency value in the first voice signal can be determined, and the original frequency response curve can be obtained.
  • the original frequency response curve is used to express the correlation between the frequency value and the gain.
  • With the signal compensation curve as a reference, each frequency value in the original frequency response curve is fitted to obtain a frequency response calibration curve.
  • The signal compensation curve can compensate for the gain loss of the directional microphone in the high frequency band during acquisition. Therefore, after the original frequency response curve of the first voice signal is generated, each frequency in the original frequency response curve can be adjusted according to the trend of the gain values across the frequency bands of the signal compensation curve, generating a frequency response calibration curve.
  • The smart headset may determine the gain ratio between frequency values according to the signal compensation curve and fit the original frequency response curve based on that gain ratio to obtain the frequency response calibration curve. For example, if the gain ratio between the low frequency band and the mid frequency band is 80% in the signal compensation curve but 120% in the original frequency response curve, the gain of the mid frequency band is deficient. The gain of each frequency value in the mid frequency band can then be increased until the ratio is 80%, consistent with the gain ratio in the signal compensation curve.
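  • The 80% versus 120% example can be worked through numerically. The sketch below assumes a single representative low-band gain and a uniform scale factor for the whole mid band; the function name `rescale_mid_band` is hypothetical.

```python
def rescale_mid_band(low_gain, mid_gains, target_low_to_mid):
    """Scale the mid-band gains so that low_gain / mean(mid) equals the target ratio.

    Returns (scaled_mid_gains, scale_factor).
    """
    if not mid_gains:
        raise ValueError("mid_gains must not be empty")
    mean_mid = sum(mid_gains) / len(mid_gains)
    current_ratio = low_gain / mean_mid
    scale = current_ratio / target_low_to_mid   # 1.2 / 0.8 = 1.5 in the example
    return [g * scale for g in mid_gains], scale
```

  • With a low-band gain of 1.2 and mid-band gains of 1.0, the current ratio is 120%; raising the mid band by a factor of 1.5 brings the ratio to the 80% target.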
  • The calibration voice signal is generated based on the frequency response calibration curve.
  • an inverse Fourier transform is performed on the frequency response calibration curve, and the frequency domain signal is reconverted into a time domain signal to obtain the above-mentioned calibration voice signal.
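  • The round trip from the time domain to the frequency domain and back can be sketched as follows. The example uses a direct DFT and inverse DFT for clarity rather than an FFT, and the function names and the per-bin gain list are assumptions made for illustration.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT, returning the real part of each time-domain sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def apply_gain_curve(samples, bin_gains):
    """Scale each frequency bin by its gain and convert back to the time domain.

    bin_gains holds one linear gain per bin 0..N/2; the same gain is applied to
    the mirrored negative-frequency bin so that the output stays real.
    """
    n = len(samples)
    if len(bin_gains) != n // 2 + 1:
        raise ValueError("expected one gain per bin from 0 to N/2")
    spectrum = dft(samples)
    for k in range(n):
        mirrored = k if k <= n // 2 else n - k
        spectrum[k] *= bin_gains[mirrored]
    return idft(spectrum)
```

  • A gain list of all ones returns the input unchanged, which is a convenient sanity check before applying a real compensation curve.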
  • The first voice signal is converted into the original frequency response curve, and the original frequency response curve is curve-fitted based on the signal compensation curve, thereby compensating for the frequency response loss in the middle and high frequency bands and improving the signal acquisition performance of the directional microphone.
  • FIG. 10 shows a specific implementation flowchart of a voice signal collection method provided by the fifth embodiment of the present application.
  • In the voice signal collection method provided by this embodiment, before the first voice signal is acquired through the directional microphone of the electronic device, the method further includes S1001 to S1003, which are detailed as follows:
  • the method further includes:
  • Before the smart headset performs the voice signal collection operation, it can detect the ambient sound intensity in the current scene to determine the current recording environment. If the current ambient sound intensity is low, the current environment is relatively quiet and the environmental noise is low, so the operation of S1002 can be performed. Conversely, if the current ambient sound intensity is high, the current environment is relatively noisy and the environmental noise has a large impact on the recording, so the operation of S1003 can be performed.
  • the smart headset is equipped with a directional microphone and an omnidirectional microphone, and two different microphones can correspond to recording operations in different scenes.
  • If the ambient sound intensity is less than a preset sound intensity threshold, a third voice signal is acquired through an omnidirectional microphone, and the third voice signal is output.
  • The third voice signal can be obtained through the omnidirectional microphone built into the smart headset. Because the omnidirectional microphone has the same pickup sensitivity in all directions, it picks up noise signals from all directions. When the ambient sound is weak, however, the signal-to-noise ratio of the human voice signal is only slightly affected even though noise from all directions is collected. An omnidirectional microphone can therefore be used to collect the human voice signal, that is, the third voice signal, and the third voice signal is output directly.
  • If the ambient sound intensity is greater than or equal to the preset sound intensity threshold, the first voice signal is acquired through the directional microphone of the electronic device.
  • When the ambient sound intensity is high, a directional microphone needs to be used to obtain the human voice signal in order to reduce environmental noise.
  • In this case, the operation of acquiring the first voice signal through the directional microphone of the electronic device is performed, and the subsequent signal compensation is applied to the first voice signal to reduce the gain loss of the directional microphone in the middle and high frequency bands.
  • The smart headset is equipped with an omnidirectional microphone and a directional microphone, and the appropriate microphone is used to collect voice signals in each scenario. This improves signal collection performance while avoiding unnecessary compensation operations, thereby improving the collection efficiency of voice signals and reducing the amount of computation on the smart headset.
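  • The microphone selection in this embodiment reduces to a threshold comparison. The sketch below measures ambient intensity as an RMS level in dB relative to full scale, which is an assumption; the text does not define how the intensity is measured, and the function names and threshold values are hypothetical.

```python
import math

def sound_level_db(samples, eps=1e-12):
    """RMS level of a frame in dB relative to full scale (assumed intensity measure)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, eps))

def choose_microphone(ambient_samples, threshold_db):
    """Pick the omnidirectional mic in quiet scenes, the directional mic otherwise."""
    if sound_level_db(ambient_samples) < threshold_db:
        return "omnidirectional"   # quiet: output the third voice signal directly
    return "directional"           # noisy: acquire the first voice signal and compensate
```

  • Only the directional branch needs the signal compensation step, which is where the saving in computation comes from.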
  • FIG. 11 shows a specific implementation flowchart of step S201 of the voice signal collection method provided by the sixth embodiment of the present application.
  • In this embodiment, S201 of the voice signal collection method includes S1101 to S1103, which are detailed as follows:
  • Acquiring the first voice signal through the directional microphone of the electronic device includes:
  • the directional microphone is controlled by the transmission component of the electronic device to collect training voice signals at multiple preset angles.
  • A transmission component is provided at the installation position of the directional microphone of the smart headset. It can rotate the microphone within a preset plane in preset steps, so that voice signals can be collected at multiple preset angles. Since the directional microphone has a definite pickup direction, its pickup direction changes as it rotates with the transmission component. When collecting a multi-party conversation, the pickup direction of the directional microphone can be adjusted, which changes the collection effect of the voice signal. Based on this, before the smart headset collects voice signals, the directional microphone can be rotated to multiple preset angles through the transmission component so as to adjust its pickup direction, and a training voice signal is collected at each preset angle. It should be noted that the smart headset collects a training voice signal of a preset duration at each preset angle. For example, the collection duration of a training voice signal may be 0.5 s.
  • the target angle is determined according to the average amplitude of the human voice frequency band in each of the training speech signals.
  • The smart headset converts each training voice signal into a frequency domain signal, extracts the human voice frequency band from each training frequency domain signal according to the human vocal frequency range, and calculates the average amplitude of each human voice frequency band.
  • The preset angle with the largest average amplitude is selected as the target angle. At this angle, the average amplitude of the human voice frequency band is largest, that is, the human voice signal is collected best and has a higher signal-to-noise ratio. This preset angle can therefore be used as the target angle, and the user's voice signal is recorded at the target angle.
  • the transmission component is controlled to adjust the directional microphone to the target angle, and the first voice signal is acquired at the target angle.
  • the directional microphone can be adjusted to the target angle through the transmission component, and the first voice signal is collected at the target angle.
  • If the position of the user changes, re-determining the target angle through the transmission component can further improve the collection of the human voice signal.
  • The pickup direction of the directional microphone is adjusted through the transmission component, thereby determining the best pickup angle and improving the pickup effect.
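  • Selecting the target angle amounts to an argmax over the average voice-band amplitude per preset angle. The sketch below assumes the voice-band amplitudes have already been extracted for each training voice signal; the function name and the dictionary layout are hypothetical.

```python
def select_target_angle(voice_band_amplitudes_by_angle):
    """Return the preset angle whose voice-band amplitudes have the largest mean.

    voice_band_amplitudes_by_angle: dict mapping a preset angle (degrees) to the
    list of amplitudes found in the human voice band of that training signal.
    """
    best_angle = None
    best_average = float("-inf")
    for angle, amplitudes in voice_band_amplitudes_by_angle.items():
        if not amplitudes:
            continue                       # no voice-band content at this angle
        average = sum(amplitudes) / len(amplitudes)
        if average > best_average:
            best_angle, best_average = angle, average
    if best_angle is None:
        raise ValueError("no training signal contains voice-band amplitudes")
    return best_angle
```

  • The transmission component would then be driven to the returned angle before the first voice signal is recorded.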
  • FIG. 12 shows a structural block diagram of a voice signal collection device provided in an embodiment of the present application. For ease of description, only the parts related to the embodiment of the present application are shown.
  • the voice signal collection device includes:
  • the first voice signal acquiring unit 121 is configured to acquire the first voice signal through the directional microphone of the electronic device;
  • the voice signal compensation unit 122 is configured to perform signal compensation on the first voice signal according to the signal compensation curve to generate a calibration voice signal
  • the calibration voice signal output unit 123 is used to output the calibration voice signal.
  • the first voice signal acquiring unit 121 includes:
  • a second voice signal acquiring unit configured to collect a second voice signal corresponding to the first voice signal through an omnidirectional microphone of the electronic device when acquiring the first voice signal;
  • the signal compensation curve generating unit is configured to generate the signal compensation curve according to the first voice signal and the second voice signal.
  • the first distance between the directional microphone and the human sound source is smaller than the second distance between the omnidirectional microphone and the human sound source .
  • the signal compensation curve generating unit includes:
  • a frequency domain signal conversion unit configured to convert the first voice signal into a first frequency domain signal, and convert the second voice signal into a second frequency domain signal
  • the human voice frequency band extraction unit is configured to extract the human voice frequency band according to the first frequency domain signal
  • a filtered signal generating unit configured to filter noise frequency bands in the second frequency domain signal except for the human voice speech frequency band to obtain a filtered signal
  • the filtered signal conversion unit is configured to generate the signal compensation curve through the filtered signal.
  • the voice signal compensation unit 122 further includes:
  • An original frequency response curve generating unit configured to generate the original frequency response curve of the first voice signal
  • a frequency response calibration curve output unit configured to fit each frequency value in the original frequency response curve with the signal compensation curve as a reference to obtain a frequency response calibration curve
  • the calibration voice signal generating unit is configured to generate the calibration voice signal based on the frequency response calibration curve.
  • the voice signal collection device further includes:
  • the ambient sound intensity acquisition unit is used to acquire the current ambient sound intensity
  • the third voice signal acquisition unit is configured to obtain a third voice signal through an omnidirectional microphone and output the third voice signal if the environmental sound intensity is less than a preset sound intensity threshold;
  • the first voice signal collection trigger unit is configured to perform the acquisition of the first voice signal through the directional microphone of the electronic device if the environmental sound intensity is greater than or equal to the preset sound intensity threshold.
  • the first voice signal acquiring unit 121 includes:
  • a training voice signal collection unit configured to control the directional microphone to collect training voice signals at multiple preset angles through the transmission component of the electronic device
  • the target angle recognition unit is configured to determine the target angle according to the average amplitude of the human voice frequency band in each of the training speech signals;
  • the target angle adjustment unit is configured to control the transmission component to adjust the directional microphone to the target angle, and obtain the first voice signal at the target angle.
  • the voice signal collection device provided in the embodiment of the present application can also collect the original voice signal, that is, the first voice signal, through the directional microphone, and perform signal compensation on the first voice signal based on the signal compensation curve to obtain the calibrated voice signal.
  • FIG. 13 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • the electronic device 13 of this embodiment includes: at least one processor 130 (only one is shown in FIG. 13), a memory 131, and a computer program 132 stored in the memory 131 and executable on the at least one processor 130; when the processor 130 executes the computer program 132, the steps in any of the foregoing embodiments of the voice signal collection method are implemented.
  • the electronic device 13 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud electronic device.
  • the electronic device may include, but is not limited to, a processor 130 and a memory 131.
  • FIG. 13 is only an example of the electronic device 13 and does not constitute a limitation on the electronic device 13. The device may include more or fewer components than those shown in the figure, combine certain components, or use different components; for example, it may also include input and output devices, network access devices, and so on.
  • the processor 130 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 131 may be an internal storage unit of the electronic device 13 in some embodiments, such as a hard disk or a memory of the electronic device 13. In other embodiments, the memory 131 may also be an external storage device of the electronic device 13, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 13. Further, the memory 131 may also include both an internal storage unit of the electronic device 13 and an external storage device.
  • the memory 131 is used to store an operating system, an application program, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 131 can also be used to temporarily store data that has been output or will be output.
  • An embodiment of the present application also provides a network device, which includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, wherein the processor implements the steps in any of the foregoing method embodiments when executing the computer program.
  • the embodiments of the present application also provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in each of the foregoing method embodiments can be realized.
  • the embodiments of the present application provide a computer program product; when the computer program product runs on a mobile terminal, the mobile terminal implements the steps in the foregoing method embodiments.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the computer program can be stored in a computer-readable storage medium.
  • the steps of the foregoing method embodiments can be implemented.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • the computer-readable medium may at least include: any entity or device capable of carrying the computer program code to the photographing device/terminal device, recording medium, computer memory, read-only memory (ROM, Read-Only Memory), and random access memory (RAM, Random Access Memory), electric carrier signal, telecommunications signal and software distribution medium.
  • for example, a USB flash drive (U disk), a mobile hard disk, a floppy disk, or a CD-ROM.
  • computer-readable media cannot be electrical carrier signals and telecommunication signals.
  • the disclosed apparatus/network equipment and method may be implemented in other ways.
  • the device/network device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

Abstract

A voice signal acquisition method and apparatus, an electronic device, and a storage medium. The method comprises: obtaining a first voice signal by means of a directional microphone of an electronic device (S201); performing signal compensation on the first voice signal according to a signal compensation curve to generate a calibrated voice signal (S202); and outputting the calibrated voice signal (S203). According to the solution, an original voice signal, i.e., a first voice signal, is acquired by a directional microphone, and signal compensation is then performed on the first voice signal on the basis of a signal compensation curve to obtain a calibrated voice signal, which improves acquisition of the voice signal in the high-frequency band and reduces the influence of sound coloration. Outputting the calibrated voice signal improves the quality of voice acquisition while ensuring that the ambient noise in the acquired voice signal is reduced, thereby improving the signal output of the microphone.

Description

Voice signal collection method, apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 201911275035.5, filed with the State Intellectual Property Office on December 12, 2019 and entitled "Voice signal collection method, apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application belongs to the technical field of signal processing, and in particular relates to a voice signal collection method, apparatus, electronic device, and storage medium.
Background
With the development of portable consumer electronic terminals, miniature condenser microphones have come into wide use. In the prior art, a directional microphone can be used to pick up sound from a particular direction and thereby improve the quality of the collected voice signal. However, although a directional microphone collects voice signals well in a specific direction, it collects the high-frequency band poorly and suffers from defects such as sound coloration and an insufficiently smooth frequency response curve. As a result, existing voice signal collection techniques still cannot obtain a high-quality voice signal when a directional microphone is used, which degrades the signal output of the microphone.
Summary of the invention
The embodiments of the present application provide a voice signal collection method, apparatus, wearable device, and storage medium, which can improve the quality of voice collection and enhance the signal output of the microphone.
In a first aspect, an embodiment of the present application provides a voice signal collection method, including:
acquiring a first voice signal through a directional microphone of an electronic device;
performing signal compensation on the first voice signal according to a signal compensation curve to generate a calibrated voice signal;
outputting the calibrated voice signal.
Exemplarily, the sound pickup direction of the directional microphone is determined according to the relative position between the smart headset, when in use, and the user's mouth.
In a possible implementation of the first aspect, the acquiring a first voice signal through a directional microphone of an electronic device includes:
when acquiring the first voice signal, collecting, through an omnidirectional microphone of the electronic device, a second voice signal corresponding to the first voice signal;
generating the signal compensation curve according to the first voice signal and the second voice signal.
In a possible implementation of the first aspect, when the electronic device collects voice signals, the first distance between the directional microphone and the human voice source is smaller than the second distance between the omnidirectional microphone and the human voice source.
In a possible implementation of the first aspect, the generating the signal compensation curve according to the first voice signal and the second voice signal includes:
converting the first voice signal into a first frequency domain signal, and converting the second voice signal into a second frequency domain signal;
extracting a human voice frequency band according to the first frequency domain signal;
filtering out, from the second frequency domain signal, noise frequency bands other than the human voice frequency band to obtain a filtered signal;
generating the signal compensation curve from the filtered signal.
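By way of illustration and not limitation, the four steps above can be sketched in Python as follows. The application does not specify how the human voice frequency band is extracted or what form the curve takes, so this sketch makes assumptions: the band is taken to be the DFT bins of the first (directional) signal whose magnitude is at least a fraction `rel_threshold` of the peak, and the compensation curve is taken to be the magnitude spectrum of the second (omnidirectional) signal with all other bins set to zero. The function names and the threshold are hypothetical.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of the one-sided DFT (bins 0..N/2) of a real-valued signal."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags


def voice_band_bins(first_mags, rel_threshold=0.5):
    """Assumed rule: keep bins at or above rel_threshold of the peak magnitude."""
    peak = max(first_mags)
    return {k for k, m in enumerate(first_mags) if m >= rel_threshold * peak}


def compensation_curve(first_signal, second_signal, rel_threshold=0.5):
    """Filter the second spectrum to the band found in the first spectrum."""
    first_mags = dft_magnitudes(first_signal)     # first frequency domain signal
    second_mags = dft_magnitudes(second_signal)   # second frequency domain signal
    band = voice_band_bins(first_mags, rel_threshold)
    # Bins outside the assumed voice band are treated as noise and zeroed.
    return [m if k in band else 0.0 for k, m in enumerate(second_mags)]


# Example: the directional signal holds one tone (bin 4); the omnidirectional
# signal holds the same tone plus an out-of-band component (bin 10).
n = 32
first = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
second = [math.sin(2 * math.pi * 4 * i / n)
          + 0.8 * math.sin(2 * math.pi * 10 * i / n) for i in range(n)]
curve = compensation_curve(first, second)
```

In this sketch the directional signal only decides which bins count as voice, while the omnidirectional signal supplies the levels, which mirrors the division of roles described in the steps above. A practical implementation would likely use an FFT library and frame-wise averaging instead of a direct DFT.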
In a possible implementation of the first aspect, the performing signal compensation on the first voice signal according to a signal compensation curve to generate a calibrated voice signal includes:
generating an original frequency response curve of the first voice signal;
fitting each frequency value in the original frequency response curve with the signal compensation curve as a reference to obtain a frequency response calibration curve;
generating the calibrated voice signal based on the frequency response calibration curve.
Exemplarily, the gain ratios between the frequency values are determined according to the signal compensation curve, and the original frequency response curve is fitted based on these gain ratios to obtain the frequency response calibration curve.
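One possible reading of this gain-ratio fitting is sketched below in Python. The application does not define the fitting procedure, so the approach here is an assumption: the strongest bin of the compensation curve is used as a reference, each bin's target level is the original level at the reference bin scaled by the compensation curve's ratio for that bin, and the resulting per-bin gain is capped at a hypothetical `max_gain` to avoid amplifying noise. All names are illustrative.

```python
def calibration_gains(original, compensation, max_gain=4.0, eps=1e-12):
    """Per-bin gains that make `original` follow the ratios of `compensation`."""
    # Reference bin: the strongest bin of the compensation curve (assumption).
    ref = max(range(len(compensation)), key=compensation.__getitem__)
    if compensation[ref] <= eps or original[ref] <= eps:
        return [1.0] * len(original)  # nothing usable to calibrate against
    gains = []
    for o, c in zip(original, compensation):
        # Target level keeps the original level at the reference bin and
        # reproduces the compensation curve's ratio everywhere else.
        target = original[ref] * c / compensation[ref]
        gain = target / o if o > eps else 1.0
        gains.append(min(gain, max_gain))
    return gains


def apply_gains(original, gains):
    """Frequency response calibration curve: original response times gains."""
    return [o * g for o, g in zip(original, gains)]


# Example: the original response rolls off toward the high bins, while the
# compensation curve stays flatter there.
original = [1.0, 0.5, 0.1]
compensation = [2.0, 2.0, 1.0]
gains = calibration_gains(original, compensation)
calibrated = apply_gains(original, gains)
```

With these inputs the second bin is lifted to the level of the first, and the third bin is lifted only up to the gain cap. Note that bins where the compensation curve is zero receive a gain of zero in this sketch, so out-of-band content is suppressed.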
In a possible implementation of the first aspect, before the acquiring a first voice signal through a directional microphone of an electronic device, the method further includes:
acquiring the current environmental sound intensity;
if the environmental sound intensity is less than a preset sound intensity threshold, acquiring a third voice signal through an omnidirectional microphone and outputting the third voice signal;
if the environmental sound intensity is greater than or equal to the preset sound intensity threshold, performing the acquiring a first voice signal through a directional microphone of an electronic device.
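The threshold decision above can be illustrated with a short Python sketch. The application does not state how the environmental sound intensity is measured or what the threshold value is; here, as an assumption, the intensity is estimated as the RMS level of a block of ambient samples in dB relative to full scale, and the default threshold of -30 dBFS is a hypothetical placeholder.

```python
import math


def level_dbfs(samples):
    """RMS level of a block of samples in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def select_microphone(ambient_samples, threshold_dbfs=-30.0):
    """Quiet environment: omnidirectional; otherwise: directional."""
    if level_dbfs(ambient_samples) < threshold_dbfs:
        return "omnidirectional"   # third voice signal is output directly
    return "directional"           # first voice signal, then compensation


quiet = [0.001] * 100   # about -60 dBFS
noisy = [0.5] * 100     # about -6 dBFS
choice_quiet = select_microphone(quiet)
choice_noisy = select_microphone(noisy)
```

A level exactly equal to the threshold selects the directional microphone, matching the "greater than or equal to" condition in the text.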
In a possible implementation of the first aspect, the acquiring a first voice signal through a directional microphone of an electronic device includes:
controlling, through a transmission component of the electronic device, the directional microphone to collect training voice signals at multiple preset angles;
determining a target angle according to the average amplitude of the human voice frequency band in each of the training voice signals;
controlling the transmission component to adjust the directional microphone to the target angle, and acquiring the first voice signal at the target angle.
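As a non-limiting sketch, the target angle selection can be expressed in Python as choosing the preset angle whose training voice signal has the largest average amplitude in the human voice frequency band. The band limits of 300 Hz to 3400 Hz (the conventional telephony voice band) are an assumption, as are the function names; control of the transmission component itself is hardware-specific and is not modeled.

```python
import cmath
import math


def band_average_amplitude(samples, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Average DFT magnitude over the bins that fall inside [low_hz, high_hz]."""
    n = len(samples)
    total, count = 0.0, 0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if low_hz <= freq <= high_hz:
            acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                      for i, x in enumerate(samples))
            total += abs(acc) / n
            count += 1
    return total / count if count else 0.0


def pick_target_angle(training_signals, sample_rate):
    """training_signals maps each preset angle to its training voice signal."""
    return max(training_signals,
               key=lambda angle: band_average_amplitude(
                   training_signals[angle], sample_rate))


# Example: a 1 kHz tone picked up with different strength at three angles.
sample_rate, n = 8000, 64
def tone(amplitude):
    return [amplitude * math.sin(2 * math.pi * 1000 * i / sample_rate)
            for i in range(n)]

training = {0: tone(0.2), 30: tone(0.9), 60: tone(0.5)}
target_angle = pick_target_angle(training, sample_rate)
```

Here the angle of 30 degrees is selected because its training signal carries the most energy in the assumed voice band. The device would then drive the transmission component to that angle before acquiring the first voice signal.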
In a second aspect, an embodiment of the present application provides a voice signal collection apparatus, including:
a first voice signal acquiring unit, configured to acquire a first voice signal through a directional microphone of an electronic device;
a voice signal compensation unit, configured to perform signal compensation on the first voice signal according to a signal compensation curve to generate a calibrated voice signal;
a calibrated voice signal output unit, configured to output the calibrated voice signal.
In a third aspect, an embodiment of the present application provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the voice signal collection method according to any one of the implementations of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the voice signal collection method according to any one of the implementations of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the voice signal collection method according to any one of the implementations of the first aspect.
It can be understood that, for the beneficial effects of the second aspect to the fifth aspect, reference may be made to the relevant description of the first aspect, which is not repeated here.
Compared with the prior art, the embodiments of this application have the following beneficial effects:
In the embodiments of the present application, after the original voice signal, that is, the first voice signal, is collected by the directional microphone, signal compensation is performed on the first voice signal based on the signal compensation curve to obtain a calibrated voice signal, which improves collection of the high-frequency band and reduces the effect of sound coloration. Outputting the calibrated voice signal improves the quality of voice collection while still ensuring that environmental noise in the collected voice signal is reduced, thereby enhancing the signal output of the microphone.
Description of the drawings
FIG. 1 is a block diagram of part of the structure of a smart headset provided by an embodiment of the present application;
FIG. 2 is an implementation flowchart of a voice signal collection method provided by the first embodiment of the present application;
FIG. 3 shows a sound pickup area of a directional microphone provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of sound pickup by the smart headset in use, provided by the first embodiment of the present application;
FIG. 5 is a specific implementation flowchart of S201 of a voice signal collection method provided by the second embodiment of the present application;
FIG. 6 is a schematic structural diagram of an existing smart headset provided by an embodiment of the present application;
FIG. 7 is a physical structure diagram of a smart headset provided by an embodiment of the present application;
FIG. 8 is a specific implementation flowchart of S2012 of a voice signal collection method provided by the third embodiment of the present application;
FIG. 9 is a specific implementation flowchart of S202 of a voice signal collection method provided by the fourth embodiment of the present application;
FIG. 10 is a specific implementation flowchart of a voice signal collection method provided by the fifth embodiment of the present application;
FIG. 11 is a specific implementation flowchart of S201 of a voice signal collection method provided by the sixth embodiment of the present application;
FIG. 12 is a structural block diagram of a voice signal collection device provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of an electronic device provided by another embodiment of the present application.
Detailed description
In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, so that the embodiments of the present application can be thoroughly understood. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of this application.
It should be understood that, when used in the specification and appended claims of this application, the term "comprising" indicates the presence of the described features, wholes, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the term "and/or" used in the specification and appended claims of this application refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in the specification and appended claims of this application, the term "if" can be construed, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" can be construed, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
In addition, in the description of the specification and appended claims of this application, the terms "first", "second", "third", and so on are used only to distinguish between descriptions, and cannot be understood as indicating or implying relative importance.
References in this specification to "one embodiment" or "some embodiments" mean that one or more embodiments of this application include a specific feature, structure, or characteristic described in connection with that embodiment. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "include", "comprise", "have", and their variants all mean "including but not limited to", unless specifically emphasized otherwise.
The voice signal collection method provided in the embodiments of the present application can be applied to an electronic device. The electronic device can be any device with a voice collection function, including but not limited to a mobile phone, tablet computer, wearable device, in-vehicle device, augmented reality (AR)/virtual reality (VR) device, notebook computer, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA) with a voice collection function. The method can also be applied to databases, servers, and systems that can be controlled by voice. The embodiments of the present application do not impose any restriction on the specific type of electronic device. The voice signal collection method provided in this application can be stored in the electronic device in the form of an application program or software, and the electronic device implements the method by executing that application program or software.
By way of example and not limitation, the electronic device may also be a wearable device. "Wearable device" is a general term for devices developed by applying wearable technology to the intelligent design of everyday wear, such as wired earphones, wireless earphones, earphones with a microphone function, glasses, gloves, watches, clothing, and shoes. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothes or accessories, and that collects the user's voice signal by being attached to the user's body. A wearable device is not only a hardware device; it also provides powerful functions through software support, data interaction, and cloud interaction. In a broad sense, wearable smart devices include devices that are full-featured, large in size, and able to implement complete or partial functions without relying on a smartphone, such as smart headsets or smart microphones, as well as devices that focus on only a certain type of application function and need to be used together with other devices such as smartphones, for example, various smart bracelets and smart jewelry that can collect the user's voice signal.
Take the case where the electronic device is a smart headset as an example. FIG. 1 shows a block diagram of part of the structure of a smart headset provided by an embodiment of the present application. Referring to FIG. 1, the smart headset includes components such as a near field communication module 110, a memory 120, an input unit 130, a sensor 140, an audio circuit 150, a processor 160, and a power supply 170. Those skilled in the art can understand that the smart headset structure shown in FIG. 1 does not constitute a limitation on the smart headset, which may include more or fewer components than those shown, combine certain components, or arrange the components differently.
The components of the smart headset are described in detail below with reference to FIG. 1:
The smart headset can establish a communication connection with an associated user terminal, for example the user's mobile phone, through the near field communication module 110. It can receive an audio signal sent by the user's mobile phone through the near field communication module 110 and play the audio signal through the speaker 151 in the audio circuit 150, and it can send a collected voice signal to the associated user terminal through the near field communication module 110. For example, when the user makes a voice call using the user terminal, the voice signal can be collected by the smart headset: the smart headset establishes a real-time communication link with the user terminal through the near field communication module 110, collects the user's voice signal through the microphone 152 in the audio circuit 150, and feeds the voice signal back over the real-time communication link to the user terminal, which sends it to the communication peer. For example, the near field communication module 110 may be a Bluetooth communication module, a Bluetooth Low Energy communication module, or a WiFi communication module.
The memory 120 may be used to store software programs and modules. The processor 160 executes the various functional applications and data processing of the smart headset by running the software programs and modules stored in the memory 120, for example, storing received voice signals and a preconfigured signal compensation curve in the memory 120. The memory 120 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or a sound collection function); the data storage area may store data created through use of the smart headset (such as audio data). In addition, the memory 120 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
输入单元130可用于接收输入的数字或字符信息,以及产生与智能耳机100的用户设置以及功能控制有关的键信号输入。具体地,输入单元130可包括控制按键131以及其他输入设备132。举例性地,控制按键131,可以为称为接触式按键,在用户触摸对应区域时,即可生成与该位置相对应的键信号输入,无需用户进行按压,也可以达到智能耳机外表面一体成型的目的,该接触式按键可收集用户在其上或附近的触摸操作(比如用户使用手指或附件在控制按键131上或在控制按键131附近的操作),并根据预先设定的程式驱动相应的连接装置。具体地,用户可以通过输入单元130生成用于控制终端语音信号的采集指令,并通过近场通信模块110将语音信号发送给用户终端。The input unit 130 may be used to receive input digital or character information, and generate key signal input related to user settings and function control of the smart headset 100. Specifically, the input unit 130 may include a control button 131 and other input devices 132. For example, the control button 131 may be a so-called contact button. When the user touches the corresponding area, the key signal input corresponding to the position can be generated, without the user pressing, and the outer surface of the smart headset can be integrated. The purpose of this touch button can collect the user's touch operation on or near it (for example, the user uses a finger or accessory to operate on or near the control button 131), and drive the corresponding according to the preset program Connect the device. Specifically, the user may generate a collection instruction for controlling the voice signal of the terminal through the input unit 130, and send the voice signal to the user terminal through the near field communication module 110.
智能耳机100还可包括至少一种传感器140,比如光传感器、运动传感器以及其他传感器。具体地,光传感器可包括环境光传感器及接近传感器,其中,环境光传感器可根据环境光线的明暗来判断是否存在触控操作,接近传感器可在智能耳机的两个挂耳部件之间的距离大于预设的距离阈值时,自动生成声音播放暂停指令,或者语音信号采集的暂停指令。作为运动传感器的一种,加速计传感器可检测各个方向上(一般为三轴)加速度的大小,静止时可检测出重力的大小及方向,可用于识别智能耳机姿态的应用、振动识别相关功能(敲击)等;至于智能耳机还可配置的红外线传感器等其他传感器,在此不再赘述。The smart headset 100 may also include at least one sensor 140, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can determine whether there is a touch operation according to the brightness of the ambient light. The proximity sensor can be greater than the distance between the two ear parts of the smart headset. When the preset distance threshold is set, a pause command for sound playback or a pause command for voice signal collection is automatically generated. As a kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in various directions (usually three-axis), and can detect the magnitude and direction of gravity when it is stationary. It can be used to recognize the application of smart headset gestures and vibration recognition related functions ( Tap), etc.; as for the infrared sensor and other sensors that can be configured in the smart headset, I will not repeat them here.
The audio circuit 150, the speaker 151, and the microphone 152 may provide an audio interface between the user and the smart headset. The audio circuit 150 can convert received audio data into an electrical signal and transmit it to the speaker 151, which converts it into a sound signal for output. In the other direction, the microphone 152 converts collected sound signals into electrical signals, which the audio circuit 150 receives and converts into audio data; the audio data is then output to the processor 160 for processing and sent through the near field communication module 110 to, for example, a user terminal, or is output to the memory 120 for further processing.
The processor 160 is the control center of the smart headset. It connects the parts of the smart headset through various interfaces and lines, and performs the functions of the smart headset and processes data by running or executing software programs and/or modules stored in the memory 120 and by calling data stored in the memory 120, thereby monitoring the smart headset as a whole. Optionally, the processor 160 may include one or more processing units. Preferably, the processor 160 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 160.
The smart headset 100 further includes a power supply 170 (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the processor 160 through a power management system, so that charging, discharging, and power consumption management are handled through the power management system.
By way of example and not limitation, the voice signal collection apparatus may be a smart microphone. The smart microphone can process an acquired voice signal using the voice signal collection method provided in the embodiments of this application and then perform subsequent operations on the processed voice signal, for example playing it through a speaker or storing the collected voice information in a memory.
By way of example and not limitation, the voice signal collection apparatus may be an immersive VR headset. The VR headset can collect the user's voice signal through a built-in directional microphone, play the voice signal through a built-in speaker module, parse the collected voice signal and convert it into character data, and adjust the output content of the display unit in the VR headset according to the character data.
In the embodiments of this application, the process is executed by an electronic device. By way of example and not limitation, the electronic device may be a smart headset, in particular a True Wireless Stereo (TWS) headset. FIG. 2 shows an implementation flowchart of the voice signal collection method provided by the first embodiment of this application, detailed as follows:
In S201, a first voice signal is acquired through a directional microphone of the electronic device.
In this embodiment, the smart headset may be equipped with a directional microphone, which can effectively acquire sound signals in a preset direction. That is, the pickup range of the directional microphone extends along a preset pickup direction and is a cardioid-like region based on that direction. When the target signal to be collected lies in the direction in which the cardioid region extends, the quality of the collected sound signal is best; in the direction opposite to the pickup direction, the quality of the collected sound signal is poor. FIG. 3 shows the pickup region of a directional microphone provided by this application. Referring to FIG. 3, the pickup direction of the directional microphone is the direction corresponding to 0°, so voice signals arriving along the pickup direction are collected with good quality. The direction opposite to the pickup direction is the direction corresponding to 180°, and voice signals arriving from 180° are collected poorly. FIG. 3 contains multiple rings, each representing a collection sensitivity. The collection sensitivity at the central origin is -30 decibels (dB); that is, in the pickup direction, the amplitude of a sound signal only needs to be greater than -30 dB to activate the directional microphone and record the corresponding sound signal. In the direction opposite to the pickup direction, that is, at 180°, the voice signal still cannot be recorded even at a larger amplitude, for example -2 dB. It follows that as the angle between the sound signal and the pickup direction of the directional microphone increases, the collection quality decreases accordingly; when the angle is 180°, that is, when the two directions are opposite, the collection quality is worst.
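The falloff of sensitivity with angle described above can be illustrated with a small numerical sketch. It assumes an ideal first-order cardioid, gain = 0.5 · (1 + cos θ), which the application does not specify (it only describes a cardioid-like region). The function name `cardioid_gain_db` and the -30 dB floor, borrowed from the innermost ring described for FIG. 3, are illustrative assumptions.

```python
import math


def cardioid_gain_db(angle_deg, floor_db=-30.0):
    """Relative sensitivity (dB) of an ideal cardioid at angle_deg off the pickup axis.

    Assumption: ideal first-order cardioid, not a measured response.
    floor_db clamps the theoretical null at 180 degrees to a finite value.
    """
    gain = 0.5 * (1.0 + math.cos(math.radians(angle_deg)))
    if gain <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(gain))


if __name__ == "__main__":
    # Sensitivity drops as the source moves away from the 0 degree pickup direction.
    for angle in (0, 45, 90, 135, 180):
        print(angle, round(cardioid_gain_db(angle), 1))
```

Under this model the sensitivity is highest on the pickup axis, about 6 dB lower at 90°, and bottoms out at 180°, which matches the qualitative behavior described for FIG. 3.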
In one possible implementation, when designing the mounting position of the directional microphone in the smart headset, developers can determine the pickup direction of the directional microphone from the relative position of the smart headset and the user's mouth in the in-use state, so that the pickup direction is consistent with the direction in which the user speaks. This improves the collection quality of the user's voice signal and thus the recording result. FIG. 4 shows a schematic diagram of sound pickup with the smart headset in use, provided by an embodiment of this application. Referring to FIG. 4, when the smart headset is in use, the plane in which it lies is parallel to the plane of the user's mouth, and the direction of the user's voice is perpendicular to the plane of the mouth. Therefore, the pickup direction of the directional microphone can also be designed to be perpendicular to the plane that the headset occupies in use, which keeps the pickup direction consistent with the direction of sound from the user's mouth and allows the user's voice to be collected more clearly.
In one possible implementation, the smart headset may be configured with a signal collection trigger threshold and a collection end judgment threshold. The directional microphone built into the smart headset detects the amplitude of the sound signal in the current scene. If the amplitude is greater than the collection trigger threshold, the headset starts recording the sound signal of the current environment to generate the first voice signal. During recording, if the amplitude of the first voice signal stays below the collection end judgment threshold throughout a certain time period, the headset can stop collecting the sound signal of the current environment. Note that the signal collection trigger threshold is greater than the collection end judgment threshold. Because users tend to speak loudly when they start recording, the trigger threshold can be set relatively high to reduce false triggering. During recording, speech contains soft syllables and pauses, so the amplitude is sometimes small without the user having stopped speaking; the end judgment threshold can therefore be set relatively low so that continuous speech is recorded into the same voice signal, improving the continuity of the voice signal content.
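The two-threshold behavior described above amounts to a hysteresis rule, which can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `segment_recording`, the per-frame amplitude list, and the `hold_frames` parameter (standing in for "a certain time period") are all assumptions.

```python
def segment_recording(amplitudes, trigger_threshold, end_threshold, hold_frames):
    """Return (start, end) frame index pairs of recorded segments, end exclusive.

    Recording starts when a frame exceeds trigger_threshold and stops once
    hold_frames consecutive frames are below end_threshold.
    All names and the frame-based timing are illustrative assumptions.
    """
    if trigger_threshold <= end_threshold:
        raise ValueError("trigger threshold must be greater than end threshold")
    segments = []
    start = None      # index of the frame that triggered the current recording
    quiet = 0         # consecutive frames below the end threshold
    for i, amp in enumerate(amplitudes):
        if start is None:
            if amp > trigger_threshold:
                start, quiet = i, 0
        elif amp < end_threshold:
            quiet += 1
            if quiet >= hold_frames:
                segments.append((start, i + 1))
                start = None
        else:
            quiet = 0  # a soft syllable or short pause does not end the recording
    if start is not None:
        segments.append((start, len(amplitudes)))
    return segments


if __name__ == "__main__":
    frames = [0.1, 0.9, 0.3, 0.1, 0.4, 0.1, 0.1, 0.1, 0.05]
    print(segment_recording(frames, trigger_threshold=0.8, end_threshold=0.2, hold_frames=3))
```

In the sample frames, the values 0.3 and 0.4 are below the trigger threshold but above the end threshold, so they keep the recording open, and the single quiet frame between them is too short to end it.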
In one possible implementation, the smart headset may be configured with a collection trigger condition. While the smart headset is connected to the user terminal, the list of applications currently in use on the user terminal is obtained. If the application list contains a preset sound collection application, that is, a recording-related application such as a video call application or a voice call application, the preset collection trigger condition is identified as satisfied, and a voice collection instruction is sent to the smart headset so that the first voice signal is acquired through the smart headset.
In this embodiment, when the preset collection condition is satisfied, the smart headset can collect the first voice signal in the current scene through the directional microphone. Optionally, after the first voice signal is obtained, noise can be filtered from it with a preset filter; in particular, the filter is a high-frequency filter that removes noise in the high frequency band. Because the directional microphone has low sensitivity at low frequencies, low-frequency noise is hard for it to pick up. This is equivalent to the directional microphone already applying a low cut while collecting the sound signal, so it naturally picks up less low-frequency noise. On this basis, to further suppress high-frequency noise, a high-frequency filter can be connected in series after signal collection to filter the high-band noise.
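As a rough illustration of placing a filter stage after collection to attenuate high-band noise, the sketch below uses a single-pole low-pass filter. The application does not specify the filter type, order, or cutoff; the function name `low_pass`, the 16 kHz sample rate, and the 4 kHz cutoff are assumptions chosen only for the example.

```python
import math


def low_pass(samples, sample_rate_hz, cutoff_hz):
    """Single-pole IIR low-pass filter (an assumed stand-in for the high-frequency filter)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)          # smoothing factor in (0, 1)
    out = []
    prev = 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


if __name__ == "__main__":
    steady = low_pass([1.0] * 50, 16000, 4000)
    fastest = low_pass([1.0 if n % 2 == 0 else -1.0 for n in range(50)], 16000, 4000)
    # A constant input passes through; the fastest alternation is attenuated.
    print(round(steady[-1], 3), round(max(abs(v) for v in fastest[-10:]), 3))
```

A production design would more likely use a higher-order filter with a cutoff tuned to the microphone; the single pole is used here only because it is short enough to verify by hand.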
In S202, signal compensation is performed on the first voice signal according to a signal compensation curve to generate a calibrated voice signal.
In this embodiment, after acquiring the first voice signal, the smart headset can compensate the first voice signal with a preset signal compensation curve. A directional microphone performs poorly for off-axis incidence, and its frequency response curve is not smooth enough, so the first voice signal may be captured with lower quality in the high frequency band; in the waveform of the voice signal this appears as reduced energy in the mid and high frequencies of the frequency response. On this basis, after collecting the first voice signal, the smart headset can use the signal compensation curve to compensate for the collection deficiencies of the directional microphone and obtain a compensated, calibrated voice signal, which improves the signal collection result of the directional microphone.
In one possible implementation, the signal compensation curve may be configured as follows. Before the smart headset leaves the factory, training waveforms of multiple training voices can be acquired through the directional microphone at multiple different angles. Based on these training waveforms, the compensation coefficients for the high frequency band are determined, from which the signal compensation curve is generated. In particular, if the collection results of the directional microphone differ considerably across sound collection angles, a separate signal compensation curve can be configured for each sound collection angle from the training waveforms at that angle. In that case, when the smart headset collects the first voice signal, it can read the parameters reported by the motion sensor to determine its current posture, estimate the angle between the user's voice and the pickup direction from the deviation between that posture and the normal in-use posture, select the corresponding signal compensation curve based on that angle, and compensate the first voice signal with it.
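The angle-dependent selection described above can be sketched as a nearest-angle lookup followed by a per-band gain. The application does not define the curve's data format, so representing each curve as a list of per-band gains in dB, the function names, and the sample values are all hypothetical.

```python
def select_curve(curves_by_angle, angle_deg):
    """Pick the stored compensation curve whose calibration angle is nearest to angle_deg."""
    nearest = min(curves_by_angle, key=lambda a: abs(a - angle_deg))
    return curves_by_angle[nearest]


def apply_curve(band_levels_db, curve_db):
    """Add the per-band compensation gain (dB) to the measured band levels (dB)."""
    if len(band_levels_db) != len(curve_db):
        raise ValueError("curve and signal must cover the same bands")
    return [level + gain for level, gain in zip(band_levels_db, curve_db)]


if __name__ == "__main__":
    # Hypothetical factory curves: gains for (low, mid, high) bands at each angle.
    curves = {0: [0.0, 1.0, 2.0], 30: [0.0, 2.0, 4.0], 60: [0.0, 3.0, 6.0]}
    estimated_angle = 40  # would come from the motion-sensor posture estimate
    curve = select_curve(curves, estimated_angle)
    print(apply_curve([-20.0, -25.0, -30.0], curve))
```

Interpolating between neighbouring curves instead of taking the nearest one would be a natural refinement, but the text only says that a corresponding curve is determined from the angle.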
In one possible implementation, the smart headset can establish a communication connection with the user terminal. While the smart headset collects the voice signal through the directional microphone, a reference voice signal can be acquired synchronously through the microphone module built into the user terminal. The amplitude ratio among the low, middle, and high frequency bands is obtained from the reference voice signal, and the signal compensation curve is generated based on that ratio. In this case, when the smart headset receives the reference voice signal returned by the user terminal, it can check the average amplitude of the reference voice signal. If the average amplitude is less than a preset reference threshold, the reference voice signal is identified as invalid, and the first voice signal is compensated with the built-in signal compensation curve. When the smart headset is in use, the user terminal may be in a pocket or a bag, where its signal collection is poor; a reference voice signal collected by the user terminal under those conditions cannot calibrate the high frequency band of the first voice signal well, so the preset signal compensation curve is used to compensate the first voice signal instead.
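The fallback logic in this implementation can be sketched as a validity check on the reference signal. The sketch assumes the average amplitude is the mean absolute sample value; the function names and the `curve_from_reference` callback (standing in for the band-ratio computation, which the text does not detail) are hypothetical.

```python
def mean_amplitude(samples):
    """Mean absolute sample value; 0.0 for an empty signal (assumed definition)."""
    return sum(abs(s) for s in samples) / len(samples) if samples else 0.0


def pick_compensation_curve(reference_samples, reference_threshold,
                            built_in_curve, curve_from_reference):
    """Use the terminal's reference signal only when it is loud enough to be valid."""
    if mean_amplitude(reference_samples) < reference_threshold:
        # Terminal is probably in a pocket or bag: fall back to the preset curve.
        return built_in_curve
    return curve_from_reference(reference_samples)


if __name__ == "__main__":
    built_in = [0.0, 2.0, 4.0]
    derive = lambda samples: [0.0, 1.0, 1.0]   # placeholder for the band-ratio step
    print(pick_compensation_curve([0.001, -0.002, 0.001], 0.01, built_in, derive))
    print(pick_compensation_curve([0.5, -0.4, 0.6], 0.01, built_in, derive))
```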
In S203, the calibrated voice signal is output.
In this embodiment, after compensating the first voice signal and obtaining the calibrated voice signal, the smart headset can output the calibrated voice signal. How it is output can depend on how the collection operation was triggered. For example, if the user terminal is performing a video call and the smart headset collects the user's voice during the call, then after obtaining the calibrated voice signal the smart headset can send it to the user terminal through the near field communication module, and the user terminal sends it to the other party as the voice signal of the video call. If the user is using voice to control the smart headset or the user terminal, then after obtaining the calibrated voice signal the smart headset can parse it, convert it into character data, determine the control instruction associated with the calibrated voice signal from the keywords contained in the character data, and control the smart headset or the user terminal based on that control instruction.
In one possible implementation, the smart headset can be used to respond to a recording request. After generating the calibrated voice signal, the smart headset can store it in local memory, or send it through a communication module, such as a near field communication module or a radio frequency communication module, to the user terminal or upload it to a cloud server, so that the calibrated voice signal is saved by another device.
In one possible implementation, the smart headset may be connected to a loudspeaker device. The smart headset sends the calibrated voice signal to the loudspeaker device over a wired link or a wireless communication link, and the loudspeaker device outputs it. This improves playback of the voice signal, reduces the effect of noise, and brings the voice signal closer to the user's original voice.
As can be seen from the above, in the voice signal collection method provided by the embodiments of this application, after the original voice signal, that is, the first voice signal, is collected through the directional microphone, it is compensated based on the signal compensation curve to obtain the calibrated voice signal. This improves collection of the high-band voice signal and reduces the effect of sound coloration. The adjusted, calibrated voice signal is then output, which improves voice collection quality while keeping environmental noise in the collected signal low, and so enhances the signal output of the microphone.
FIG. 5 shows a detailed implementation flowchart of S201 of a voice signal collection method provided by the second embodiment of this application. Referring to FIG. 5, relative to the embodiment of FIG. 2, S201 in the voice signal collection method provided by this embodiment includes S2011 to S2012, detailed as follows:
In S2011, when the first voice signal is acquired, a second voice signal corresponding to the first voice signal is collected through an omnidirectional microphone of the electronic device.
In this embodiment, the smart headset is equipped with two microphone modules, an omnidirectional microphone and a directional microphone, which together form the microphone array of the smart headset. When collecting a voice signal, the smart headset can turn on the omnidirectional microphone and the directional microphone at the same time, so that the second voice signal is collected through the omnidirectional microphone while the first voice signal is collected through the directional microphone. That is, the first voice signal and the second voice signal start recording at the same moment: the two microphones capture the voice signal of the current environment simultaneously, only through different microphone modules. The second voice signal serves mainly as a reference signal for adjusting the first voice signal. Preferably, the left earpiece and the right earpiece of the smart headset can each be equipped with one directional microphone and one omnidirectional microphone.
The second voice signal picked up by the omnidirectional microphone contains far more environmental noise than the first voice signal picked up by the directional microphone; that is, the signal-to-noise ratio of the second voice signal is much lower than that of the first voice signal. When outputting the voice signal, the first voice signal can therefore be used as the basis for output, which improves the signal-to-noise ratio of the output voice signal. However, because a directional microphone performs poorly when sound arrives from outside the pickup direction and its own frequency response curve is not smooth enough, mid- and high-frequency content is lost, so the first voice signal preserves mid- and high-frequency energy less faithfully than the second voice signal output by the omnidirectional microphone. The omnidirectional microphone can therefore be used at the same time to collect the second voice signal and make up for the collection deficiency of the directional microphone in the mid and high frequency bands, thereby improving collection performance.
In one possible implementation, the smart headset has a left earpiece and a right earpiece, which output audio together to provide stereo sound, so the user wears both earpieces during use. In this case, the directional microphone can be mounted on one earpiece and the omnidirectional microphone on the other, so that both microphones are installed while the size of the smart headset is reduced. For TWS headsets in particular, portability directly affects the user experience, so placing the directional microphone and the omnidirectional microphone on opposite sides improves the space utilization of the headset. For example, when the directional microphone is mounted on the left earpiece, the omnidirectional microphone can be mounted on the right earpiece; conversely, when the directional microphone is mounted on the right earpiece, the omnidirectional microphone can be mounted on the left earpiece.
Further, as another embodiment of this application, when the electronic device collects a voice signal, a first distance between the directional microphone and the human voice source is smaller than a second distance between the omnidirectional microphone and the human voice source. The first voice signal collected by the directional microphone serves as the base signal, and the second voice signal collected by the omnidirectional microphone serves as the reference signal. Therefore, to improve the voice collection quality of the base signal, the directional microphone can be placed close to the user's mouth, while the omnidirectional microphone can be placed farther from the user's mouth.
Illustratively, FIG. 6 shows a schematic structural diagram of an existing smart headset. Referring to FIG. 6, the smart headset is a true wireless headset equipped with two microphone modules, both of which are omnidirectional: omnidirectional microphone 1 and omnidirectional microphone 2. Compared with wired headsets, TWS headsets free the user from cables and improve the wearing experience. With current smart device technology, a headset usually has to provide audio collection in addition to audio output. However, a TWS headset in use hangs on the user's ear to pick up sound and has no other pickup component, so unlike with a wired headset the user cannot point the headset microphone directly at the mouth. The TWS headset is also worn relatively far from the user's mouth, so its pickup quality is much worse than that of a wired headset that picks up sound at close range. As shown in the figure, existing TWS headsets are usually equipped with two omnidirectional microphones. An omnidirectional microphone senses sound pressure changes in all directions and receives sound from every direction with equal sensitivity; its pickup characteristic is a sphere, and the two-dimensional plan view of its pickup range is a 360° circle, as shown in the figure. In use, therefore, an omnidirectional microphone receives not only the voice signal the user emits in a particular direction but also interference from all other directions, and interference and voice are picked up with equal sensitivity, which lowers the signal-to-noise ratio of the voice signal and results in poor pickup quality. In addition, the high-frequency components of a voice signal (the band above 4 kHz) radiate with strong directivity, in the direction the user's mouth is facing. Because the omnidirectional microphone hangs at the ear in use, high-frequency content is lost during pickup and the sound quality is badly degraded. Moreover, users wear TWS headsets in widely varying environments, so the types of environmental noise are complex, which further degrades the voice collection of the TWS headset. Existing TWS headsets therefore cannot combine convenience of use with high-quality voice signal collection.
Exemplarily, FIG. 7 shows a structural diagram of a smart headset provided by this embodiment. As shown in FIG. 7, the smart headset is a true wireless headset configured with two microphone modules: one omnidirectional microphone and one directional microphone. Compared with the existing smart headset shown in FIG. 6, the omnidirectional microphone at the bottom of the headset is replaced with a directional microphone. The pickup range of the directional microphone is a cardioid-like region, while the pickup range of the omnidirectional microphone is a spherical region, as shown in FIG. 7. When the smart headset is in use, its lower end is closer to the user's mouth than its upper end. To improve pickup of the human voice signal, the first distance between the directional microphone and the user's mouth should therefore be smaller than the second distance between the omnidirectional microphone and the user's mouth, which is why the omnidirectional microphone at the lower end of the existing smart headset is the one replaced with a directional microphone. Preferably, when designing the pickup direction of the directional microphone, that direction can be determined from the relative position of the microphone's installation position and the mouth when the earphone is in use.
Specifically, a direction vector may be generated with the installation position of the directional microphone as the start point and the user's mouth as the end point, and the pickup direction of the directional microphone is then determined from a preset compensation angle and the direction vector.
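The direction-vector step above can be sketched as follows. This is only an illustration: the text does not say how the compensation angle is combined with the vector, so the sketch assumes a simple rotation in a 2-D plane, and the function name, coordinates, and angle convention are hypothetical.

```python
import math

def pickup_direction(mic_pos, mouth_pos, compensation_deg):
    """Return a unit vector for the pickup direction in a 2-D plane.

    Assumption: the preset compensation angle is applied as a
    counter-clockwise rotation of the mic-to-mouth direction vector.
    """
    dx = mouth_pos[0] - mic_pos[0]
    dy = mouth_pos[1] - mic_pos[1]
    if math.hypot(dx, dy) == 0:
        raise ValueError("microphone and mouth positions coincide")
    # Angle of the direction vector plus the preset compensation angle.
    angle = math.atan2(dy, dx) + math.radians(compensation_deg)
    return (math.cos(angle), math.sin(angle))

# Hypothetical coordinates: microphone at the origin, mouth at (3, 4).
direction = pickup_direction((0.0, 0.0), (3.0, 4.0), compensation_deg=0.0)
```

With a compensation angle of zero the result is simply the normalized mic-to-mouth vector; a non-zero angle tilts the pickup axis away from it.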
In a possible implementation, the microphone port of the directional microphone opens downward, which provides effective physical wind protection and reduces the influence of ambient airflow on pickup. The smart headset is upright when in use, and in that state airflow strikes it from all sides. If the microphone port were opened on any side face of the cylindrical body, it would be affected by the airflow and pick up airflow noise produced by airflow vibration, lowering the signal-to-noise ratio of the human voice signal. The microphone port can therefore be oriented downward, which improves the signal-to-noise ratio of the human voice signal. Note that the opening direction of the microphone port is not the same thing as the pickup direction of the directional microphone: the microphone port, that is, the sound guide hole of the microphone, refers to the physical sound-receiving position, whereas the pickup direction refers, in terms of pickup effect, to the direction in which the lobe of the cardioid pickup pattern points.
In S2012, the signal compensation curve is generated according to the first voice signal and the second voice signal.
In this embodiment, after obtaining the second voice signal collected by the omnidirectional microphone, the smart headset can use the second voice signal as a reference signal and generate the signal compensation curve from the first voice signal and the second voice signal. This ensures that the signal compensation curve matches the currently collected first voice signal, allows the curve to be generated in real time and adjusted dynamically, and improves its accuracy.
In a possible implementation, the signal compensation curve may be generated as follows: the smart headset fits a real-time frequency response function to the relationship among the frequency response amplitudes of the frequency bands in the second voice signal. From the deviation between each frequency band of the first voice signal and this real-time frequency response function, it obtains an adjustment ratio for each band, and it generates the signal compensation curve from the adjustment ratios of all bands.
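A minimal sketch of the per-band adjustment ratio, under simplifying assumptions: the fitting step is skipped and the reference response is taken directly as one amplitude per band, the deviation is expressed as a ratio, and the function name and band amplitudes are hypothetical.

```python
def adjustment_ratios(directional_bands, reference_bands, eps=1e-12):
    """Per-band ratio that would bring the directional-microphone
    amplitude up (or down) to the reference amplitude.

    Assumption: both lists hold linear amplitudes for the same bands,
    in the same order.
    """
    if len(directional_bands) != len(reference_bands):
        raise ValueError("band lists must have the same length")
    # eps guards against division by zero for an empty band.
    return [ref / max(d, eps)
            for d, ref in zip(directional_bands, reference_bands)]

# Hypothetical amplitudes for low, mid and high bands.
ratios = adjustment_ratios([1.0, 0.5, 0.25], [1.0, 1.0, 1.0])
```

Here the list of ratios plays the role of the signal compensation curve: a ratio above 1 marks a band in which the first voice signal falls short of the reference.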
In a possible implementation, the signal compensation curve may be generated as follows: the smart headset has a built-in reference compensation curve and a neural learning model. Each time a first voice signal and a second voice signal are collected, a first pooling matrix corresponding to the first voice signal and a second pooling matrix corresponding to the second voice signal are generated by dimensionality-reducing pooling and convolution, and the frequency response feature values of the two pooling matrices are extracted. An adjustment parameter is generated from the difference between the two frequency response feature values and fed into the neural learning model, which adjusts the parameters of the reference compensation curve through learning. The adjusted reference compensation curve is used as the signal compensation curve, which achieves dynamic learning and adjustment and improves the accuracy of the signal compensation curve.
In the embodiment of the present application, the second voice signal corresponding to the first voice signal is collected simultaneously through the omnidirectional microphone and used as a reference signal to generate the signal compensation curve. The curve can thus be generated in real time, which improves its accuracy and optimizes the pickup effect of the directional microphone.
FIG. 8 shows a specific implementation flowchart of S2012 of a voice signal collection method provided by the third embodiment of the present application. Referring to FIG. 8, relative to the embodiment described in FIG. 5, S2012 in the voice signal collection method provided in this embodiment includes S801 to S804, detailed as follows:
Further, the generating the signal compensation curve according to the first voice signal and the second voice signal includes:
In S801, the first voice signal is converted into a first frequency domain signal, and the second voice signal is converted into a second frequency domain signal.
In this embodiment, the first voice signal and the second voice signal are time-domain signals, so a Fourier transform algorithm can be used to convert them to the frequency domain, yielding a first frequency domain signal corresponding to the first voice signal and a second frequency domain signal corresponding to the second voice signal. Each of these frequency domain signals is a frequency response curve, that is, a correspondence between frequency and gain.
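The time-to-frequency conversion of S801 can be illustrated with a direct discrete Fourier transform. This is a sketch only: a real device would presumably use an optimized FFT, and the function name, sample rate, and test tone below are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, magnitude) pairs for bins 0..N/2,
    computed with a direct O(N^2) discrete Fourier transform."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i, x in enumerate(samples))
        spectrum.append((k * sample_rate / n, abs(acc) / n))
    return spectrum

# Hypothetical input: 64 samples of a 1 kHz tone sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(64)]
spectrum = magnitude_spectrum(tone, rate)
peak = max(spectrum, key=lambda pair: pair[1])
```

Each pair in `spectrum` is one point of the frequency-versus-gain correspondence described above; for this test tone the largest magnitude falls in the 1 kHz bin.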
In S802, the human voice frequency band is extracted according to the first frequency domain signal.
In this embodiment, the directional microphone collects human voice signals well and suppresses environmental noise from multiple directions, so the signal-to-noise ratio of the human voice signal is high. The human voice spectrum is therefore identified from the first frequency domain signal corresponding to the first voice signal. The smart headset can cut out the signal segment within the human voice range from the first frequency domain signal, identify the human voice frequencies according to the gain of each frequency value in that segment, and form the human voice frequency band from all identified human voice frequencies.
In a possible implementation, the smart headset may be configured with an effective gain threshold. The smart headset determines whether the gain of each frequency value within the human vocal range, that is, within the signal segment cut out above, is greater than the effective gain threshold. If so, that frequency value is identified as a human voice frequency. Otherwise, if the gain of a frequency value in the segment is less than or equal to the effective gain threshold, the human voice signal collected this time is considered to carry no effective information at that frequency value, and the frequency value is identified as a noise frequency.
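The threshold test can be sketched as below. The vocal range limits and the threshold value are assumptions chosen for illustration (the text gives neither), and the function name is hypothetical.

```python
def voice_frequencies(spectrum, gain_threshold, vocal_range=(80.0, 8000.0)):
    """Return the frequencies identified as human voice frequencies.

    spectrum: list of (frequency_hz, gain) pairs from the first
    frequency domain signal. A frequency is kept only if it lies in
    the assumed vocal range and its gain exceeds the threshold.
    """
    low, high = vocal_range
    return [freq for freq, gain in spectrum
            if low <= freq <= high and gain > gain_threshold]

# Hypothetical spectrum of the first (directional) signal.
first_spectrum = [(50.0, 0.9), (200.0, 0.5), (1000.0, 0.05),
                  (3000.0, 0.3), (9000.0, 0.8)]
voice = voice_frequencies(first_spectrum, gain_threshold=0.1)
```

In this example 50 Hz and 9000 Hz fall outside the assumed vocal range, and 1000 Hz is rejected because its gain does not exceed the threshold.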
In S803, noise frequency bands other than the human voice frequency band are filtered out of the second frequency domain signal to obtain a filtered signal.
In this embodiment, the smart headset uses the first frequency domain signal as the basis for determining the human voice frequency band contained in the voice signal collected this time. However, the directional microphone loses gain in the mid-to-high frequency band, so gain compensation must be derived from the omnidirectional microphone. The omnidirectional microphone has the same collection sensitivity in all directions and therefore picks up noise in every frequency band, which makes it difficult to identify the human voice frequency band from the second voice signal. The human voice frequency band is therefore extracted from the first voice signal, which makes up for the low accuracy of human voice identification in the second voice signal.
In this embodiment, after determining the human voice frequency band from the first frequency domain signal, the smart headset can mark each frequency value of that band in the second frequency domain signal, thereby identifying the human voice frequencies contained in the second frequency domain signal. All other frequencies are identified as noise frequencies and filtered out, producing a filtered signal from which the noise has been removed. The filtered signal contains only human voice frequencies and records the gain value of each of them.
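A sketch of the masking step in S803, assuming both spectra were computed on the same frequency grid so that frequencies can be compared exactly, and assuming "filtering out" means setting the gain to zero. The function name and data are hypothetical.

```python
def filter_to_voice_band(second_spectrum, voice_freqs):
    """Zero the gain of every bin of the omnidirectional spectrum
    that was not identified as a human voice frequency."""
    keep = set(voice_freqs)
    return [(freq, gain if freq in keep else 0.0)
            for freq, gain in second_spectrum]

# Hypothetical omnidirectional spectrum on the same grid as the
# directional one, and the voice frequencies found from the latter.
second_spectrum = [(50.0, 0.7), (200.0, 0.6), (1000.0, 0.4), (3000.0, 0.5)]
filtered = filter_to_voice_band(second_spectrum, [200.0, 3000.0])
```

The filtered result keeps the omnidirectional gains at the voice frequencies, which is the information the next step needs to derive the compensation curve.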
In S804, the signal compensation curve is generated from the filtered signal.
In this embodiment, because the omnidirectional microphone has no mid-to-high frequency gain loss, the filtered signal generated from the second frequency domain signal can be used to determine the relative proportions among the different frequency values. The gain compensation coefficients for the mid-to-high frequency band are determined from these relative proportions, and the signal compensation curve is generated from them, eliminating the inherent acquisition distortion of the directional microphone.
In this embodiment, compensating the first voice signal with the signal compensation curve boosts the high-frequency gain, which compensates for the attenuation of high-frequency sound energy that occurs when the directional microphone is positioned off the axis of the user's speaking direction. For example, the high-frequency part of a voice signal radiates in the direction in front of the user's mouth, so high-frequency content is lost during pickup. The frequency response of the signal compensation curve therefore adds 8 to 10 dB of gain compensation in the 2 to 7 kHz band, and the boost peak may be at 5 kHz. Gain compensation through the signal compensation curve can also reduce, to some extent, the high-frequency loss caused by shadowing from the face, further improving signal acquisition quality.
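To make the example figures concrete, the following sketch builds a compensation curve that adds 8 to 10 dB between 2 and 7 kHz with its peak at 5 kHz. The piecewise-linear shape, the abrupt band edges, and the function names are assumptions made purely for illustration; the text specifies only the band, the gain range, and the peak position.

```python
def compensation_gain_db(freq_hz, low=2000.0, peak=5000.0, high=7000.0,
                         edge_db=8.0, peak_db=10.0):
    """Gain in dB of an illustrative compensation curve.

    Outside [low, high] no gain is added. Inside, the gain rises
    linearly from edge_db at the band edges to peak_db at the peak.
    """
    if freq_hz < low or freq_hz > high:
        return 0.0
    if freq_hz <= peak:
        t = (freq_hz - low) / (peak - low)
    else:
        t = (high - freq_hz) / (high - peak)
    return edge_db + t * (peak_db - edge_db)

def db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear factor."""
    return 10 ** (gain_db / 20)

curve = {f: compensation_gain_db(f) for f in (1000.0, 2000.0, 3500.0,
                                             5000.0, 7000.0, 8000.0)}
```

A practical curve would likely roll off smoothly at the band edges rather than step from 0 dB to 8 dB; the step is kept here only to mirror the stated numbers.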
In the embodiment of the present application, the first voice signal and the second voice signal are converted to the frequency domain, and the signal compensation curve is generated with regard to the respective acquisition deficiencies of the directional microphone and the omnidirectional microphone, which improves the accuracy of the signal compensation curve.
FIG. 9 shows a specific implementation flowchart of S202 of a voice signal collection method provided by the fourth embodiment of the present application. Referring to FIG. 9, relative to the embodiment described in FIG. 2, S202 in the voice signal collection method provided in this embodiment includes S2021 to S2023, detailed as follows:
Further, the performing signal compensation on the first voice signal according to a signal compensation curve to generate a calibration voice signal includes:
In S2021, an original frequency response curve of the first voice signal is generated.
In this embodiment, the smart headset can convert the first voice signal into a frequency domain signal through a Fourier transform and thereby determine the gain value corresponding to each frequency value in the first voice signal, obtaining the original frequency response curve. The original frequency response curve represents the relationship between frequency value and gain.
In S2022, using the signal compensation curve as a reference, each frequency value in the original frequency response curve is fitted to obtain a frequency response calibration curve.
In this embodiment, the signal compensation curve compensates for the gain loss of the directional microphone in the mid-to-high frequency band during acquisition. After the original frequency response curve of the first voice signal has been generated, each frequency in it can therefore be adjusted according to the trend of the gain values across the frequency bands of the signal compensation curve, producing the frequency response calibration curve.
In a possible implementation, the smart headset may determine the gain ratios between frequency values from the signal compensation curve and fit the original frequency response curve to those gain ratios to obtain the frequency response calibration curve. For example, suppose the gain ratio between the low band and the mid band is 80% in the signal compensation curve but 120% in the original frequency response curve, which means that mid-band gain is deficient. The gain of each frequency value in the mid band can then be increased until the ratio becomes 80%, consistent with the gain ratio in the signal compensation curve.
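The arithmetic of the 120% to 80% example can be worked through as follows, reading the "gain ratio between the low band and the mid band" as low-band gain divided by mid-band gain (an interpretation, since the text does not define the ratio). The function name and the gain values are hypothetical.

```python
def mid_band_scale(original_ratio, target_ratio):
    """Factor by which mid-band gains must be multiplied so that
    low/mid changes from original_ratio to target_ratio.

    Scaling the mid band by s turns low/mid into (low/mid) / s,
    so s = original_ratio / target_ratio.
    """
    if target_ratio <= 0:
        raise ValueError("target ratio must be positive")
    return original_ratio / target_ratio

# Hypothetical gains giving the 120% ratio from the example.
low_gain, mid_gain = 1.2, 1.0
scale = mid_band_scale(low_gain / mid_gain, target_ratio=0.8)
new_mid_gain = mid_gain * scale
```

Under this reading the mid band is boosted by a factor of 1.5 (about 3.5 dB), after which the low-to-mid ratio equals the 80% found in the signal compensation curve.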
In S2023, the calibration voice signal is generated based on the frequency response calibration curve.
In this embodiment, an inverse Fourier transform is applied to the frequency response calibration curve, converting the frequency domain signal back into a time domain signal to obtain the calibration voice signal.
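Steps S2021 to S2023 can be sketched end to end as: transform, scale each bin, transform back. The sketch uses a direct DFT for self-containment; the per-bin gain function stands in for the fitted calibration and is a hypothetical placeholder, as are all names and values.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n)
                for i in range(n)) for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * i / n)
                for k in range(n)).real / n for i in range(n)]

def apply_gains(samples, gain_for_bin):
    """Scale each frequency bin and return the time-domain result.

    Bins k and n - k receive the same gain so that the output of the
    inverse transform stays real for real input.
    """
    n = len(samples)
    spectrum = dft(samples)
    scaled = [spectrum[k] * gain_for_bin(min(k, n - k)) for k in range(n)]
    return idft(scaled)

# Hypothetical input: a tone that falls exactly in bin 2 of 16 samples.
tone = [math.sin(2 * math.pi * 2 * i / 16) for i in range(16)]
unchanged = apply_gains(tone, lambda k: 1.0)
boosted = apply_gains(tone, lambda k: 2.0 if k == 2 else 1.0)
```

With unity gains the round trip reproduces the input, and doubling the gain of bin 2 doubles the amplitude of the tone, which is the behavior expected of a per-frequency gain correction.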
In the embodiment of the present application, the first voice signal is converted into an original frequency response curve, and the original frequency response curve is fitted based on the signal compensation curve to compensate for the frequency response loss in the mid-to-high frequency band, thereby improving the signal acquisition quality of the directional microphone.
FIG. 10 shows a specific implementation flowchart of a voice signal collection method provided by the fifth embodiment of the present application. Referring to FIG. 10, relative to any of the embodiments described in FIG. 2, FIG. 5, FIG. 8 and FIG. 9, the voice signal collection method provided in this embodiment further includes S1001 to S1003 before the acquiring the first voice signal through the directional microphone of the electronic device, detailed as follows:
Further, before the acquiring the first voice signal through the directional microphone of the electronic device, the method further includes:
In S1001, the current environmental sound intensity is obtained.
In this embodiment, before performing the voice signal collection operation, the smart headset can detect the environmental sound intensity of the current scene to assess the recording environment. If the current environmental sound intensity is low, the scene is relatively quiet with little environmental noise, and the operation of S1002 can be performed. Conversely, if the current environmental sound intensity is high, the scene is likely noisy and the environmental noise has a large impact on recording, in which case the operation of S1003 can be performed.
In this embodiment, the smart headset is configured with a directional microphone and an omnidirectional microphone, and the two microphones can serve recording operations in different scenes.
In S1002, if the environmental sound intensity is less than a preset sound intensity threshold, a third voice signal is acquired through the omnidirectional microphone, and the third voice signal is output.
In this embodiment, detecting that the current environmental sound intensity is less than the preset sound intensity threshold indicates that there is little environmental noise, and the third voice signal can be acquired through the omnidirectional microphone built into the smart headset. Because the omnidirectional microphone is equally sensitive in all directions, it picks up noise from every direction. When the environmental sound intensity is low, however, even omnidirectional noise has little effect on the signal-to-noise ratio of the human voice signal. The omnidirectional microphone can therefore be used to collect the human voice signal, that is, the third voice signal, which is output directly.
In S1003, if the environmental sound intensity is greater than or equal to the preset sound intensity threshold, the acquiring the first voice signal through the directional microphone of the electronic device is performed.
In this embodiment, detecting that the current environmental sound intensity is greater than or equal to the preset sound intensity threshold indicates that the environmental noise is high. The directional microphone must then be used to acquire the human voice signal, reducing the impact of environmental noise on collection and raising the signal-to-noise ratio of the human voice signal. The operation of acquiring the first voice signal through the directional microphone of the electronic device is performed, and the subsequent signal compensation is applied to the first voice signal to reduce the gain loss of the directional microphone in the mid-to-high frequency band.
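The branch between S1002 and S1003 amounts to a single threshold comparison. In the sketch below the ambient level is estimated as an RMS level in dB relative to full scale, which is an assumption (the text does not say how sound intensity is measured); the threshold value and all names are hypothetical.

```python
import math

def level_db(samples, eps=1e-12):
    """RMS level of a block of samples in dB relative to full scale.
    Used here as a stand-in for the environmental sound intensity."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, eps))

def select_microphone(ambient_level, threshold):
    """Below the threshold use the omnidirectional microphone (S1002);
    at or above it use the directional microphone (S1003)."""
    return "omnidirectional" if ambient_level < threshold else "directional"

quiet = select_microphone(level_db([0.001, -0.001, 0.001, -0.001]), -40.0)
noisy = select_microphone(level_db([0.5, -0.5, 0.5, -0.5]), -40.0)
```

Note that the comparison is strict on the quiet side, matching the text: a level exactly equal to the threshold selects the directional microphone.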
In the embodiment of the present application, the smart headset is configured with an omnidirectional microphone and a directional microphone, and the appropriate microphone is used to collect voice signals in each scene. This improves signal acquisition quality while avoiding unnecessary compensation operations, which raises the efficiency of voice signal collection and reduces the computational load of the smart headset.
FIG. 11 shows a specific implementation flowchart of S201 of a voice signal collection method provided by the sixth embodiment of the present application. Referring to FIG. 11, relative to any of the embodiments described in FIG. 2, FIG. 5, FIG. 8 and FIG. 9, S201 in the voice signal collection method provided in this embodiment includes S1101 to S1103, detailed as follows:
Further, the acquiring the first voice signal through the directional microphone of the electronic device includes:
In S1101, the directional microphone is controlled by the transmission component of the electronic device to collect training voice signals at multiple preset angles.
In this embodiment, a transmission component is provided at the installation position of the directional microphone of the smart headset. The transmission component can rotate the microphone within a preset plane in preset steps, so that voice signals can be collected at multiple preset angles. Because the directional microphone has a specific pickup direction, that direction changes as the microphone rotates with the transmission component. When collecting a multi-party conversation, adjusting the pickup direction of the directional microphone can therefore improve the collection of the voice signal. Accordingly, before collecting the voice signal, the smart headset can use the transmission component to rotate the directional microphone to multiple preset angles, adjusting its pickup direction, and obtain the training voice signal collected at each preset angle. Note that the smart headset can collect a training voice signal of a preset duration at each of the preset angles. For example, the collection duration of a training voice signal may be 0.5 s.
In S1102, the target angle is determined according to the average amplitude of the human voice frequency band in each of the training voice signals.
In this embodiment, the smart headset converts each training voice signal into a frequency domain signal, extracts the human voice frequency band from each training frequency domain signal according to the human vocal frequency range, and calculates the average amplitude of each human voice frequency band. The preset angle with the largest average amplitude is selected as the target angle. At that angle the average amplitude of the human voice frequency band is largest, meaning that the human voice signal is collected best and with a higher signal-to-noise ratio, so the user's voice signal is recorded at the target angle.
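The angle selection of S1102 reduces to an arg-max over per-angle averages. In this sketch the voice-band amplitudes per angle are given directly as hypothetical numbers; obtaining them from the training signals would use a frequency-domain conversion and band extraction as described above.

```python
def average(values):
    return sum(values) / len(values)

def choose_target_angle(voice_band_amplitudes_by_angle):
    """Return the preset angle whose human voice frequency band has
    the largest average amplitude.

    voice_band_amplitudes_by_angle: dict mapping a preset angle in
    degrees to the list of amplitudes measured in the voice band.
    """
    return max(voice_band_amplitudes_by_angle,
               key=lambda angle: average(voice_band_amplitudes_by_angle[angle]))

# Hypothetical measurements at three preset angles.
measurements = {
    0: [0.20, 0.25, 0.15],
    30: [0.50, 0.55, 0.45],
    60: [0.30, 0.40, 0.35],
}
target_angle = choose_target_angle(measurements)
```

If two angles tie, `max` returns the first one encountered; a real device might prefer the angle closest to the current position to limit movement, which is a design choice the text leaves open.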
In S1103, the transmission component is controlled to adjust the directional microphone to the target angle, and the first voice signal is acquired at the target angle.
In this embodiment, after determining the target angle, the smart headset can adjust the directional microphone to the target angle through the transmission component and collect the first voice signal at that angle. Particularly in a multi-party conversation, where the user's position changes, determining the target angle through the transmission component can further improve collection of the human voice signal.
In the embodiment of the present application, a transmission component is provided in the smart headset and used to adjust the pickup direction of the directional microphone, so that the best pickup angle is determined and the pickup effect is improved.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the numbering should not constitute any limitation on the implementation of the embodiments of the present application.
Corresponding to the voice signal collection method described in the above embodiments, FIG. 12 shows a structural block diagram of a voice signal collection apparatus provided in an embodiment of the present application. For ease of description, only the parts related to the embodiment of the present application are shown.
Referring to FIG. 12, the voice signal collection apparatus includes:
第一语音信号获取单元121,用于通过电子设备的指向性麦克风获取第一语音信号;The first voice signal acquiring unit 121 is configured to acquire the first voice signal through the directional microphone of the electronic device;
语音信号补偿单元122,用于根据信号补偿曲线对所述第一语音信号进行信号补偿,生成校准语音信号;The voice signal compensation unit 122 is configured to perform signal compensation on the first voice signal according to the signal compensation curve to generate a calibration voice signal;
校准语音信号输出单元123,用于输出所述校准语音信号。The calibration voice signal output unit 123 is used to output the calibration voice signal.
可选地,所述第一语音信号获取单元121包括:Optionally, the first voice signal acquiring unit 121 includes:
第二语音信号获取单元,用于在获取所述第一语音信号时,通过所述电子设备的全向性麦克风采集所述第一语音信号对应的第二语音信号;A second voice signal acquiring unit, configured to collect a second voice signal corresponding to the first voice signal through an omnidirectional microphone of the electronic device when acquiring the first voice signal;
信号补偿曲线生成单元,用于根据所述第一语音信号以及所述第二语音信号生成所述信号补偿曲线。The signal compensation curve generating unit is configured to generate the signal compensation curve according to the first voice signal and the second voice signal.
可选地,在所述电子设备采集语音信号时,所述指向性麦克风与人声声源之间的第一距离小于所述全向性麦克风与所述人声声源之间的第二距离。Optionally, when the electronic device collects the voice signal, the first distance between the directional microphone and the human sound source is smaller than the second distance between the omnidirectional microphone and the human sound source .
可选地,所述信号补偿曲线生成单元包括:Optionally, the signal compensation curve generating unit includes:
频域信号转换单元,用于将所述第一语音信号转换为第一频域信号,以及将所述第二语音信号转换为第二频域信号;A frequency domain signal conversion unit, configured to convert the first voice signal into a first frequency domain signal, and convert the second voice signal into a second frequency domain signal;
人声语音频段提取单元,用于根据所述第一频域信号提取人声语音频段;The human voice frequency band extraction unit is configured to extract the human voice frequency band according to the first frequency domain signal;
滤波信号生成单元,用于过滤所述第二频域信号中除所述人声语音频段外的噪声频段,得到滤波信号;A filtered signal generating unit, configured to filter noise frequency bands in the second frequency domain signal except for the human voice speech frequency band to obtain a filtered signal;
滤波信号转换单元,用于通过所述滤波信号生成所述信号补偿曲线。The filtered signal conversion unit is configured to generate the signal compensation curve through the filtered signal.
可选地,所述语音信号补偿单元122还包括:Optionally, the voice signal compensation unit 122 further includes:
原始频响曲线生成单元,用于生成所述第一语音信号的原始频响曲线;An original frequency response curve generating unit, configured to generate the original frequency response curve of the first voice signal;
频响校准曲线输出单元,用于以所述信号补偿曲线为基准,对所述原始频响曲线内各个频率值进行拟合,得到频响校准曲线;A frequency response calibration curve output unit, configured to fit each frequency value in the original frequency response curve with the signal compensation curve as a reference to obtain a frequency response calibration curve;
校准语音信号生成单元,用于基于所述频响基准曲线生成所述校准语音信号。The calibration voice signal generating unit is configured to generate the calibration voice signal based on the frequency response reference curve.
可选地,所述语音信号的采集装置还包括:Optionally, the voice signal collection device further includes:
环境声强获取单元,用于获取当前的环境声强;The ambient sound intensity acquisition unit is used to acquire the current ambient sound intensity;
第三语音信号采集单元,用于若所述环境声强小于预设的声强阈值,则通过全向性麦克风获取第三语音信号,输出所述第三语音信号;The third voice signal acquisition unit is configured to obtain a third voice signal through an omnidirectional microphone and output the third voice signal if the environmental sound intensity is less than a preset sound intensity threshold;
第一语音信号采集触发单元,用于若所述环境声强大于或等于预设的声强阈值,则执行所述通过电子设备的指向性麦克风获取第一语音信号。The first voice signal collection trigger unit is configured to perform the acquisition of the first voice signal through the directional microphone of the electronic device if the environmental sound is stronger than or equal to the preset sound intensity threshold.
Optionally, the first voice signal acquiring unit 121 includes:
A training voice signal acquiring unit, configured to control, through a transmission component of the electronic device, the directional microphone to acquire training voice signals at a plurality of preset angles;
A target angle identifying unit, configured to determine a target angle according to the average amplitude of the human voice frequency band in each of the training voice signals;
A target angle adjusting unit, configured to control the transmission component to adjust the directional microphone to the target angle, and to acquire the first voice signal at the target angle.
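A minimal sketch of the target angle selection is shown below. It assumes each training voice signal is already available as a magnitude spectrum, that the human voice band is 300 Hz to 3400 Hz, and that the angle with the highest average in-band amplitude is chosen; none of these details are fixed by the text above, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

# One spectrum: a list of (frequency in Hz, magnitude) pairs.
Spectrum = List[Tuple[float, float]]


def voice_band_mean(spectrum: Spectrum,
                    band: Tuple[float, float] = (300.0, 3400.0)) -> float:
    """Average magnitude of the bins that fall inside the assumed voice band."""
    low, high = band
    mags = [m for f, m in spectrum if low <= f <= high]
    return sum(mags) / len(mags) if mags else 0.0


def select_target_angle(training: Dict[float, Spectrum],
                        band: Tuple[float, float] = (300.0, 3400.0)) -> float:
    """Return the preset angle whose training signal has the largest
    average voice-band amplitude (the 'largest wins' rule is an assumption)."""
    if not training:
        raise ValueError("at least one training signal is required")
    return max(training, key=lambda angle: voice_band_mean(training[angle], band))
```

The returned angle would then be passed to the transmission component before the first voice signal is acquired.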
Therefore, the voice signal acquisition apparatus provided in this embodiment of the present application can likewise acquire the original voice signal, namely the first voice signal, through the directional microphone, and then perform signal compensation on the first voice signal based on the signal compensation curve to obtain the calibrated voice signal. This improves acquisition of the high-frequency portion of the voice signal, reduces the effect of sound coloration, and outputs the adjusted calibrated voice signal. The quality of voice acquisition is thereby improved while ambient noise in the acquired voice signal is still reduced, which enhances the signal output of the microphone.
FIG. 13 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 13, the electronic device 13 of this embodiment includes: at least one processor 130 (only one is shown in FIG. 13), a memory 131, and a computer program 132 stored in the memory 131 and executable on the at least one processor 130. When the processor 130 executes the computer program 132, the steps in any of the foregoing embodiments of the voice signal acquisition method are implemented.
The electronic device 13 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud electronic device. The electronic device may include, but is not limited to, the processor 130 and the memory 131. Those skilled in the art will understand that FIG. 13 is merely an example of the electronic device 13 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components, and may for example further include input/output devices, network access devices, and the like.
The processor 130 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
In some embodiments, the memory 131 may be an internal storage unit of the electronic device 13, such as a hard disk or main memory of the electronic device 13. In other embodiments, the memory 131 may be an external storage device of the electronic device 13, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 13. Further, the memory 131 may include both an internal storage unit and an external storage device of the electronic device 13. The memory 131 is used to store an operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program. The memory 131 may also be used to temporarily store data that has been output or is to be output.
It should be noted that the information exchange and execution processes between the foregoing apparatuses/units are based on the same concept as the method embodiments of the present application. For their specific functions and technical effects, refer to the method embodiments; details are not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the foregoing division into functional units and modules is used as an example. In practical applications, the foregoing functions may be allocated to different functional units and modules as required; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working processes of the units and modules in the foregoing system, refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.
An embodiment of the present application further provides a network device, which includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor. When the processor executes the computer program, the steps in any of the foregoing method embodiments are implemented.
An embodiment of the present application further provides a computer-readable storage medium that stores a computer program. When the computer program is executed by a processor, the steps in the foregoing method embodiments can be implemented.
An embodiment of the present application provides a computer program product. When the computer program product runs on a mobile terminal, the mobile terminal is caused to implement the steps in the foregoing method embodiments.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes of the foregoing method embodiments may be implemented by a computer program instructing relevant hardware. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the foregoing method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or apparatus capable of carrying the computer program code to the photographing apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts not detailed or recorded in one embodiment, refer to the related descriptions of the other embodiments.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The foregoing embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.
Finally, it should be noted that the foregoing is merely specific implementations of the present application, and the protection scope of the present application is not limited thereto. Any change or replacement within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

  1. A voice signal acquisition method, characterized by comprising:
    acquiring a first voice signal through a directional microphone of an electronic device;
    performing signal compensation on the first voice signal according to a signal compensation curve to generate a calibrated voice signal;
    outputting the calibrated voice signal.
  2. The acquisition method according to claim 1, wherein the acquiring a first voice signal through a directional microphone of an electronic device comprises:
    when acquiring the first voice signal, acquiring, through an omnidirectional microphone of the electronic device, a second voice signal corresponding to the first voice signal;
    generating the signal compensation curve according to the first voice signal and the second voice signal.
  3. The acquisition method according to claim 2, wherein, when the electronic device acquires a voice signal, a first distance between the directional microphone and a human voice source is less than a second distance between the omnidirectional microphone and the human voice source.
  4. The acquisition method according to claim 2, wherein the generating the signal compensation curve according to the first voice signal and the second voice signal comprises:
    converting the first voice signal into a first frequency domain signal, and converting the second voice signal into a second frequency domain signal;
    extracting a human voice frequency band according to the first frequency domain signal;
    filtering out noise frequency bands other than the human voice frequency band from the second frequency domain signal to obtain a filtered signal;
    generating the signal compensation curve from the filtered signal.
  5. The acquisition method according to claim 1, wherein the performing signal compensation on the first voice signal according to a signal compensation curve to generate a calibrated voice signal comprises:
    generating an original frequency response curve of the first voice signal;
    fitting each frequency value in the original frequency response curve, using the signal compensation curve as a reference, to obtain a frequency response calibration curve;
    generating the calibrated voice signal based on the frequency response reference curve.
  6. The acquisition method according to any one of claims 1 to 5, wherein, before the acquiring a first voice signal through a directional microphone of an electronic device, the method further comprises:
    acquiring a current ambient sound intensity;
    if the ambient sound intensity is less than a preset sound intensity threshold, acquiring a third voice signal through an omnidirectional microphone and outputting the third voice signal;
    if the ambient sound intensity is greater than or equal to the preset sound intensity threshold, performing the acquiring a first voice signal through a directional microphone of an electronic device.
  7. The acquisition method according to any one of claims 1 to 5, wherein the acquiring a first voice signal through a directional microphone of an electronic device comprises:
    controlling, through a transmission component of the electronic device, the directional microphone to acquire training voice signals at a plurality of preset angles;
    determining a target angle according to the average amplitude of the human voice frequency band in each of the training voice signals;
    controlling the transmission component to adjust the directional microphone to the target angle, and acquiring the first voice signal at the target angle.
  8. A voice signal acquisition apparatus, characterized by comprising:
    a first voice signal acquiring unit, configured to acquire a first voice signal through a directional microphone of an electronic device;
    a voice signal compensation unit, configured to perform signal compensation on the first voice signal according to a signal compensation curve to generate a calibrated voice signal;
    a calibrated voice signal output unit, configured to output the calibrated voice signal.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.
PCT/CN2020/125590 2019-12-12 2020-10-30 Voice signal acquisition method and apparatus, electronic device, and storage medium WO2021114953A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911275035.5 2019-12-12
CN201911275035.5A CN112992169A (en) 2019-12-12 2019-12-12 Voice signal acquisition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021114953A1 true WO2021114953A1 (en) 2021-06-17

Family

ID=76329560

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/125590 WO2021114953A1 (en) 2019-12-12 2020-10-30 Voice signal acquisition method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN112992169A (en)
WO (1) WO2021114953A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114071324A (en) * 2021-12-21 2022-02-18 Oppo广东移动通信有限公司 Audio processing method and device and electronic equipment

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542983B (en) * 2021-07-09 2023-06-27 安徽聆思智能科技有限公司 Audio signal processing method, device, equipment and storage medium
CN113596669B (en) * 2021-08-25 2022-10-28 歌尔科技有限公司 Voice compensation method, device and related components
CN114257684A (en) * 2021-12-17 2022-03-29 歌尔科技有限公司 Voice processing method, system and device and electronic equipment
CN116668892B (en) * 2022-11-14 2024-04-12 荣耀终端有限公司 Audio signal processing method, electronic device and readable storage medium
CN117539817B (en) * 2024-01-09 2024-04-05 上海韬润半导体有限公司 Serial signal transmission adjusting circuit, device and adjusting method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030061032A1 (en) * 2001-09-24 2003-03-27 Clarity, Llc Selective sound enhancement
CN101203063A (en) * 2007-12-19 2008-06-18 北京中星微电子有限公司 Method and apparatus for noise elimination of microphone array
US20090022335A1 (en) * 2007-07-19 2009-01-22 Alon Konchitsky Dual Adaptive Structure for Speech Enhancement
CN103000184A (en) * 2011-09-15 2013-03-27 Jvc建伍株式会社 Noise reduction apparatus, audio input apparatus, wireless communication apparatus, and noise reduction method
CN106034272A (en) * 2015-03-17 2016-10-19 钰太芯微电子科技(上海)有限公司 Loudspeaker compensation system and portable mobile terminal
CN106409306A (en) * 2016-09-19 2017-02-15 宁波高新区敦和科技有限公司 Intelligent system obtaining human voice and obtaining method based on the system
CN107274907A (en) * 2017-07-03 2017-10-20 北京小鱼在家科技有限公司 The method and apparatus that directive property pickup is realized in dual microphone equipment
CN110197671A (en) * 2019-06-17 2019-09-03 深圳壹秘科技有限公司 Orient sound pick-up method, sound pick-up outfit and storage medium
CN110234043A (en) * 2019-05-31 2019-09-13 歌尔科技有限公司 Audio signal processing method, device and equipment based on microphone array

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5524056A (en) * 1993-04-13 1996-06-04 Etymotic Research, Inc. Hearing aid having plural microphones and a microphone switching system
CN101807404B (en) * 2010-03-04 2012-02-08 清华大学 Pretreatment system for strengthening directional voice at front end of electronic cochlear implant
CN108449691B (en) * 2018-05-04 2021-05-04 科大讯飞股份有限公司 Pickup device and sound source distance determining method

Also Published As

Publication number Publication date
CN112992169A (en) 2021-06-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20898260; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20898260; Country of ref document: EP; Kind code of ref document: A1)