WO2021114953A1

WO2021114953A1 - Voice signal acquisition method and apparatus, electronic device, and storage medium

Info

Publication number: WO2021114953A1
Application number: PCT/CN2020/125590
Authority: WO
Inventors: 尹明婕; 缪海波; 陶凯; 张时
Original assignee: 华为技术有限公司
Priority date: 2019-12-12
Filing date: 2020-10-30
Publication date: 2021-06-17
Also published as: CN112992169A

Abstract

A voice signal acquisition method and apparatus, an electronic device, and a storage medium. The method comprises: obtaining a first voice signal by means of a directional microphone of an electronic device (S201); performing signal compensation on the first voice signal according to a signal compensation curve to generate a calibration voice signal (S202); and outputting the calibration voice signal (S203). According to the solution, an original voice signal, i.e., a first voice signal, is acquired by a directional microphone, and signal compensation is then performed on the first voice signal on the basis of a signal compensation curve to obtain a calibration voice signal, which improves the acquisition effect of the voice signal in the high-frequency band and reduces the influence of sound coloration. The adjusted calibration voice signal is then output, which improves the quality of voice acquisition while ensuring that the ambient noise in the acquired voice signal is reduced, thereby improving the signal output effect of the microphone.

Description

Voice signal collection method, device, electronic equipment and storage medium

This application claims priority to Chinese patent application No. 201911275035.5, filed with the State Intellectual Property Office on December 12, 2019 and entitled "Voice signal collection methods, devices, electronic equipment, and storage media", the entire contents of which are incorporated in this application by reference.

Technical field

This application belongs to the technical field of signal processing, and in particular relates to methods, devices, electronic equipment, and storage media for collecting voice signals.

Background technique

With the development of portable consumer electronic terminals, miniature condenser microphones have come into wide use. In the prior art, in order to pick up a sound signal from a certain direction, a directional microphone can be used for voice collection, so as to improve the collection quality of the voice signal. However, in existing voice signal collection technology, although a directional microphone collects voice signals well in a specific direction, its collection effect in the high-frequency range is weak, and it suffers from defects such as sound coloration and an insufficiently smooth frequency response curve. As a result, a high-quality voice signal still cannot be obtained when a directional microphone is used, which affects the signal output effect of the microphone.

Summary of the invention

The embodiments of the present application provide a voice signal collection method, device, wearable device, and storage medium, which can improve the quality of voice collection and enhance the signal output effect of the microphone.

In the first aspect, an embodiment of the present application provides a voice signal collection method, including:

Acquiring the first voice signal through the directional microphone of the electronic device;

Performing signal compensation on the first voice signal according to the signal compensation curve to generate a calibration voice signal;

The calibration voice signal is output.

Exemplarily, the sound pickup direction of the directional microphone is determined according to the mutual positional relationship between the smart earphone and the user's mouth in the use state.

In a possible implementation manner of the first aspect, the acquiring the first voice signal through the directional microphone of the electronic device includes:

When acquiring the first voice signal, collecting a second voice signal corresponding to the first voice signal through an omnidirectional microphone of the electronic device;

The signal compensation curve is generated according to the first voice signal and the second voice signal.

In a possible implementation of the first aspect, when the electronic device collects voice signals, the first distance between the directional microphone and the human sound source is smaller than the second distance between the omnidirectional microphone and the human sound source.

In a possible implementation manner of the first aspect, the generating the signal compensation curve according to the first voice signal and the second voice signal includes:

Converting the first voice signal into a first frequency domain signal, and converting the second voice signal into a second frequency domain signal;

Extracting a human voice speech frequency band according to the first frequency domain signal;

Filtering noise frequency bands in the second frequency domain signal other than the human voice speech frequency band to obtain a filtered signal;

The signal compensation curve is generated by the filtered signal.
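
The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names, the naive DFT, and the rule that a frequency bin belongs to the human voice band when its magnitude in the first (directional) spectrum is at least 10% of the peak are all hypothetical, as is using the filtered second spectrum directly as the compensation curve.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..N//2, normalised by N (naive O(N^2) transform)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))) / n
        for k in range(n // 2 + 1)
    ]


def voice_band_bins(first_mags, rel_threshold=0.1):
    """Assumed rule: a bin is in the voice band if its magnitude in the
    directional-microphone spectrum reaches rel_threshold times the peak."""
    peak = max(first_mags)
    return [m >= rel_threshold * peak for m in first_mags]


def compensation_curve(first_signal, second_signal, rel_threshold=0.1):
    """Filter the omnidirectional spectrum to the voice band found in the
    directional spectrum; the filtered spectrum serves as the curve here."""
    first_mags = dft_magnitudes(first_signal)    # first frequency domain signal
    second_mags = dft_magnitudes(second_signal)  # second frequency domain signal
    in_band = voice_band_bins(first_mags, rel_threshold)
    # Zero every bin of the second spectrum that lies outside the voice band.
    return [m if keep else 0.0 for m, keep in zip(second_mags, in_band)]
```

With a directional signal that contains a single tone and an omnidirectional signal that contains the same tone plus an out-of-band component, the returned curve keeps the tone's bin and zeroes the other component.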

In a possible implementation manner of the first aspect, the performing signal compensation on the first voice signal according to a signal compensation curve to generate a calibration voice signal includes:

Generating an original frequency response curve of the first voice signal;

Using the signal compensation curve as a reference, fitting each frequency value in the original frequency response curve to obtain a frequency response calibration curve;

The calibration voice signal is generated based on the frequency response calibration curve.

Exemplarily, the gain ratio between each frequency value is determined according to the signal compensation curve, and the original frequency response curve is fitted based on the above gain ratio to obtain the frequency response calibration curve.
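
One possible reading of this fitting step is sketched below in Python. It assumes that both curves are sampled at the same frequency values and that "fitting" means rescaling the original frequency response so that the ratio of every frequency value to a chosen reference value matches the ratio found in the signal compensation curve. The function name and the choice of reference index are hypothetical.

```python
def calibrate_response(original, compensation, ref_index=0):
    """Return a frequency response calibration curve whose gain ratios between
    frequency values follow the compensation curve, anchored at the level of
    the original curve at ref_index (an assumed convention)."""
    if len(original) != len(compensation):
        raise ValueError("curves must be sampled at the same frequency values")
    if compensation[ref_index] == 0:
        raise ValueError("reference value of the compensation curve must be non-zero")
    ref_level = original[ref_index]
    # Gain ratio of each frequency value relative to the reference value.
    ratios = [c / compensation[ref_index] for c in compensation]
    return [ref_level * r for r in ratios]


# Example: the original response rolls off at high frequencies; the
# compensation curve indicates a much gentler roll-off.
calibrated = calibrate_response([1.0, 0.8, 0.2], [2.0, 2.0, 1.0])
```

In this example the calibrated curve keeps the level of the original at the reference value and lifts the highest frequency value from 0.2 to 0.5.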

In a possible implementation manner of the first aspect, before the acquiring the first voice signal through the directional microphone of the electronic device, the method further includes:

Get the current environmental sound intensity;

If the environmental sound intensity is less than the preset sound intensity threshold, acquiring a third voice signal through an omnidirectional microphone, and outputting the third voice signal;

If the environmental sound intensity is greater than or equal to the preset sound intensity threshold, performing the acquisition of the first voice signal through the directional microphone of the electronic device.

Controlling the directional microphone to collect training voice signals at multiple preset angles through the transmission component of the electronic device;

Determine the target angle according to the average amplitude of the human voice frequency band in each of the training speech signals;

The transmission component is controlled to adjust the directional microphone to the target angle, and the first voice signal is acquired at the target angle.
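
The angle selection described in the last three steps can be illustrated with a short Python sketch. It assumes that the target angle is the preset angle whose training voice signal has the largest average amplitude in the human voice frequency band (the text only says the angle is determined "according to" that average), and that each training signal is already available as a list of spectral magnitudes. All names are hypothetical.

```python
def band_average(magnitudes, lo_bin, hi_bin):
    """Average magnitude over the inclusive bin range [lo_bin, hi_bin]."""
    band = magnitudes[lo_bin:hi_bin + 1]
    return sum(band) / len(band)


def select_target_angle(spectra_by_angle, lo_bin, hi_bin):
    """Pick the preset angle with the largest voice-band average amplitude."""
    return max(
        spectra_by_angle,
        key=lambda angle: band_average(spectra_by_angle[angle], lo_bin, hi_bin),
    )


# Magnitude spectra of the training voice signals at three preset angles
# (illustrative numbers only); bins 1..2 stand in for the voice band.
training = {
    0: [0.1, 0.2, 0.2, 0.1],
    30: [0.1, 0.6, 0.5, 0.1],
    60: [0.9, 0.3, 0.3, 0.9],
}
target = select_target_angle(training, lo_bin=1, hi_bin=2)
```

The 60° signal is loudest overall, but its energy lies outside the assumed voice band, so 30° is selected.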

In the second aspect, an embodiment of the present application provides a voice signal collection device, including:

The first voice signal acquiring unit is configured to acquire the first voice signal through the directional microphone of the electronic device;

A voice signal compensation unit, configured to perform signal compensation on the first voice signal according to a signal compensation curve to generate a calibrated voice signal;

The calibration voice signal output unit is used to output the calibration voice signal.

In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the voice signal collection method described in any one of the above-mentioned first aspects.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium that stores a computer program, wherein, when the computer program is executed by a processor, the voice signal collection method described in any one of the above-mentioned first aspects is implemented.

In the fifth aspect, the embodiments of the present application provide a computer program product which, when run on a terminal device, causes the terminal device to execute the voice signal collection method described in any one of the above-mentioned first aspects.

It can be understood that, for the beneficial effects of the second aspect to the fifth aspect described above, reference may be made to the relevant description in the first aspect described above, which will not be repeated here.

Compared with the prior art, the embodiments of this application have the following beneficial effects:

In the embodiments of the application, after the original voice signal, that is, the first voice signal, is collected by the directional microphone, signal compensation is performed on the first voice signal based on the signal compensation curve to obtain the calibrated voice signal, which improves the collection effect of the high-frequency voice signal and reduces the impact of sound coloration. The adjusted calibrated voice signal is then output, which improves the quality of voice collection while ensuring that the environmental noise in the collected voice signal is reduced, thereby enhancing the signal output effect of the microphone.

Description of the drawings

FIG. 1 is a block diagram of a part of the structure of a smart headset provided by an embodiment of the present application;

FIG. 2 is an implementation flowchart of a voice signal collection method provided by the first embodiment of the present application;

Fig. 3 is a sound pickup area of a directional microphone provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of sound pickup in the use state of the smart headset provided by the first embodiment of the present application;

FIG. 5 is a specific implementation flowchart of a voice signal collection method S201 provided by the second embodiment of the present application;

FIG. 6 is a schematic structural diagram of an existing smart headset provided by an embodiment of the present application;

FIG. 7 is a structural entity diagram of a smart headset provided by an embodiment of the present application;

FIG. 8 is a specific implementation flowchart of a voice signal collection method S2012 provided by the third embodiment of the present application;

FIG. 9 is a specific implementation flowchart of a voice signal collection method S202 provided by the fourth embodiment of the present application;

FIG. 10 is a specific implementation flowchart of a voice signal collection method provided by the fifth embodiment of the present application;

FIG. 11 is a specific implementation flowchart of a voice signal collection method S201 provided by the sixth embodiment of the present application;

FIG. 12 is a structural block diagram of a voice signal collection device provided by an embodiment of the present application;

FIG. 13 is a schematic diagram of an electronic device provided by another embodiment of the present application.

Detailed description

In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are proposed for a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted to avoid unnecessary details from obstructing the description of this application.

It should be understood that, when used in the specification and appended claims of this application, the term "comprising" indicates the existence of the described features, wholes, steps, operations, elements and/or components, but does not exclude the existence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the term "and/or" used in the specification and appended claims of this application refers to any combination of one or more of the associated listed items and all possible combinations thereof, and includes these combinations.

As used in the description of this application and the appended claims, the term "if" can be construed, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if determined" or "if [the described condition or event] is detected" can be construed, depending on the context, as "once determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".

In addition, in the description of the specification of this application and the appended claims, the terms "first", "second", "third", etc. are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.

A reference to "one embodiment" or "some embodiments" in the specification of this application means that one or more embodiments of this application include a specific feature, structure, or characteristic described in combination with that embodiment. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "including", "comprising", "having", and their variations all mean "including but not limited to", unless specifically emphasized otherwise.

The voice signal collection method provided in the embodiments of the present application can be applied to electronic devices. The electronic device can be any device with a voice collection function, including but not limited to mobile phones, tablets, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA) with voice collection functions. The method can also be applied to databases, servers, and systems that can be controlled by voice. The embodiments of this application do not impose any restrictions on the specific type of electronic device. The voice signal collection method provided in this application can be stored in an electronic device in the form of an application program or software, and the electronic device implements the voice signal collection method provided in this application by executing the application program or software.

As an example and not a limitation, the electronic device may also be a wearable device. "Wearable device" is a general term for devices that apply wearable technology to the intelligent design of everyday wear, such as wired earphones, wireless earphones, headsets, glasses, gloves, watches, clothing, and shoes with a microphone function. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothes or accessories, and is attached to the user's body to collect the user's voice signal. A wearable device is not only a hardware device, but also realizes powerful functions through software support, data interaction, and cloud interaction. In a broad sense, wearable smart devices include devices that are full-featured, large in size, and able to realize complete or partial functions without relying on a smart phone, such as smart headphones or smart microphones, as well as devices that focus only on a certain type of application function and need to be used in conjunction with other devices such as smart phones, such as various smart bracelets and smart jewelry that can collect user voice signals.

Take the case where the electronic device is a smart headset as an example. FIG. 1 shows a block diagram of a part of the structure of a smart headset provided by an embodiment of the present application. Referring to FIG. 1, the smart headset includes: a near field communication module 110, a memory 120, an input unit 130, a sensor 140, an audio circuit 150, a processor 160, a power supply 170, and other components. Those skilled in the art can understand that the structure of the smart headset shown in FIG. 1 does not constitute a limitation on the smart headset, which may include more or fewer components than those shown in the figure, combine some components, or arrange the components differently.

The following describes the components of the smart headset in detail with reference to Figure 1:

The smart headset can establish a communication connection with an associated user terminal through the near field communication module 110. For example, it can establish communication with the user's mobile phone, receive the audio signal sent by the user's mobile phone through the near field communication module 110, and play the audio signal through the speaker 151 in the audio circuit 150; it can also send the collected voice signal to the associated user terminal through the near field communication module 110. For example, when the user uses the user terminal to make a voice call, the voice signal can be collected through the smart headset. In this case, the smart headset can establish a real-time communication link with the user terminal through the near field communication module 110, collect the user's voice signal through the microphone 152 in the audio circuit 150, and feed the voice signal back to the user terminal through the real-time communication link, and the user terminal sends it to the peer end of the call. For example, the near field communication module 110 may be a Bluetooth communication module, a Bluetooth low energy communication module, or a WiFi communication module.

The memory 120 may be used to store software programs and modules. The processor 160 executes various functional applications and data processing of the smart headset by running the software programs and modules stored in the memory 120; for example, the received voice signals, the pre-configured signal compensation curve, and the like are stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playback function or a sound collection function), and the like; the data storage area may store data created through the use of the smart headset (such as audio data). In addition, the memory 120 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.

The input unit 130 may be used to receive input digital or character information, and to generate key signal input related to user settings and function control of the smart headset 100. Specifically, the input unit 130 may include a control button 131 and other input devices 132. For example, the control button 131 may be a touch button: when the user touches the corresponding area, the key signal input corresponding to that position is generated without the user having to press, which allows the outer surface of the smart headset to be formed in one piece. The touch button can collect the user's touch operations on or near it (for example, the user operating on or near the control button 131 with a finger or an accessory), and drive the corresponding connected device according to a preset program. Specifically, the user may generate, through the input unit 130, a collection instruction for controlling voice signal collection, and the voice signal is sent to the user terminal through the near field communication module 110.

The smart headset 100 may also include at least one sensor 140, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can determine whether there is a touch operation according to the brightness of the ambient light. The proximity sensor can automatically generate a pause command for sound playback or a pause command for voice signal collection when the distance between the two earpieces of the smart headset is greater than a preset distance threshold. As a kind of motion sensor, the accelerometer can detect the magnitude of acceleration in various directions (usually three axes), and can detect the magnitude and direction of gravity when stationary. It can be used in applications that recognize the posture of the smart headset and in vibration-recognition-related functions (such as tap detection). Other sensors that can be configured in the smart headset, such as an infrared sensor, are not described here.

The audio circuit 150, the speaker 151, and the microphone 152 can provide an audio interface between the user and the smart headset. On the one hand, the audio circuit 150 can transmit the electric signal converted from the received audio data to the speaker 151, and the speaker 151 converts it into a sound signal for output. On the other hand, the microphone 152 converts the collected sound signal into an electric signal, which is received by the audio circuit 150 and converted into audio data; the audio data is then output to the processor 160 for processing and sent to, for example, a user terminal through the near field communication module 110, or the audio data is output to the memory 120 for further processing.

The processor 160 is the control center of the smart headset. It uses various interfaces and lines to connect the various parts of the entire smart headset, and performs the various functions of the smart headset and processes data by running or executing the software programs and/or modules stored in the memory 120 and calling the data stored in the memory 120, so as to monitor the smart headset as a whole. Optionally, the processor 160 may include one or more processing units; preferably, the processor 160 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may alternatively not be integrated into the processor 160.

The smart headset 100 also includes a power source 170 (such as a battery) for supplying power to various components. Preferably, the power source can be logically connected to the processor 160 through a power management system, so that functions such as charging, discharging, and power consumption management can be managed through the power management system.

As an example and not a limitation, the voice signal collection device may be a smart microphone. The smart microphone can process the acquired voice signal through the voice signal collection method provided in the embodiments of this application, and perform subsequent operations on the processed voice signal, such as playing it through a speaker or storing the collected voice information in the memory.

As an example and not a limitation, the voice signal collection device may be an immersive VR headset. The VR headgear can collect the user's voice signal through the built-in directional microphone, play the voice signal through the built-in speaker module, analyze the collected voice signal, convert it into character data, and adjust the VR headgear according to the character data The output content of the internal display unit.

In the embodiment of the present application, the execution subject of the process is an electronic device. As an example and not a limitation, the electronic device may be a smart headset, in particular, the smart headset is a true wireless (True Wireless Stereo, TWS) headset. Fig. 2 shows an implementation flowchart of the voice signal collection method provided by the first embodiment of the present application, and the details are as follows:

In S201, the first voice signal is acquired through the directional microphone of the electronic device.

In this embodiment, the smart headset may be equipped with a directional microphone, which can effectively acquire sound signals in a preset direction. That is, the pickup range of the directional microphone is a cardioid region that extends along the preset sound pickup direction. When the target signal to be collected lies in the direction in which the cardioid pickup range extends, the signal quality of the collected sound signal is the best, while in the direction opposite to the pickup direction, the signal quality of the collected sound signal is poor. Fig. 3 shows the sound pickup area of a directional microphone provided by the present application. As shown in Fig. 3, the pickup direction of the directional microphone is the direction corresponding to 0°, that is, voice signals arriving from the pickup direction are collected well; the direction opposite to the pickup direction is the direction corresponding to 180°, that is, voice signals arriving from 180° are collected poorly. Fig. 3 contains multiple rings, each of which represents a collection sensitivity. The collection sensitivity at the center origin is -30 decibels (dB), that is, in the pickup direction, the amplitude of a sound signal only needs to be greater than -30 dB for the directional microphone to be activated and record the corresponding sound signal. In the direction opposite to the pickup direction, that is, at 180°, the voice signal still cannot be recorded even at a larger sound amplitude, for example -2 dB. It can be seen that as the angle between the sound signal and the pickup direction of the directional microphone gradually increases, the signal collection quality gradually decreases. When the angle between the two is 180°, that is, when the two propagation directions are opposite, the signal collection effect is the worst.
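
As an illustration of how sensitivity falls off with angle, the following Python sketch evaluates the textbook ideal first-order cardioid pattern, 0.5·(1 + cos θ). This is an assumed idealization used only to show the trend; it is not the measured pattern of Fig. 3, and the function names are hypothetical.

```python
import math


def cardioid_gain(angle_deg):
    """Relative sensitivity of an ideal first-order cardioid (1.0 on axis)."""
    return 0.5 * (1.0 + math.cos(math.radians(angle_deg)))


def cardioid_gain_db(angle_deg):
    """Same sensitivity in decibels relative to the on-axis direction."""
    gain = cardioid_gain(angle_deg)
    # The ideal pattern has a null at 180 degrees; report it as -inf dB.
    if gain <= 1e-12:
        return float("-inf")
    return 20.0 * math.log10(gain)


# Sensitivity decreases monotonically from 0 degrees to 180 degrees.
levels = [cardioid_gain_db(a) for a in (0, 45, 90, 135, 180)]
```

Under this idealization the level drops by about 6 dB at 90° and vanishes at 180°, which matches the qualitative behaviour described above.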

In a possible implementation, when designing the installation position of the directional microphone in the smart headset, the developer can determine the pickup direction of the directional microphone according to the mutual positional relationship between the smart headset and the user's mouth in the use state, so as to ensure that the sound pickup direction is consistent with the direction in which the user speaks. This improves the signal collection quality when acquiring the user's voice signal and thus the recording effect. Fig. 4 shows a schematic diagram of sound pickup in the use state of a smart headset provided by an embodiment of the present application. As shown in Fig. 4, when the smart headset is in use, its plane is parallel to the plane of the user's mouth, and the direction of the user's speech is perpendicular to the plane of the user's mouth. Therefore, when the sound pickup direction of the directional microphone of the smart headset is designed, the sound pickup direction can likewise be kept perpendicular to the corresponding plane in the use state, thereby ensuring that the sound pickup direction is consistent with the direction of sound from the user's mouth, so that the user's voice signal can be collected more clearly.

In a possible implementation manner, the smart headset may be configured with a signal collection trigger threshold and a collection end judgment threshold. The built-in directional microphone of the smart headset detects the sound amplitude of the sound signal in the current scene. If the sound amplitude is detected to be greater than the collection trigger threshold, the smart headset starts to record the sound signal of the current environment to generate the first voice signal. During recording, if the sound amplitude of the first voice signal is detected to remain below the collection end judgment threshold for a certain period of time, the collection of the sound signal of the current environment can be stopped. It should be noted that the value of the signal collection trigger threshold is greater than that of the collection end judgment threshold. Since users often speak loudly when they start recording, the signal collection trigger threshold can be raised appropriately to reduce false triggering; during recording, however, the sound amplitude is sometimes small because of weakly stressed syllables and pauses in speech, which does not mean that the user's speech has stopped. Therefore, the collection end judgment threshold can be lowered appropriately to ensure that continuous speech is recorded in the same voice signal, thereby improving the continuity of the voice signal content.
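
The two-threshold behaviour can be sketched as a small state machine in Python. The sketch assumes that the sound amplitude is available as one value per frame and that "a certain period of time" corresponds to a fixed number of consecutive quiet frames; the function name and the parameter hold_frames are hypothetical.

```python
def segment_recording(amplitudes, trigger_threshold, end_threshold, hold_frames):
    """Return (start, end) frame index pairs, end exclusive, for each recording.

    Recording starts when a frame exceeds trigger_threshold and stops once
    hold_frames consecutive frames stay below end_threshold.
    """
    if trigger_threshold <= end_threshold:
        raise ValueError("trigger threshold must be greater than end threshold")
    segments = []
    start = None
    quiet = 0
    for i, amp in enumerate(amplitudes):
        if start is None:
            if amp > trigger_threshold:
                start, quiet = i, 0
        elif amp < end_threshold:
            quiet += 1
            if quiet >= hold_frames:
                # Close the segment at the first frame of the quiet run.
                segments.append((start, i - hold_frames + 1))
                start = None
        else:
            quiet = 0
    if start is not None:
        segments.append((start, len(amplitudes)))
    return segments


frames = [0.1, 0.9, 0.5, 0.2, 0.6, 0.1, 0.1, 0.1, 0.95, 0.4]
segments = segment_recording(frames, trigger_threshold=0.8, end_threshold=0.3, hold_frames=3)
```

In this example the single quiet frame at index 3 does not end the first recording, because the amplitude recovers before three consecutive quiet frames have elapsed.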

In a possible implementation manner, the smart headset may be configured with a collection trigger condition. In a use state in which the smart headset is connected to the user terminal, a list of the applications currently in use on the user terminal can be obtained. If the application list contains any preset sound collection application, such as a video call application, a voice call application, or another recording-related application, it is recognized that the preset collection trigger condition is currently met, and a voice collection instruction is sent to the smart headset so that the first voice signal is obtained through the smart headset.

In this embodiment, when the smart headset meets the preset collection conditions, the first voice signal in the current scene can be collected through the directional microphone. Optionally, after the first voice signal is obtained, noise can be filtered from the voice signal through a preset filter. In particular, the filter is a high-frequency filter that filters out high-frequency noise signals. Because the directional microphone has low sensitivity at low frequencies, low-frequency noise is hard to pick up, which is equivalent to a low cut when the directional microphone collects sound signals and naturally reduces its ability to collect low-frequency noise. On this basis, in order to further suppress high-frequency noise, a high-frequency filter can be connected in series after signal acquisition to filter out high-frequency noise signals.
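
The text does not specify the design of the high-frequency filter. As a stand-in, the Python sketch below uses a one-pole low-pass filter, one of the simplest ways to attenuate high-frequency content; the function name and the smoothing factor alpha are assumptions chosen for illustration.

```python
def one_pole_lowpass(samples, alpha=0.2):
    """Attenuate high frequencies with y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    alpha lies in (0, 1]; smaller values remove more high-frequency content.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not samples:
        return []
    out = [samples[0]]  # start from the first sample to avoid a start-up jump
    for x in samples[1:]:
        out.append(out[-1] + alpha * (x - out[-1]))
    return out


# A constant (zero-frequency) input passes unchanged, while a signal that
# alternates every sample (the highest representable frequency) is attenuated.
steady = one_pole_lowpass([1.0] * 8)
alternating = one_pole_lowpass([1.0, -1.0] * 4)
```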

In S202, signal compensation is performed on the first voice signal according to the signal compensation curve to generate a calibration voice signal.

In this embodiment, after acquiring the first voice signal, the smart headset may perform signal compensation on the first voice signal through a preset signal compensation curve. Because the directional microphone performs poorly for off-axis incidence and its frequency response curve is not smooth enough, the first voice signal has a low acquisition quality in the high-frequency band, which is reflected in the waveform of the voice signal as lower energy at mid and high frequencies. On this basis, after the first voice signal is collected, the smart headset can compensate for the collection defect of the directional microphone through the signal compensation curve to obtain a compensated, calibrated voice signal, thereby optimizing the signal collection effect of the directional microphone.

In a possible implementation manner, the signal compensation curve may be configured as follows: when the smart headset is shipped from the factory, training waveforms of multiple training voices at multiple different angles are obtained through the directional microphone. Based on the multiple training waveforms, the compensation coefficient corresponding to the high frequency band is determined, so that the signal compensation curve can be generated. In particular, if the collection effect of the directional microphone differs considerably between sound collection angles, a corresponding signal compensation curve can be configured for each sound collection angle according to the training waveforms obtained at that angle. In this case, when the smart headset collects the first voice signal, the current posture of the smart headset can be determined by reading the parameters fed back by the motion sensor. Based on the deviation angle of this posture from the normal use state, the included angle between the user's voice and the sound pickup direction can be estimated, the corresponding signal compensation curve is determined based on the included angle, and signal compensation is performed on the first voice signal through that signal compensation curve.
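As an illustration of the angle-dependent selection described above, the following sketch picks the curve calibrated at the angle nearest to the estimated included angle. Nearest-angle selection is an assumption, as are the function name and the sample curve values; the application does not specify how the curve is matched to the angle.

```python
def select_compensation_curve(curves_by_angle, included_angle_deg):
    """Return the curve whose calibration angle is closest to the estimated angle.

    curves_by_angle maps a calibration angle in degrees to a compensation
    curve, represented here as a list of per-band gain factors (hypothetical).
    """
    nearest = min(curves_by_angle, key=lambda a: abs(a - included_angle_deg))
    return curves_by_angle[nearest]


# Hypothetical factory-calibrated curves for 0, 30 and 60 degrees.
curves = {0: [1.0, 1.2, 1.5], 30: [1.0, 1.4, 2.0], 60: [1.0, 1.8, 2.8]}
print(select_compensation_curve(curves, 25))
```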

In a possible implementation, the smart headset can establish a communication connection with the user terminal. When the smart headset collects voice signals through the directional microphone, a reference voice signal can be synchronously acquired through the microphone module built into the user terminal. The amplitude ratio between the low frequency band, the middle frequency band, and the high frequency band is obtained from the reference voice signal, and a signal compensation curve is generated based on the amplitude ratio. In this case, when the smart headset receives the reference voice signal fed back by the user terminal, it can detect the average amplitude of the reference voice signal; if the average amplitude is less than the preset reference threshold, the reference voice signal is identified as invalid voice, and the first voice signal is compensated through the built-in signal compensation curve. This is because the user terminal may be placed in a pocket or schoolbag while the smart headset is in use, in which case its signal collection effect is poor. A reference voice signal collected by the user terminal under these conditions cannot calibrate the high frequency band of the first voice signal well, so signal compensation for the first voice signal is performed through the preset signal compensation curve instead.
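The validity check on the reference voice signal can be sketched as below. The function name, the use of the mean absolute sample value as the "average amplitude", and the `build_curve` callback are assumptions made for illustration.

```python
def choose_curve(reference_samples, reference_threshold, builtin_curve, build_curve):
    """Fall back to the built-in curve when the reference voice is too weak.

    build_curve is a hypothetical callback that derives a compensation curve
    from a valid reference voice signal.
    """
    if reference_samples:
        mean_amplitude = sum(abs(s) for s in reference_samples) / len(reference_samples)
    else:
        mean_amplitude = 0.0
    if mean_amplitude < reference_threshold:
        # Invalid voice, e.g. the terminal is in a pocket or schoolbag.
        return builtin_curve
    return build_curve(reference_samples)


print(choose_curve([0.001, -0.002], 0.01, "builtin", lambda s: "dynamic"))
```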

In S203, the calibration voice signal is output.

In this embodiment, the smart headset can output the calibration voice signal after signal compensation has been performed on the first voice signal and the calibration voice signal has been obtained. The specific output mode of the calibration voice signal may be determined according to how the collection operation was triggered. For example, if the user terminal is performing a video call and collects the user's voice during the call through the smart headset, then after the smart headset obtains the calibration voice signal, it can send the calibration voice signal to the user terminal through the near field communication module, and the user terminal sends the calibration voice signal to the call peer to provide the voice signal during the video call. If the user uses the voice signal to perform voice control on the smart headset or the user terminal, then after acquiring the calibration voice signal, the smart headset can analyze it, convert it into character data, determine the control instruction associated with the calibration voice signal based on the keywords contained in the character data, and control the smart headset or the user terminal based on the control instruction.

In a possible implementation, the smart headset can be used to respond to a recording request. After the smart headset generates the calibration voice signal, the voice signal can be stored in a local memory, or sent to the user terminal or uploaded to a cloud server through a communication module, such as a near field communication module or a radio frequency communication module, so that the calibration voice signal is saved by another terminal.

In a possible implementation manner, the smart headset may be connected to a speaker device. The smart headset sends the calibration voice signal to the speaker device through a wired link or a wireless communication link, and the calibration voice signal is output through the speaker device, thereby improving the playback effect of the voice signal, reducing the impact of noise, and making the voice signal closer to the user's original voice.

It can be seen from the above that the voice signal collection method provided by the embodiment of the present application collects the original voice signal, that is, the first voice signal, through the directional microphone, and performs signal compensation on the first voice signal based on the signal compensation curve to obtain the calibration voice signal. This improves the collection effect for high-frequency voice signals and reduces the impact of sound coloration. The adjusted calibration voice signal is then output, which improves the quality of voice collection while ensuring that environmental noise in the collected voice signal is reduced, thereby enhancing the signal output effect of the microphone.

FIG. 5 shows a specific implementation flowchart of a voice signal collection method S201 provided by the second embodiment of the present application. Referring to FIG. 5, with respect to the embodiment described in FIG. 2, S201 in a voice signal collection method provided in this embodiment includes: S2011 to S2012, which are detailed as follows:

In S2011, when the first voice signal is acquired, a second voice signal corresponding to the first voice signal is collected through an omnidirectional microphone of the electronic device.

In this embodiment, the smart headset is configured with two microphone modules, namely an omnidirectional microphone and a directional microphone, and the two microphone modules together form the microphone array of the smart headset. When the smart headset collects voice signals, it can turn on the omnidirectional microphone and the directional microphone at the same time. While collecting the first voice signal through the directional microphone, it also collects the second voice signal through the omnidirectional microphone; that is, the initial recording time of the first voice signal is the same as that of the second voice signal. The two microphones collect the voice signal in the current environment at the same time, only through different microphone modules. The second voice signal is mainly used as a reference signal to adjust the first voice signal. Preferably, both the left earphone part and the right earphone part of the smart headset can be configured with a directional microphone and an omnidirectional microphone.

Because the second voice signal picked up by the omnidirectional microphone contains much more environmental noise than the first voice signal picked up by the directional microphone, the signal-to-noise ratio of the second voice signal is much lower than that of the first voice signal. Therefore, when the voice signal is output, the first voice signal can be used as the basis for output, thereby improving the signal-to-noise ratio of the output voice signal. However, the directional microphone performs poorly when collecting sound from outside its pickup direction and its frequency response curve is not smooth enough, which causes loss of the mid- and high-frequency signal, so that the mid- and high-frequency energy of the first voice signal has poorer fidelity than that of the second voice signal output by the omnidirectional microphone. Therefore, the omnidirectional microphone can be used to collect the second voice signal at the same time to make up for the collection defect of the directional microphone in the middle and high frequency bands, thereby improving the collection effect.

In a possible implementation manner, since the smart headset has a left earphone pendant and a right earphone pendant, the two earphone pendants jointly output audio to provide stereo sound. During use, the user therefore wears the left earphone pendant and the right earphone pendant at the same time. In this case, the directional microphone can be installed on the earphone pendant on one side and the omnidirectional microphone on the earphone pendant on the other side, so that both microphones can be installed while the size of the smart headset is reduced. For TWS headsets in particular, portability directly affects the user experience, so deploying the directional microphone and the omnidirectional microphone on opposite sides of the headset improves the space utilization of the headset. For example, when the directional microphone is installed on the left earphone pendant, the omnidirectional microphone can be installed on the right earphone pendant; conversely, when the directional microphone is installed on the right earphone pendant, the omnidirectional microphone can be installed on the left earphone pendant.

Further, as another embodiment of the present application, when the electronic device collects voice signals, the first distance between the directional microphone and the human voice source is smaller than the second distance between the omnidirectional microphone and the human voice source. The first voice signal collected by the directional microphone is used as the base signal, and the second voice signal collected by the omnidirectional microphone is used as the reference signal. Therefore, in order to improve the human voice collection quality of the base signal, the directional microphone can be placed close to the user's mouth, while the omnidirectional microphone can be placed relatively far from the user's mouth.

Illustratively, FIG. 6 shows a schematic structural diagram of an existing smart headset. As shown in FIG. 6, the smart headset is specifically a true wireless stereo (TWS) headset equipped with two microphone modules, both of which are omnidirectional microphone modules, namely omnidirectional microphone 1 and omnidirectional microphone 2. Compared with wired earphones, TWS earphones free the user from the constraints of the wire and can improve the wearing experience. In existing smart device technology, in addition to providing audio output for the user, a headset often also has to handle audio collection. However, because TWS earphones are hung on the user's ears while picking up sound, the user cannot, as with wired earphones, aim the microphone directly at the mouth, and the TWS earphone is worn relatively far from the user's mouth. As a result, the pickup quality of a TWS earphone is far inferior to that of wired earphones that pick up sound at close range. As shown in the figure, existing TWS headsets are often equipped with two omnidirectional microphones. Because an omnidirectional microphone senses changes in sound pressure in all directions, it receives sound from all directions with equal sensitivity. Its pickup characteristic is spherical, and the two-dimensional plan view of the pickup range of the omnidirectional microphone is the 360° circle shown in the figure. Therefore, in addition to acquiring the human voice signal emitted by the user from a specific direction, the omnidirectional microphone also receives interference signals from all other directions, and both the interference signals and the human voice signal are picked up with equal sensitivity, which results in a low signal-to-noise ratio for the human voice signal and poor pickup quality. On the other hand, the high-frequency component of the voice signal (the band above 4 kHz) is strongly directional, that is, it follows the direction in which the user's mouth is facing. Because the omnidirectional microphone is hung on the ear during use, high-frequency signals are lost during pickup and the sound quality is severely degraded. Moreover, the environment changes while the user uses the TWS headset, so the types of environmental noise are complex, which further reduces the voice collection effect of the TWS headset. It can be seen that existing TWS headsets cannot both offer convenience of use and guarantee the acquisition of high-quality voice signals.

Exemplarily, FIG. 7 shows a structural diagram of a smart headset provided by this embodiment. As shown in FIG. 7, the smart headset is specifically a true wireless headset configured with two microphone modules, namely an omnidirectional microphone and a directional microphone. Compared with the existing smart headset shown in FIG. 6, the microphone at the bottom of the headset is changed from an omnidirectional microphone to a directional microphone. The pickup range of the directional microphone is a cardioid area, and the pickup range of the omnidirectional microphone is a spherical area, as shown in FIG. 7. When the smart headset is in use, its lower end is closer to the user's mouth than its upper end. Therefore, in order to improve the pickup of the human voice signal, the first distance between the directional microphone and the user's mouth needs to be less than the second distance between the omnidirectional microphone and the user's mouth, which is why the omnidirectional microphone at the lower end of the existing smart headset is replaced with a directional microphone. Preferably, when designing the pickup direction of the directional microphone, the direction can be adapted to the relative position between the installation position of the directional microphone and the mouth when the earphone is in use. Specifically, a direction vector is generated with the installation position of the directional microphone as the starting point and the user's mouth as the end point, and the pickup direction of the directional microphone is determined according to a preset compensation angle and the direction vector.
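The determination of the pickup direction from the direction vector and the compensation angle can be illustrated as follows. The application does not state how the compensation angle is applied; this sketch assumes a two-dimensional model in which the unit direction vector is rotated counterclockwise by the compensation angle. The function name and the coordinates are hypothetical.

```python
import math


def pickup_direction(mic_pos, mouth_pos, compensation_deg=0.0):
    """Unit pickup direction: the mic-to-mouth vector rotated by the compensation angle."""
    dx = mouth_pos[0] - mic_pos[0]
    dy = mouth_pos[1] - mic_pos[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        raise ValueError("microphone and mouth positions coincide")
    ux, uy = dx / norm, dy / norm
    a = math.radians(compensation_deg)
    # Standard 2D counterclockwise rotation (assumed convention).
    return (ux * math.cos(a) - uy * math.sin(a),
            ux * math.sin(a) + uy * math.cos(a))


# Hypothetical layout: microphone at the origin, mouth 2 units along the x axis.
print(pickup_direction((0.0, 0.0), (2.0, 0.0), compensation_deg=0.0))
```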

In a possible implementation manner, the microphone opening of the directional microphone faces downward, which provides effective physical wind protection and reduces the impact of ambient airflow on sound pickup. Since the smart headset is upright during use, airflow strikes the sides of the smart headset in this state. If the microphone port were opened on either side of the cylindrical body, it would be exposed to the airflow and would pick up the noise generated by the airflow, which would reduce the signal-to-noise ratio of the human voice signal. Therefore, the microphone port can be directed downward, thereby improving the signal-to-noise ratio of the human voice signal. It should be noted that the opening direction of the microphone port is different from the pickup direction of the directional microphone. The microphone port, that is, the sound guide hole of the microphone, refers to the physical pickup position, whereas the pickup direction of the directional microphone refers to the direction in which pickup is most effective, that is, the direction in which the lobe of the cardioid pickup pattern points.

In S2012, the signal compensation curve is generated according to the first voice signal and the second voice signal.

In this embodiment, after the smart headset obtains the second voice signal collected by the omnidirectional microphone, it can use the second voice signal as a reference signal and generate a signal compensation curve based on the first voice signal and the second voice signal. This ensures that the signal compensation curve matches the first voice signal currently collected, realizes the real-time generation and dynamic adjustment of the signal compensation curve, and improves the accuracy of the signal compensation curve.

In a possible implementation manner, the signal compensation curve may be generated as follows: the smart headset fits a corresponding real-time frequency response function according to the amplitude relationship between the frequency bands in the second voice signal. Then, according to the deviation between each frequency band in the first voice signal and this real-time frequency response function, the adjustment ratio corresponding to each frequency band is obtained, and the signal compensation curve is generated from the adjustment ratios of all frequency bands.
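A minimal sketch of the per-band adjustment ratio follows. It assumes, for illustration only, that the "real-time frequency response function" is represented by the gain of each band in the second voice signal relative to an anchor band, and that the adjustment ratio is the factor that brings the first voice signal to the same relative gain. The band names and the function name are hypothetical.

```python
def adjustment_ratios(first_band_gain, second_band_gain, anchor="low"):
    """Per-band factors that make the first signal's band shape match the second's."""
    ratios = {}
    for band, second_gain in second_band_gain.items():
        target_rel = second_gain / second_band_gain[anchor]           # reference shape
        actual_rel = first_band_gain[band] / first_band_gain[anchor]  # measured shape
        ratios[band] = target_rel / actual_rel
    return ratios


# Hypothetical band gains: the directional microphone has lost mid and high energy.
first = {"low": 2.0, "mid": 1.0, "high": 0.5}
second = {"low": 4.0, "mid": 4.0, "high": 2.0}
print(adjustment_ratios(first, second))
```

In this example the mid and high bands of the first voice signal would each need to be doubled relative to the low band.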

In a possible implementation, the signal compensation curve can be generated as follows: the smart headset has a built-in reference compensation curve and a neural learning model. Each time the first voice signal and the second voice signal are collected, a first pooling matrix corresponding to the first voice signal and a second pooling matrix corresponding to the second voice signal are generated by means of pooling-based dimensionality reduction and convolution, and the frequency response characteristic values corresponding to the two pooling matrices are extracted. Adjustment parameters are generated according to the difference between the two frequency response characteristic values and imported into the neural learning model, which adjusts the parameters of the reference compensation curve. The adjusted reference compensation curve is used as the signal compensation curve, thereby achieving dynamic learning and adjustment and improving the accuracy of the signal compensation curve.

In the embodiment of the present application, the second voice signal corresponding to the first voice signal is simultaneously collected by the omnidirectional microphone, and the second voice signal is used as the reference signal to generate the signal compensation curve. The signal compensation curve can thus be generated in real time, which improves its accuracy and optimizes the pickup effect of the directional microphone.

FIG. 8 shows a specific implementation flowchart of a voice signal collection method S2012 provided by the third embodiment of the present application. Referring to FIG. 8, compared with the embodiment described in FIG. 5, S2012 in a voice signal collection method provided in this embodiment includes: S801 to S804, which are detailed as follows:

Further, the generating the signal compensation curve according to the first voice signal and the second voice signal includes:

In S801, the first voice signal is converted into a first frequency domain signal, and the second voice signal is converted into a second frequency domain signal.

In this embodiment, the first voice signal and the second voice signal are time-domain signals. Therefore, the first voice signal and the second voice signal can be converted from time-domain signals to frequency-domain signals through a Fourier transform algorithm, so that the first frequency domain signal corresponding to the first voice signal and the second frequency domain signal corresponding to the second voice signal are obtained. Each frequency domain signal specifically expresses the correspondence between frequency and gain.

In S802, the human voice speech frequency band is extracted according to the first frequency domain signal.

In this embodiment, because the directional microphone collects human voice signals well and can suppress environmental noise from multiple directions, the signal-to-noise ratio of the human voice signal is high, so the first frequency domain signal corresponding to the first voice signal is used to identify the human voice spectrum. The smart headset can determine the signal segment covering the human voice range in the first frequency domain signal, recognize the human voice frequencies according to the gain corresponding to each frequency value in the signal segment, and combine all the recognized human voice frequencies into the human voice speech frequency band.

In a possible implementation, the smart headset may be set with an effective gain threshold. For each frequency value in the intercepted signal segment of the human voice range, the smart headset determines whether the gain is greater than the effective gain threshold. If it is, the frequency value is recognized as a human voice frequency; conversely, if the gain of a frequency value in the signal segment is less than or equal to the effective gain threshold, the human voice signal collected this time is considered to carry no effective information at that frequency value, and the frequency value is recognized as a noise frequency.
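The threshold test described above can be sketched as follows. The spectrum is modeled as a mapping from frequency in Hz to gain; the voice range of 80 Hz to 8000 Hz, the threshold value, and the function name are assumptions for illustration.

```python
def classify_frequencies(spectrum, voice_range_hz, effective_gain_threshold):
    """Split the frequencies inside the voice range into voice and noise frequencies."""
    low, high = voice_range_hz
    voice, noise = [], []
    for freq, gain in spectrum.items():
        if low <= freq <= high:  # only the intercepted voice-range segment is examined
            if gain > effective_gain_threshold:
                voice.append(freq)
            else:  # gain less than or equal to the threshold
                noise.append(freq)
    return sorted(voice), sorted(noise)


# Hypothetical first frequency domain signal (frequency in Hz -> gain).
first_spectrum = {50: 0.9, 200: 0.8, 1000: 0.1, 3000: 0.5, 12000: 0.7}
print(classify_frequencies(first_spectrum, (80, 8000), 0.3))
```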

In S803, filter noise frequency bands other than the human voice speech frequency band in the second frequency domain signal to obtain a filtered signal.

In this embodiment, the smart headset uses the first frequency domain signal as a reference to determine the human voice speech frequency band contained in the voice signal collected this time. However, because the directional microphone lacks gain in the middle and high frequency bands, gain compensation must be performed with the help of the omnidirectional microphone. The omnidirectional microphone has the same collection sensitivity in all directions, so it picks up noise in every frequency band, which makes it difficult to identify the human voice frequency band from the second voice signal alone. The human voice speech frequency band is therefore extracted from the first voice signal, which makes up for the low accuracy of human voice recognition in the second voice signal.

In this embodiment, after the smart headset determines the human voice speech frequency band through the first frequency domain signal, it can mark each frequency value of the human voice speech frequency band in the second frequency domain signal, thereby identifying the human voice frequencies contained in the second frequency domain signal. All frequencies other than the human voice frequencies are recognized as noise frequencies and filtered out, which produces a filtered signal from which the noise has been eliminated. The filtered signal contains only the human voice frequencies and records the gain value of each human voice frequency.
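The filtering of S803 amounts to masking the second frequency domain signal with the human voice frequencies found in the first one. A minimal sketch, with the spectrum again modeled as a frequency-to-gain mapping and a hypothetical function name:

```python
def filter_noise(second_spectrum, voice_frequencies):
    """Keep only the voice frequencies of the second frequency domain signal."""
    voice = set(voice_frequencies)
    return {freq: gain for freq, gain in second_spectrum.items() if freq in voice}


# Hypothetical values: 1000 Hz was classified as noise from the first signal.
second_spectrum = {200: 0.8, 1000: 0.3, 3000: 0.5}
print(filter_noise(second_spectrum, [200, 3000]))
```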

In S804, the signal compensation curve is generated through the filtered signal.

In this embodiment, since the omnidirectional microphone does not suffer mid- and high-frequency gain loss, the filtered signal generated from the second frequency domain signal can be used to determine the relative proportions between the gains at different frequency values. The gain compensation coefficients for the middle and high frequency bands are determined based on these relative proportions, and the signal compensation curve is generated from them, eliminating the inherent acquisition distortion of the directional microphone.

In this embodiment, performing signal compensation on the first voice signal through the signal compensation curve increases the high-frequency gain, which compensates for the attenuation of high-frequency sound energy caused by the directional microphone being worn off the axis of the user's speaking direction. For example, the high-frequency radiation of the voice signal is directed straight ahead of the user's mouth, so high-frequency signals are lost during pickup; the frequency response of the signal compensation curve therefore adds 8 to 10 dB of gain compensation in the 2-7 kHz band, and may add a peak point at 5 kHz. Gain compensation through the signal compensation curve can also reduce, to a certain extent, the high-frequency loss caused by masking by the face, further improving the signal acquisition quality.
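To make the example numbers concrete, the following sketch builds one possible compensation curve with 8 dB at the edges of the 2-7 kHz band rising linearly to 10 dB at 5 kHz, and converts decibels to a linear amplitude factor. The piecewise-linear shape and the abrupt band edges are simplifying assumptions; the application only states the band, the 8 to 10 dB range, and the 5 kHz peak point.

```python
def compensation_gain_db(freq_hz, band=(2000.0, 7000.0), peak_hz=5000.0,
                         edge_db=8.0, peak_db=10.0):
    """Gain in dB of an assumed piecewise-linear compensation curve."""
    low, high = band
    if freq_hz < low or freq_hz > high:
        return 0.0  # no compensation outside the band
    if freq_hz <= peak_hz:
        frac = (freq_hz - low) / (peak_hz - low)
    else:
        frac = (high - freq_hz) / (high - peak_hz)
    return edge_db + (peak_db - edge_db) * frac


def db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear factor."""
    return 10.0 ** (gain_db / 20.0)


for f in (1000.0, 2000.0, 3500.0, 5000.0, 7000.0):
    print(f, compensation_gain_db(f), round(db_to_linear(compensation_gain_db(f)), 3))
```

A gain of 10 dB corresponds to multiplying the amplitude by roughly 3.16, and 8 dB to roughly 2.51.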

In the embodiment of the present application, frequency domain conversion is performed on the first voice signal and the second voice signal, and the signal compensation curve is generated taking into account the respective collection characteristics of the directional microphone and the omnidirectional microphone, which can improve the accuracy of the signal compensation curve.

FIG. 9 shows a specific implementation flowchart of a voice signal collection method S202 provided by the fourth embodiment of the present application. Referring to FIG. 9, compared with the embodiment described in FIG. 2, S202 in a voice signal collection method provided in this embodiment includes: S2021 to S2023, and the details are as follows:

Further, the performing signal compensation on the first voice signal according to a signal compensation curve to generate a calibration voice signal includes:

In S2021, an original frequency response curve of the first voice signal is generated.

In this embodiment, the smart headset can convert the first voice signal into a frequency domain signal through Fourier transform, so that the gain value corresponding to each frequency value in the first voice signal can be determined, and the original frequency response curve can be obtained. The original frequency response curve is used to express the correlation between the frequency value and the gain.

In S2022, using the signal compensation curve as a reference, each frequency value in the original frequency response curve is fitted to obtain a frequency response calibration curve.

In this embodiment, the signal compensation curve can compensate for the gain loss of the directional microphone in the high frequency band during acquisition. Therefore, after the original frequency response curve of the first voice signal is generated, each frequency in the original frequency response curve can be adjusted according to the trend of the gain values across the frequency bands of the signal compensation curve, so as to generate the frequency response calibration curve.

In a possible implementation manner, the smart headset may determine the gain ratio between the frequency values according to the signal compensation curve, and fit the original frequency response curve based on this gain ratio to obtain the frequency response calibration curve. For example, if the gain ratio between the low frequency band and the mid frequency band is 80% in the signal compensation curve but 120% in the original frequency response curve, the mid frequency band is lacking gain. The gain of each frequency value in the mid frequency band can then be increased so that the ratio is adjusted to 80%, consistent with the gain ratio in the signal compensation curve.
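The arithmetic of the example can be checked with a short sketch. Assuming the low-band gain is left unchanged, the mid-band gain must be multiplied by the original ratio divided by the target ratio, here 1.2 / 0.8 = 1.5. The function name and the sample gains are hypothetical.

```python
def mid_band_scale(original_ratio, target_ratio):
    """Factor for the mid-band gain so that low/mid changes from original to target."""
    return original_ratio / target_ratio


low_gain, mid_gain = 1.2, 1.0            # original low/mid ratio is 120%
scale = mid_band_scale(1.2, 0.8)         # about 1.5
new_ratio = low_gain / (mid_gain * scale)
print(round(scale, 6), round(new_ratio, 6))
```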

In S2023, the calibration voice signal is generated based on the frequency response calibration curve.

In this embodiment, an inverse Fourier transform is performed on the frequency response calibration curve, and the frequency domain signal is reconverted into a time domain signal to obtain the above-mentioned calibration voice signal.
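Steps S2021 to S2023 can be sketched end to end as below. This is a minimal illustration, not the method of the application: it uses a naive discrete Fourier transform from the standard library for self-containment, and it assumes that "fitting" to the compensation curve means multiplying each frequency bin by a real gain supplied by a hypothetical `gain_for_freq` function.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for a short sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def calibrate(first_voice, sample_rate, gain_for_freq):
    """S2021: transform; S2022: apply per-frequency gain; S2023: inverse transform."""
    n = len(first_voice)
    spectrum = dft(first_voice)
    calibrated = []
    for k, value in enumerate(spectrum):
        # Bins above n/2 mirror the lower bins, so both get the same real gain,
        # which keeps the reconstructed signal real-valued.
        freq = min(k, n - k) * sample_rate / n
        calibrated.append(value * gain_for_freq(freq))
    return idft(calibrated)


# Hypothetical input: a 2 Hz cosine sampled at 8 Hz, boosted by a factor of 2.
signal = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
boosted = calibrate(signal, 8, lambda f: 2.0 if f >= 2 else 1.0)
print([round(v, 6) for v in boosted])
```

A practical implementation would use a fast Fourier transform and process the signal in overlapping frames; those details are outside what the text specifies.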

In the embodiment of the present application, the first voice signal is converted into the original frequency response curve, and the original frequency response curve is curve-fitted based on the signal compensation curve, which compensates for the frequency response loss in the mid-to-high frequency band and thereby improves the signal collection quality of the directional microphone.

FIG. 10 shows a specific implementation flowchart of a voice signal collection method provided by the fifth embodiment of the present application. Referring to FIG. 10, relative to any one of the embodiments described in FIG. 2, FIG. 5, FIG. 8 and FIG. 9, the voice signal collection method provided by this embodiment further includes, before the first voice signal is acquired through the directional microphone of the electronic device, S1001 to S1003, which are detailed as follows:

Further, before the acquiring the first voice signal through the directional microphone of the electronic device, the method further includes:

In S1001, obtain the current environmental sound intensity.

In this embodiment, before the smart headset performs the voice signal collection operation, it can detect the ambient sound intensity in the current scene to determine the current recording environment. If the current environmental sound intensity is relatively low, the current environment is relatively quiet and the environmental noise is low, so the operation of S1002 can be performed; conversely, if the current environmental sound intensity is high, the current environment is relatively noisy, the environmental noise is large and has a considerable impact on the recording, so the operation of S1003 can be performed.

In this embodiment, the smart headset is equipped with a directional microphone and an omnidirectional microphone, and two different microphones can correspond to recording operations in different scenes.

In S1002, if the environmental sound intensity is less than a preset sound intensity threshold, a third voice signal is acquired through an omnidirectional microphone, and the third voice signal is output.

In this embodiment, when it is detected that the current environmental sound intensity is less than the preset sound intensity threshold, the current environmental noise is low. In this case, the third voice signal can be obtained through the omnidirectional microphone built into the smart headset. Since the omnidirectional microphone has the same pickup sensitivity in all directions, it picks up noise signals arriving from all directions. When the environmental sound is weak, however, even if noise from all directions is collected, the signal-to-noise ratio of the human voice signal is only slightly affected, so the omnidirectional microphone can be used to collect the human voice signal, that is, the third voice signal, and the third voice signal is output directly.

In S1003, if the environmental sound intensity is greater than or equal to the preset sound intensity threshold, the first voice signal is acquired through the directional microphone of the electronic device.

In this embodiment, when it is detected that the current environmental sound intensity is greater than or equal to the preset sound intensity threshold, the current environmental noise is relatively large. In this case, the directional microphone needs to be used to obtain the human voice signal, so as to reduce the impact of environmental noise on human voice collection and thereby improve the signal-to-noise ratio of the human voice signal. The operation of acquiring the first voice signal through the directional microphone of the electronic device is therefore performed, and the subsequent signal compensation is applied to the first voice signal to reduce the gain loss of the directional microphone in the middle and high frequency bands.
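The branch between S1002 and S1003 can be sketched as a simple threshold comparison. The function name, the string labels, and the use of decibel values are assumptions for illustration; the application does not specify the unit of the sound intensity threshold.

```python
def select_microphone(ambient_intensity, intensity_threshold):
    """S1002 if the environment is quiet, otherwise S1003."""
    if ambient_intensity < intensity_threshold:
        return "omnidirectional"  # output the third voice signal directly
    return "directional"          # acquire the first voice signal, then compensate


# Hypothetical readings against a hypothetical threshold of 60.
for level in (40, 60, 75):
    print(level, select_microphone(level, 60))
```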

In the embodiment of the present application, the smart headset is equipped with an omnidirectional microphone and a directional microphone, and the appropriate microphone is used to collect voice signals in each scenario. This can improve signal collection quality while reducing unnecessary compensation operations, thereby improving the collection efficiency of voice signals and reducing the amount of calculation of the smart headset.
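The selection logic of S1002 and S1003 can be illustrated with a minimal sketch. The function names, the use of an RMS level in decibels relative to full scale as the "environmental sound intensity", and the example threshold are all assumptions made for illustration; the application does not specify how the intensity is measured or what threshold value is used.

```python
import math
from typing import Sequence


def sound_level_db(samples: Sequence[float], eps: float = 1e-12) -> float:
    """RMS level of a block of samples in dB relative to full scale (1.0).

    Assumption: the environmental sound intensity is approximated by the
    RMS level of a short block of ambient samples.
    """
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, eps))


def select_microphone(ambient_db: float, threshold_db: float) -> str:
    """Choose the microphone according to S1002 / S1003.

    Below the threshold the omnidirectional microphone is used and its
    signal is output directly; at or above the threshold the directional
    microphone is used and signal compensation follows.
    """
    if ambient_db < threshold_db:
        return "omnidirectional"
    return "directional"
```

With a hypothetical threshold of -30 dB, a quiet block such as `[0.001] * 100` would select the omnidirectional microphone, while a loud block such as `[0.5] * 100` would select the directional microphone.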

FIG. 11 shows a specific implementation flowchart of a voice signal collection method S201 provided by the sixth embodiment of the present application. Referring to FIG. 11, with respect to any one of the embodiments described in FIG. 2, FIG. 5, FIG. 8 and FIG. 9, a voice signal collection method S201 provided in this embodiment includes: S1101 to S1103, which are detailed as follows:

Further, the acquiring the first voice signal through the directional microphone of the electronic device includes:

In S1101, the directional microphone is controlled by the transmission component of the electronic device to collect training voice signals at multiple preset angles.

In this embodiment, a transmission component is provided at the installation position of the directional microphone of the smart headset. The transmission component can rotate the microphone in a preset plane with a preset step length, so that voice signals can be collected at multiple preset angles. Because the directional microphone has a specific pickup direction, that direction changes as the microphone rotates with the transmission component. When collecting a multi-terminal dialogue, the pickup direction of the directional microphone can be adjusted, which in turn changes the collection effect of the voice signal. Based on this, before the smart headset collects voice signals, the directional microphone can be rotated to multiple preset angles through the transmission component to adjust its pickup direction, and a training voice signal is collected at each preset angle. It should be noted that the smart headset can collect a training voice signal of a preset duration at each preset angle. For example, the collection duration of the training voice signal may be 0.5s.

In S1102, the target angle is determined according to the average amplitude of the human voice frequency band in each of the training speech signals.

In this embodiment, the smart headset converts each training voice signal into a frequency domain signal, extracts the human voice frequency band from each training frequency domain signal according to the human vocal frequency range, and calculates the average amplitude of each human voice frequency band. The preset angle with the largest average amplitude is selected as the target angle. At this angle, the average amplitude of the human voice frequency band is largest, which means that the human voice signal is collected best and has a higher signal-to-noise ratio. Therefore, this preset angle can be used as the target angle, and the user's voice signal is recorded at the target angle.

In S1103, the transmission component is controlled to adjust the directional microphone to the target angle, and the first voice signal is acquired at the target angle.

In this embodiment, after the smart headset determines the target angle, the directional microphone can be adjusted to the target angle through the transmission component, and the first voice signal is collected at the target angle. Especially in a multi-terminal dialogue scene, where the position of the user changes, determining the target angle with the help of the transmission component can further improve the collection effect of the human voice signal.

In the embodiment of the present application, a transmission component is configured in the smart headset and is used to adjust the pickup direction of the directional microphone, so that the best pickup angle is determined and the pickup effect is improved.
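The target angle selection of S1102 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the human voice frequency band is assumed to be 300 Hz to 3400 Hz (the application does not give numeric limits here), and a plain discrete Fourier transform is used so that the sketch needs only the standard library.

```python
import cmath
import math
from typing import Dict, Sequence, Tuple


def band_average_amplitude(
    samples: Sequence[float],
    sample_rate: float,
    band_hz: Tuple[float, float] = (300.0, 3400.0),  # assumed voice band
) -> float:
    """Average spectral amplitude of `samples` inside `band_hz`."""
    n = len(samples)
    low, high = band_hz
    amplitudes = []
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if low <= freq <= high:
            # DFT coefficient of bin k, computed directly (O(n) per bin).
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)
            )
            amplitudes.append(abs(coeff) / n)
    return sum(amplitudes) / len(amplitudes) if amplitudes else 0.0


def choose_target_angle(
    training_signals: Dict[float, Sequence[float]], sample_rate: float
) -> float:
    """Return the preset angle whose training signal has the largest
    average amplitude in the human voice frequency band (S1102)."""
    return max(
        training_signals,
        key=lambda angle: band_average_amplitude(
            training_signals[angle], sample_rate
        ),
    )
```

Here `training_signals` maps each preset angle to the training voice signal collected at that angle in S1101; the returned angle would then be passed to the transmission component in S1103. A practical implementation would more likely use a fast Fourier transform on the 0.5s training signals.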

It should be understood that the size of the sequence number of each step in the foregoing embodiment does not mean the order of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation to the implementation process of the embodiment of the present application.

Corresponding to the voice signal collection method described in the above embodiment, FIG. 12 shows a structural block diagram of a voice signal collection device provided in an embodiment of the present application. For ease of description, only the parts related to the embodiment of the present application are shown.

Referring to Figure 12, the voice signal collection device includes:

The first voice signal acquiring unit 121 is configured to acquire the first voice signal through the directional microphone of the electronic device;

The voice signal compensation unit 122 is configured to perform signal compensation on the first voice signal according to the signal compensation curve to generate a calibration voice signal;

The calibration voice signal output unit 123 is used to output the calibration voice signal.

Optionally, the first voice signal acquiring unit 121 includes:

A second voice signal acquiring unit, configured to collect a second voice signal corresponding to the first voice signal through an omnidirectional microphone of the electronic device when acquiring the first voice signal;

The signal compensation curve generating unit is configured to generate the signal compensation curve according to the first voice signal and the second voice signal.

Optionally, when the electronic device collects the voice signal, the first distance between the directional microphone and the human sound source is smaller than the second distance between the omnidirectional microphone and the human sound source.

Optionally, the signal compensation curve generating unit includes:

A frequency domain signal conversion unit, configured to convert the first voice signal into a first frequency domain signal, and convert the second voice signal into a second frequency domain signal;

The human voice frequency band extraction unit is configured to extract the human voice frequency band according to the first frequency domain signal;

A filtered signal generating unit, configured to filter noise frequency bands in the second frequency domain signal except for the human voice speech frequency band to obtain a filtered signal;

The filtered signal conversion unit is configured to generate the signal compensation curve through the filtered signal.
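The four sub-units above can be illustrated with a minimal sketch. Several points are assumptions and not taken from the application: the function names are hypothetical, the human voice frequency band is taken to be the bins in which the first (directional) spectrum reaches at least a fraction `rel_threshold` of its peak, and the signal compensation curve is taken to be the magnitude of the filtered second (omnidirectional) spectrum.

```python
import cmath
import math
from typing import List, Sequence


def dft(samples: Sequence[float]) -> List[complex]:
    """Direct discrete Fourier transform (standard library only)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def voice_band_mask(first_spectrum: Sequence[complex], rel_threshold: float) -> List[bool]:
    """Assumed rule: a bin belongs to the voice band if the directional
    microphone's magnitude there is at least rel_threshold of its peak."""
    mags = [abs(c) for c in first_spectrum]
    peak = max(mags)
    return [peak > 0.0 and m >= rel_threshold * peak for m in mags]


def compensation_curve(
    first_signal: Sequence[float],
    second_signal: Sequence[float],
    rel_threshold: float = 0.1,
) -> List[float]:
    """Per-bin reference magnitudes derived from the omnidirectional signal."""
    if len(first_signal) != len(second_signal):
        raise ValueError("signals must have the same length")
    first_spec = dft(first_signal)    # first frequency domain signal
    second_spec = dft(second_signal)  # second frequency domain signal
    mask = voice_band_mask(first_spec, rel_threshold)
    # Filter out bins of the second spectrum outside the voice band.
    return [abs(c) if keep else 0.0 for c, keep in zip(second_spec, mask)]
```

In this sketch, noise that reaches only the omnidirectional microphone falls outside the mask and is set to zero, so the curve retains only the omnidirectional magnitudes in the band where the directional microphone picked up the voice.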

Optionally, the voice signal compensation unit 122 further includes:

An original frequency response curve generating unit, configured to generate the original frequency response curve of the first voice signal;

A frequency response calibration curve output unit, configured to fit each frequency value in the original frequency response curve with the signal compensation curve as a reference to obtain a frequency response calibration curve;

The calibration voice signal generating unit is configured to generate the calibration voice signal based on the frequency response calibration curve.
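One possible reading of the compensation performed by these sub-units is sketched below. It is an interpretation, not the method of the application: "fitting" is assumed to mean scaling the magnitude of each frequency bin of the first voice signal toward the corresponding value of the signal compensation curve, with the gain capped by a hypothetical `max_gain`, and the calibration voice signal is obtained by an inverse transform. All names are illustrative.

```python
import cmath
import math
from typing import List, Sequence


def dft(samples: Sequence[float]) -> List[complex]:
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


def calibrate(
    first_signal: Sequence[float],
    curve: Sequence[float],
    max_gain: float = 4.0,  # assumed cap to avoid amplifying noise
    floor: float = 1e-9,
) -> List[float]:
    """Scale each bin of the first signal toward the compensation curve."""
    spectrum = dft(first_signal)  # basis of the original frequency response
    if len(curve) != len(spectrum):
        raise ValueError("curve must have one value per frequency bin")
    calibrated = []
    for coeff, target in zip(spectrum, curve):
        original = abs(coeff)
        if target > 0.0 and original > floor:
            gain = min(target / original, max_gain)
        else:
            gain = 1.0  # no reference for this bin: leave it unchanged
        calibrated.append(coeff * gain)
    return idft(calibrated)
```

Because the gain is real and the curve is symmetric for a real reference signal, the phase of the first voice signal is preserved and only the attenuated bands are raised.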

Optionally, the voice signal collection device further includes:

The ambient sound intensity acquisition unit is used to acquire the current ambient sound intensity;

The third voice signal acquisition unit is configured to obtain a third voice signal through an omnidirectional microphone and output the third voice signal if the environmental sound intensity is less than a preset sound intensity threshold;

The first voice signal collection trigger unit is configured to perform the acquisition of the first voice signal through the directional microphone of the electronic device if the environmental sound intensity is greater than or equal to the preset sound intensity threshold.

Optionally, the first voice signal acquiring unit 121 includes:

A training voice signal collection unit, configured to control the directional microphone to collect training voice signals at multiple preset angles through the transmission component of the electronic device;

The target angle recognition unit is configured to determine the target angle according to the average amplitude of the human voice frequency band in each of the training speech signals;

The target angle adjustment unit is configured to control the transmission component to adjust the directional microphone to the target angle, and obtain the first voice signal at the target angle.

Therefore, the voice signal collection device provided in the embodiment of the present application can likewise collect the original voice signal, that is, the first voice signal, through the directional microphone, and perform signal compensation on the first voice signal based on the signal compensation curve to obtain the calibration voice signal. This improves the collection effect of the voice signal in the high frequency band and reduces the impact of sound coloration. Outputting the adjusted calibration voice signal improves the quality of voice collection while ensuring that environmental noise in the collected voice signal is reduced, thereby enhancing the signal output effect of the microphone.

FIG. 13 is a schematic structural diagram of an electronic device provided by an embodiment of the application. As shown in FIG. 13, the electronic device 13 of this embodiment includes: at least one processor 130 (only one is shown in FIG. 13), a memory 131, and a computer program 132 stored in the memory 131 and executable on the at least one processor 130. When the processor 130 executes the computer program 132, the steps in any of the foregoing embodiments of the voice signal collection method are implemented.

The electronic device 13 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud electronic device. The electronic device may include, but is not limited to, a processor 130 and a memory 131. Those skilled in the art can understand that FIG. 13 is only an example of the electronic device 13 and does not constitute a limitation on the electronic device 13. It may include more or fewer components than those shown in the figure, or a combination of certain components, or different components; for example, it may also include input and output devices, network access devices, and so on.

The processor 130 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 131 may be an internal storage unit of the electronic device 13 in some embodiments, such as a hard disk or a memory of the electronic device 13. In other embodiments, the memory 131 may also be an external storage device of the electronic device 13, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, or a flash card (Flash Card) equipped on the electronic device 13. Further, the memory 131 may also include both an internal storage unit of the electronic device 13 and an external storage device. The memory 131 is used to store an operating system, an application program, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 131 can also be used to temporarily store data that has been output or will be output.

It should be noted that the information interaction and execution process between the above-mentioned devices/units are based on the same concept as the method embodiment of this application. For their specific functions and technical effects, refer to the method embodiment section; details are not repeated here.

Those skilled in the art can clearly understand that, for convenience and conciseness of description, only the division of the above functional units and modules is used as an example. In practical applications, the above functions can be allocated to different functional units and modules as needed, that is, the internal structure of the device is divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing each other, and are not used to limit the protection scope of this application. For the specific working process of the units and modules in the foregoing system, reference may be made to the corresponding process in the foregoing method embodiment, which will not be repeated here.

An embodiment of the present application also provides a network device, which includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor. When the processor executes the computer program, the steps in any of the foregoing method embodiments are implemented.

The embodiments of the present application also provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in each of the foregoing method embodiments can be realized.

The embodiments of the present application provide a computer program product. When the computer program product runs on a mobile terminal, the mobile terminal is caused to implement the steps in the foregoing method embodiments.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium. Based on this understanding, the implementation of all or part of the processes in the above-mentioned embodiment methods in the present application can be accomplished by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the foregoing method embodiments can be implemented. The computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms. The computer-readable medium may at least include: any entity or device capable of carrying the computer program code to the photographing device/terminal device, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example, a U disk, a mobile hard disk, a floppy disk, or a CD-ROM. In some jurisdictions, according to legislation and patent practices, computer-readable media cannot be electrical carrier signals and telecommunication signals.

In the above-mentioned embodiments, the description of each embodiment has its own focus. For parts that are not described in detail or recorded in an embodiment, reference may be made to related descriptions of other embodiments.

A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.

In the embodiments provided in this application, it should be understood that the disclosed apparatus/network equipment and method may be implemented in other ways. For example, the device/network device embodiments described above are only illustrative. For example, the division of the modules or units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments, or equivalently replace some of the technical features. These modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the application, and should be included within the scope of protection of this application.

Finally, it should be noted that the above are only specific implementations of this application, but the scope of protection of this application is not limited to this. Any changes or substitutions within the technical scope disclosed in this application shall be covered by the scope of protection of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims

1. A method for collecting voice signals, which is characterized in that it includes:

Acquiring the first voice signal through the directional microphone of the electronic device;

Performing signal compensation on the first voice signal according to the signal compensation curve to generate a calibration voice signal;

The calibration voice signal is output.

2. The collection method according to claim 1, wherein the acquiring the first voice signal through the directional microphone of the electronic device comprises:

When acquiring the first voice signal, collecting a second voice signal corresponding to the first voice signal through an omnidirectional microphone of the electronic device;

The signal compensation curve is generated according to the first voice signal and the second voice signal.

3. The collection method according to claim 2, wherein when the electronic device collects voice signals, the first distance between the directional microphone and the human sound source is smaller than the second distance between the omnidirectional microphone and the human sound source.

4. The collection method according to claim 2, wherein the generating the signal compensation curve according to the first voice signal and the second voice signal comprises:

Converting the first voice signal into a first frequency domain signal, and converting the second voice signal into a second frequency domain signal;

Extracting a human voice speech frequency band according to the first frequency domain signal;

Filtering noise frequency bands in the second frequency domain signal other than the human voice speech frequency band to obtain a filtered signal;

The signal compensation curve is generated by the filtered signal.

5. The collection method according to claim 1, wherein the performing signal compensation on the first voice signal according to a signal compensation curve to generate a calibration voice signal comprises:

Generating an original frequency response curve of the first voice signal;

Using the signal compensation curve as a reference, fitting each frequency value in the original frequency response curve to obtain a frequency response calibration curve;

The calibration voice signal is generated based on the frequency response calibration curve.

6. The collection method according to any one of claims 1 to 5, wherein before the first voice signal is acquired through the directional microphone of the electronic device, the method further comprises:

Acquiring the current environmental sound intensity;

If the environmental sound intensity is less than the preset sound intensity threshold, acquiring a third voice signal through an omnidirectional microphone, and outputting the third voice signal;

If the environmental sound intensity is greater than or equal to the preset sound intensity threshold, performing the acquisition of the first voice signal through the directional microphone of the electronic device.

7. The collection method according to any one of claims 1 to 5, wherein the acquiring the first voice signal through the directional microphone of the electronic device comprises:

Controlling the directional microphone to collect training voice signals at multiple preset angles through the transmission component of the electronic device;

Determining the target angle according to the average amplitude of the human voice frequency band in each of the training speech signals;

The transmission component is controlled to adjust the directional microphone to the target angle, and the first voice signal is acquired at the target angle.

8. A voice signal collection device, which is characterized in that it comprises:

The first voice signal acquiring unit is configured to acquire the first voice signal through the directional microphone of the electronic device;

A voice signal compensation unit, configured to perform signal compensation on the first voice signal according to a signal compensation curve to generate a calibration voice signal;

The calibration voice signal output unit is used to output the calibration voice signal.

9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein when the processor executes the computer program, the method according to any one of claims 1 to 7 is implemented.

10. A computer-readable storage medium storing a computer program, wherein the computer program implements the method according to any one of claims 1 to 7 when the computer program is executed by a processor.