CN109151211B - Voice processing method and device and electronic equipment

Voice processing method and device and electronic equipment

Info

Publication number
CN109151211B
Authority
CN
China
Prior art keywords
voice signal
earphone
sound
voice
signal
Prior art date
Legal status
Active
Application number
CN201811163039.XA
Other languages
Chinese (zh)
Other versions
CN109151211A (en)
Inventor
龚金华
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201811163039.XA
Publication of CN109151211A
Application granted
Publication of CN109151211B
Legal status: Active


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/02 Constructional features of telephone sets
    • H04M1/19 Arrangements of transmitters, receivers, or complete sets to prevent eavesdropping, to attenuate local noise or to prevent undesired transmission; Mouthpieces or receivers specially adapted therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448 User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
    • H04M1/72454 User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions

Abstract

The embodiment of the application discloses a voice processing method and device and electronic equipment. The method comprises the following steps: when the electronic equipment is in a call state, the electronic equipment acquires a first voice signal and a second voice signal received in the same time period; the quality of the first voice signal is compared with the quality of the second voice signal; the voice signal of better quality among the first voice signal and the second voice signal is used as the main input signal of noise reduction processing, and the voice signal of poorer quality is used as the auxiliary input signal of the noise reduction processing. With this method, during a call carried out through the earphones, the electronic equipment can compare the quality of the voice signals with the same content collected by the first earphone and the second earphone, so that in the subsequent noise reduction processing the signal of higher quality serves as the main input. This reduces the noise in earphone-based calls, improves the clarity of the uplink voice, and thereby improves the user experience.

Description

Voice processing method and device and electronic equipment
Technical Field
The present application relates to the field of electronic device technologies, and in particular, to a voice processing method and apparatus, and an electronic device.
Background
With the popularization of external audio output devices, users of electronic devices have more choices for transmitting audio to an external audio output device for playing. For example, when the electronic device can transmit audio to a headset, the audio is played through the headset. However, when a call is carried out through the headset, ambient noise is picked up together with the user's voice and degrades the uplink voice, which greatly affects the user experience.
Disclosure of Invention
In view of the foregoing problems, the present application provides a voice processing method and apparatus, and an electronic device, so as to address them.
In a first aspect, the present application provides a speech processing method applied to an electronic device, where the method includes: when the electronic equipment is in a conversation state, the electronic equipment acquires a first voice signal and a second voice signal which are received in the same time period, wherein the first voice signal is acquired and sent by a first earphone, and the second voice signal is acquired and sent by a second earphone; comparing the quality of the first voice signal with the quality of the second voice signal; taking the voice signal with better quality in the first voice signal and the second voice signal as a main input signal of noise reduction processing, and taking the voice signal with poorer quality as an auxiliary input signal of the noise reduction processing; and sending the voice signal output by the noise reduction processing as the uplink voice of the call.
In a second aspect, the present application provides a speech processing apparatus, operable on an electronic device, the apparatus comprising: the electronic equipment comprises a voice signal acquisition unit, a first earphone and a second earphone, wherein the voice signal acquisition unit is used for acquiring a first voice signal and a second voice signal which are received in the same time period when the electronic equipment is in a conversation state, the first voice signal is acquired and sent by the first earphone, and the second voice signal is acquired and sent by the second earphone; the voice signal comparison unit is used for comparing the quality of the first voice signal and the second voice signal; the noise reduction processing unit is used for taking one voice signal with better quality in the first voice signal and the second voice signal as a main input signal of noise reduction processing and taking one voice signal with poorer quality as an auxiliary input signal of the noise reduction processing; and the signal sending unit is used for sending the voice signal output by the noise reduction processing as the uplink voice of the call.
In a third aspect, the present application provides an electronic device comprising one or more processors and a memory; one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the methods described above.
In a fourth aspect, the present application provides a computer-readable storage medium having a program code stored therein, wherein the program code performs the above method when running.
The present application provides a voice processing method and apparatus, and an electronic device. When the electronic equipment is in a call state, it acquires a first voice signal collected and sent by a first earphone and a second voice signal collected and sent by a second earphone, both received in the same time period. The quality of the first voice signal is first compared with the quality of the second voice signal; the signal of better quality is then used as the main input signal of noise reduction processing and the signal of poorer quality as the auxiliary input signal, and the voice signal output by the noise reduction processing is sent as the uplink voice of the call. In this way, during a call carried out through the earphones, the electronic equipment can compare the quality of the voice signals with the same content collected by the first earphone and the second earphone, so that in the subsequent noise reduction processing the signal of higher quality serves as the main input. This reduces the noise in earphone-based calls, improves the clarity of the uplink voice, and thereby improves the user experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 shows a schematic diagram of an electronic device interacting with a headset according to the present application;
FIG. 2 illustrates a schematic diagram of another electronic device and headset interaction presented herein;
FIG. 3 is a flow chart illustrating a method of speech processing proposed by the present application;
FIG. 4 is a flow chart illustrating another speech processing method proposed by the present application;
FIG. 5 is a flow chart illustrating yet another speech processing method proposed by the present application;
FIG. 6 is a schematic diagram illustrating a method for determining a source of a voice in another voice processing method proposed in the present application;
FIG. 7 is a schematic diagram illustrating another method for determining the source of a sound in another speech processing method proposed in the present application;
fig. 8 is a block diagram illustrating an audio processing apparatus according to the present application;
fig. 9 is a block diagram showing the structure of another audio processing apparatus proposed in the present application;
fig. 10 is a block diagram showing a structure of still another audio processing apparatus proposed in the present application;
fig. 11 shows a block diagram of an electronic device proposed in the present application;
fig. 12 is a storage unit according to an embodiment of the present application, configured to store or carry program code for implementing a speech processing method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
With the increase in the types of audio playing devices and the improvement of their sound effects, more users choose to output the audio played by electronic devices such as mobile phones and tablet computers to an audio playing device for playing. Common audio playing devices include vehicle-mounted music players, sound boxes, earphones, and the like. The earphone may be a wired earphone or a wireless earphone.
Take the earphone as an example. If the earphone is a wired earphone, after the wired earphone is connected with the electronic device, the electronic device can transmit audio to the wired earphone through its physical line so that the wired earphone plays the audio. If the earphone is a wireless earphone, the wireless earphone first establishes a communication link with the electronic device according to a wireless communication protocol supported by both, and the electronic device then transmits audio to the wireless earphone through the communication link so that the wireless earphone plays the audio. For example, as shown in fig. 1, in one case the wireless headset includes a first earphone 110 and a second earphone 120, and the electronic device 100 establishes a communication link with the first earphone 110 and with the second earphone 120, respectively, based on the aforementioned wireless communication protocol. Alternatively, as shown in fig. 2, the electronic device 100 may establish a communication link with the first earphone 110 based on the aforementioned wireless communication protocol, and the first earphone 110 may then establish a communication link with the second earphone 120 based on the same protocol. The wireless communication protocol may include a WLAN protocol, a Bluetooth protocol, a ZigBee protocol, or the like.
The inventor has found in research that when both the first earphone and the second earphone are provided with sound collection modules to collect voice and transmit it to the electronic equipment, noise in the environment is collected and transmitted as well, so the uplink voice sent by the electronic equipment carries more environmental noise and affects the user experience. The inventor therefore proposes, in the present application, a voice processing method, a voice processing device, and an electronic device that can reduce environmental noise and improve call quality.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to fig. 3, a speech processing method provided by the present application is applied to an electronic device, and the method includes:
step S110: when the electronic equipment is in a conversation state, the electronic equipment acquires a first voice signal and a second voice signal which are received in the same time period, the first voice signal is collected and sent by a first earphone, and the second voice signal is collected and sent by a second earphone.
It can be understood that when the first earphone and the second earphone are both worn, they receive the voice signal uttered by the user synchronously. After receiving the voice signal, the first earphone and the second earphone can directly send the received voice signal to the electronic device. The first voice signal and the second voice signal received by the electronic device within the same time period are therefore signals carrying the same voice content.
As one mode, whether the earphone is in a wearing state may be detected by an infrared sensor provided on the earphone. When the earphone is worn on the ear, some areas of the earphone are blocked. The infrared sensor can therefore be disposed in an area that is blocked when the earphone is worn, and whether the infrared signal emitted by the infrared sensor is blocked is determined from the state value returned by the sensor, so as to determine whether the earphone is in the wearing state or the non-wearing state. It will be appreciated that the earphone is determined to be in the wearing state when the returned state value indicates that the infrared signal is blocked, and in the non-wearing state when the returned state value indicates that the infrared signal is not blocked.
Then, as one way, the earphone may read the state value returned by the infrared sensor at a set period and transmit the content represented by that state value to the electronic device at the same period, so that the electronic device records in real time whether the earphone is in a wearing state. For example, the electronic device may locally maintain a text file in which a variable, e.g., headset_status, is configured. If the content returned by the earphone indicates that the earphone is in a wearing state, the electronic device may assign 1 to the variable headset_status; if the content indicates that the earphone is not worn, it may assign 0 to the variable headset_status. After the electronic device detects that audio is played through the earphone, it can determine whether the earphone is in a wearing state by reading the value of the variable headset_status in the text file.
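As an illustration of the wearing-state bookkeeping described above, the following Python sketch stores and reads such a flag. The variable name headset_status comes from the description; the JSON file path and helper function names are assumptions for illustration only, not part of the original design.

```python
import json
from pathlib import Path

# Hypothetical local text file in which the electronic device records the wearing state.
STATUS_FILE = Path("headset_status.json")

def record_wearing_state(is_worn: bool) -> None:
    """Store 1 when the earphone reports a blocked infrared signal (worn), 0 otherwise."""
    STATUS_FILE.write_text(json.dumps({"headset_status": 1 if is_worn else 0}))

def is_headset_worn() -> bool:
    """Read the recorded flag before routing call audio through the earphone."""
    if not STATUS_FILE.exists():
        return False
    return json.loads(STATUS_FILE.read_text()).get("headset_status", 0) == 1

# Example: the earphone periodically reports whether its infrared signal is blocked.
record_wearing_state(is_worn=True)
print(is_headset_worn())  # True
```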
It should be noted that, in addition to detecting whether the earphone is in a wearing state with the infrared sensor, other elements may be used for this purpose. For example, a component whose resistance changes on contact with human skin (for example, a thermistor) may be disposed at the part of the earphone that touches the skin when the earphone is worn. In this manner, after the earphone determines from the resulting change in current whether it is in a wearing state, it can transmit the detection result to the electronic device.
Step S120: and comparing the quality of the first voice signal with the quality of the second voice signal.
It should be noted that although the first earphone and the second earphone both collect the voice content uttered by the user during the call, the quality of the first voice signal and of the second voice signal received by the electronic device may differ, for example because of the orientation of the user's head when speaking or because of differences in the communication quality between each earphone and the electronic device, even though the voice content carried by each signal is the same. For example, the voice content carried by both the first voice signal and the second voice signal may be "ABCDEFG", but the sound intensity of the voice content carried by the first voice signal may be greater than that carried by the second voice signal.
Then, as a way, the step of comparing the quality of the first voice signal and the second voice signal comprises: acquiring sound parameters of the first voice signal and the second voice signal which respectively carry voice contents; and comparing the quality of the first voice signal and the second voice signal based on the sound parameters.
Wherein the step of comparing the quality of the first voice signal and the second voice signal based on the sound parameter comprises: if the sound parameter comprises sound intensity, comparing the sound intensity of the first voice signal with the sound intensity of the second voice signal, and taking the voice signal with higher intensity as the voice signal with better quality; if the sound parameter comprises sound continuity, comparing the sound continuity of the first voice signal with the sound continuity of the second voice signal, and taking the voice signal with higher continuity as the voice signal with better quality, wherein the fewer break points in the middle of the voice content are, the higher the sound continuity is.
It can be understood that the sound content carried by the signal with higher sound intensity can be better identified by the call object. Then it is advantageous to use the signal with higher sound intensity as a speech signal with better quality to obtain a speech signal with clearer speech content. Furthermore, if there are discontinuous places in the voice content carried in the voice signal, which may cause the user to hear the voice intermittently, it is advantageous to use the voice signal with higher voice continuity as the voice signal with better quality to obtain the voice signal with more continuous carried voice content.
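To make the comparison concrete, the sketch below scores the two signals by sound intensity and, optionally, by continuity (fewer break points in the middle of the voice content means higher continuity), as described above. The NumPy-array representation, the RMS intensity proxy, and the silence-threshold break-point count are assumptions for illustration, not the patented implementation.

```python
import numpy as np

def sound_intensity(signal: np.ndarray) -> float:
    """Use RMS amplitude as a simple proxy for sound intensity."""
    return float(np.sqrt(np.mean(signal.astype(np.float64) ** 2)))

def break_point_count(signal: np.ndarray, silence_threshold: float = 1e-3,
                      frame: int = 160) -> int:
    """Count frames whose energy falls below a silence threshold inside the speech."""
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame, frame)]
    energies = [float(np.mean(np.abs(f))) for f in frames]
    return sum(1 for e in energies if e < silence_threshold)

def first_is_better(first: np.ndarray, second: np.ndarray,
                    use_continuity: bool = False) -> bool:
    """Return True if the first signal is the better-quality one."""
    if use_continuity:
        # Fewer break points -> higher continuity -> better quality.
        return break_point_count(first) <= break_point_count(second)
    # Higher intensity -> better quality.
    return sound_intensity(first) >= sound_intensity(second)
```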
Step S130: and taking the voice signal with better quality in the first voice signal and the second voice signal as a main input signal of noise reduction processing, and taking the voice signal with poorer quality as an auxiliary input signal of the noise reduction processing.
In the process of noise reduction, a difference operation is performed between the main input signal and the auxiliary input signal to cancel the noise signals carried by both. For example, the auxiliary input signal may be subtracted from the main input signal to obtain the voice signal to be transmitted. Because the noise signals received by the first earphone and the second earphone are similar, the noise is largely cancelled in the voice signal obtained by subtracting the auxiliary input signal from the main input signal.
Based on the foregoing, if the signal of poorer quality were used directly as the main input signal and the signal of better quality subtracted from it, the phase of the resulting voice signal would be opposite to that of the actual voice signal, so a phase inversion would have to be performed before transmission. To avoid this problem, in the present application the signal of better quality among the first voice signal and the second voice signal is used as the main input signal of the noise reduction processing, and the signal of poorer quality as the auxiliary input signal.
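The dual-input cancellation described above can be sketched as follows. This is a deliberately simplified difference operation under the assumption that the two signals are time-aligned and of equal length; a practical system would likely use adaptive filtering, which the description does not prescribe.

```python
import numpy as np

def rms(signal: np.ndarray) -> float:
    """RMS amplitude as a simple quality proxy (higher is assumed better here)."""
    return float(np.sqrt(np.mean(signal.astype(np.float64) ** 2)))

def noise_reduce(first: np.ndarray, second: np.ndarray) -> np.ndarray:
    """Use the better-quality signal as the main input and the other as the auxiliary
    input, then cancel the approximately common noise with a difference operation."""
    main, aux = (first, second) if rms(first) >= rms(second) else (second, first)
    # Subtracting the auxiliary input from the main input keeps the speech phase
    # upright; the reverse order would invert the phase and require correction.
    return main - aux
```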
Step S140: and sending the voice signal output by the noise reduction processing as the uplink voice of the call.
With the voice processing method provided by the present application, when the electronic equipment is in a call state, it acquires a first voice signal collected and sent by a first earphone and a second voice signal collected and sent by a second earphone, both received in the same time period. The quality of the first voice signal is first compared with the quality of the second voice signal; the signal of better quality is then used as the main input signal of the noise reduction processing and the signal of poorer quality as the auxiliary input signal, and the voice signal output by the noise reduction processing is sent as the uplink voice of the call. Therefore, during a call carried out through the earphones, the electronic equipment can compare the quality of the voice signals with the same content collected by the first earphone and the second earphone, so that in the subsequent noise reduction processing the signal of higher quality serves as the main input. This reduces the noise in earphone-based calls, improves the clarity of the uplink voice, and thereby improves the user experience.
Referring to fig. 4, a speech processing method provided by the present application is applied to an electronic device, and the method includes:
step S210: when the electronic equipment is in a conversation state, the electronic equipment acquires a first voice signal and a second voice signal which are received in the same time period, the first voice signal is collected and sent by a first earphone, and the second voice signal is collected and sent by a second earphone.
When a user is engaged in a voice call, there may be other sound sources around the user. For example, other users nearby may be speaking, or other audio playback devices may be playing audio. The intensity of the sound emitted by such other sources, as received at the first earphone and at the second earphone, may differ greatly.
Step S220: and acquiring a first sound intensity of the voice content carried by the first voice signal.
Step S230: and acquiring second sound intensity of the sound content carried by the second sound signal.
Step S240: and comparing the first sound intensity with the second sound intensity.
Step S250: and if the difference value between the first sound intensity and the second sound intensity is not smaller than the set threshold, discarding the first voice signal and the second voice signal.
It should be noted that when both the first earphone and the second earphone are in the wearing state, the intensities of the sound they collect from the user may differ, but remain relatively close. Therefore, if the difference between the first sound intensity and the second sound intensity is smaller than the set threshold, the electronic device may determine that the sound source of both the first voice signal and the second voice signal is the user; if the difference is not smaller than the set threshold, it may determine that the first voice signal and the second voice signal carry a larger amount of sound emitted by other sound sources.
In that case, as one approach, the electronic device may directly discard the first voice signal and the second voice signal. Alternatively, as another mode, a voice signal whose sound intensity is smaller than the set threshold may be separated from each of the first voice signal and the second voice signal and used as the voice signal actually collected by the first earphone and by the second earphone, respectively. It can be understood that the actually collected voice signal is the voice signal uttered by the user.
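A minimal sketch of the intensity gate in steps S220 to S260 follows; the RMS intensity measure and the numerical threshold are assumptions for illustration, and only the simpler "discard both" branch is shown.

```python
import numpy as np

def rms(signal: np.ndarray) -> float:
    return float(np.sqrt(np.mean(signal.astype(np.float64) ** 2)))

def gate_by_intensity(first: np.ndarray, second: np.ndarray, threshold: float = 0.2):
    """Return the pair of signals to process further, or None to discard both.

    If the two intensities differ by less than the set threshold, both signals are
    assumed to come from the wearer and are passed on to the quality comparison;
    otherwise the pair is discarded.
    """
    if abs(rms(first) - rms(second)) < threshold:
        return first, second
    return None
```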
Step S260: and if the difference value between the first sound intensity and the second sound intensity is smaller than a set threshold value, comparing the quality of the first voice signal with the quality of the second voice signal.
Step S270: and taking the voice signal with better quality in the first voice signal and the second voice signal as a main input signal of noise reduction processing, and taking the voice signal with poorer quality as an auxiliary input signal of the noise reduction processing.
Step S280: and sending the voice signal output by the noise reduction processing as the uplink voice of the call.
With the voice processing method provided by the present application, when the electronic equipment is in a call state, it acquires a first voice signal collected and sent by a first earphone and a second voice signal collected and sent by a second earphone, both received in the same time period. Only after the intensity difference between the first voice signal and the second voice signal is found to be smaller than the set threshold does it compare the quality of the first voice signal with that of the second voice signal; the signal of better quality is then used as the main input signal of the noise reduction processing and the signal of poorer quality as the auxiliary input signal, and the voice signal output by the noise reduction processing is sent as the uplink voice of the call. Therefore, during a call carried out through the earphones, the electronic equipment can compare the quality of the voice signals with the same content collected by the first earphone and the second earphone, so that in the subsequent noise reduction processing the signal of higher quality serves as the main input. This reduces the noise in earphone-based calls, improves the clarity of the uplink voice, and thereby improves the user experience.
Referring to fig. 5, a speech processing method provided by the present application is applied to an electronic device, and the method includes:
step S310: when the electronic equipment is in a conversation state, the electronic equipment acquires a first voice signal and a second voice signal which are received in the same time period, the first voice signal is collected and sent by a first earphone, and the second voice signal is collected and sent by a second earphone.
Step S320: and obtaining the source direction of the voice signal collected by the first earphone based on the voice signal of the first direction and the voice signal of the second direction of the first earphone.
As one mode, sound collection modules may be disposed in a plurality of directions on the first earphone and the second earphone, so that sounds from a plurality of directions can be collected simultaneously. It can be understood that, for sound emitted from the same source, the intensity collected by the modules in different directions differs. For example, fig. 6 shows the relative positions of the first earphone 110, the second earphone 120 and the user; in actual use the first earphone 110 and the second earphone 120 are worn at the user's ears, and fig. 6 only identifies their relative positions and is not intended to illustrate their actual positions. The first side 111, the second side 112 and the third side 113 of the first earphone 110 are each provided with a sound collection module, and correspondingly the first side 121, the second side 122 and the third side 123 of the second earphone 120 are each provided with a sound collection module.
In this case, if the user starts to send out the voice signal, for the first earphone 110, the sound intensity of the sound collected by the sound collection module disposed on the third side 113 is greater than the sound intensity of the sound collected by the sound collection module disposed on the first side 111, and the sound intensity of the sound collected by the sound collection module disposed on the first side 111 is greater than the sound intensity of the sound collected by the sound collection module disposed on the second side 112. Based on the foregoing manner, the first earphone may determine the source direction of the sound according to the sound intensity of the sound collected by the plurality of sound collection modules.
If the sound environment around the first earphone and the second earphone is complex, the intensities of the sound collected by the several sound collection modules may be similar, which makes it harder to identify the sound source. As one mode, the first earphone may therefore identify the voice content carried by the sound signals collected by the several sound collection modules and compare the sound intensity only among the signals that carry the same voice content.
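The intensity-based direction estimate described for fig. 6 can be sketched as follows; the side names and the dictionary representation are assumptions modelled on the first, second and third sides shown in the figure.

```python
import numpy as np

def source_direction(side_signals: dict) -> str:
    """Estimate the source direction as the side whose sound collection module
    recorded the highest intensity for the same voice content.

    side_signals maps a side name (e.g. 'first', 'second', 'third') to the samples
    captured by the module on that side.
    """
    intensities = {
        side: float(np.sqrt(np.mean(np.asarray(samples, dtype=np.float64) ** 2)))
        for side, samples in side_signals.items()
    }
    return max(intensities, key=intensities.get)
```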
Step S330: and obtaining the source direction of the voice signal collected by the second earphone based on the voice signal of the second earphone in the first direction and the voice signal of the second direction.
Similar to the way the first earphone acquires the direction from which the sound originates, the second earphone may also determine the direction from which the collected sound originates in the same way.
Step S340: and judging whether the source direction of the voice signal collected by the first earphone and the source direction of the voice signal collected by the second earphone both accord with set conditions.
As one mode, the first earphone may transmit to the electronic device only the voice signal uttered by the user, filtering out in advance the voice signals arriving from other directions. This reduces the amount of data to be handled in the subsequent noise reduction processing and speeds up the overall voice transmission. In this process, the electronic device may recognize the sound intensity collected by the sound collection module disposed on each of the first side 111, the second side 112 and the third side 113 shown in fig. 6.
As one mode, the set condition is that the sound intensity collected by the sound collection module on the third side 113 is greater than that collected by the module on the first side 111, and the intensity collected by the module on the first side 111 is greater than that collected by the module on the second side 112. If this condition is met, the first earphone can judge that the collected sound originates from the user's own speech.
Alternatively, the set condition is that the source direction of the voice signal collected by the first earphone is opposite to the source direction of the voice signal collected by the second earphone. When the first earphone and the second earphone are both worn, if the electronic equipment detects that the source direction of the voice signal collected by the first earphone is opposite to that of the voice signal collected by the second earphone, the collected voice signal can be regarded as uttered by the user.
For example, as shown in fig. 7, if sound generated by another sound source reaches the first earphone 110, the sound intensity collected by the sound collection module on the first side 111 is inevitably greater than that collected by the modules on the other sides, so the first earphone can determine that the sound comes from a sound source other than the user.
Similarly, the second earphone may determine in the same way whether the collected sound originates from the user's speech.
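Putting the two set conditions together, a minimal check might look like the sketch below; the intensity ordering follows fig. 6 and the "opposite directions" test follows the alternative condition, both expressed with assumed data structures and direction labels.

```python
def meets_intensity_condition(intensity_by_side: dict) -> bool:
    """Condition from fig. 6: third side > first side > second side."""
    return (intensity_by_side["third"] > intensity_by_side["first"]
            > intensity_by_side["second"])

def meets_direction_condition(first_earphone_dir: str, second_earphone_dir: str) -> bool:
    """Alternative condition: the two earphones hear the sound from opposite directions
    (e.g. 'left' vs 'right'), as expected when the wearer is speaking."""
    return (first_earphone_dir, second_earphone_dir) in {("left", "right"), ("right", "left")}

def keep_signals(intensity_by_side: dict, first_dir: str, second_dir: str) -> bool:
    """Process the signal pair only when a set condition is met; otherwise discard it."""
    return (meets_intensity_condition(intensity_by_side)
            or meets_direction_condition(first_dir, second_dir))
```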
Step S350: and if the first voice signal and the second voice signal are not in accordance with the set condition, discarding the first voice signal and the second voice signal.
Step S360: and if the first voice signal and the second voice signal accord with the set condition, comparing the quality of the first voice signal with the quality of the second voice signal.
Step S370: taking the voice signal with better quality in the first voice signal and the second voice signal as a main input signal of noise reduction processing, and taking the voice signal with poorer quality as an auxiliary input signal of the noise reduction processing;
step S380: and sending the voice signal output by the noise reduction processing as the uplink voice of the call.
It should be noted that the judgment of the sound source may be performed by the first earphone and the second earphone themselves, or the first earphone and the second earphone may directly transmit the collected voice signals to the electronic device, which then performs the judgment.
With the voice processing method described above, during a call carried out through the earphones the electronic equipment can compare the quality of the voice signals with the same content collected by the first earphone and the second earphone, so that in the subsequent noise reduction processing the signal of higher quality serves as the main input; this reduces the noise in earphone-based calls, improves the clarity of the uplink voice, and thereby improves the user experience. Moreover, the subsequent quality comparison is carried out only when the voice signal is identified as coming from the user, so that sound emitted by other sources is filtered out and the processing efficiency is improved.
Referring to fig. 8, a speech processing apparatus 400 provided by the present application is operated in an electronic device, the apparatus 400 includes: a voice signal acquisition unit 410, a voice signal comparison unit 420, a noise reduction processing unit 430, and a signal transmission unit 440.
The voice signal acquiring unit 410 is configured to, when the electronic device is in a call state, acquire a first voice signal and a second voice signal that are received within the same time period by the electronic device, where the first voice signal is acquired and sent by a first earphone, and the second voice signal is acquired and sent by a second earphone.
A voice signal comparing unit 420, configured to compare the quality of the first voice signal and the second voice signal.
As a mode, the voice signal comparing unit 420 is specifically configured to obtain sound parameters of voice contents carried by the first voice signal and the second voice signal respectively; and comparing the quality of the first voice signal and the second voice signal based on the sound parameters.
Specifically, if the sound parameter includes sound intensity, the voice signal comparison unit 420 compares the sound intensities of the first voice signal and the second voice signal and takes the voice signal with higher intensity as the one with better quality; if the sound parameter includes sound continuity, it compares the sound continuity of the first voice signal with that of the second voice signal and takes the voice signal with higher continuity as the one with better quality, where fewer break points in the middle of the voice content mean higher sound continuity.
The noise reduction processing unit 430 is configured to use one of the first voice signal and the second voice signal with better quality as a main input signal for noise reduction processing, and use one of the voice signals with poorer quality as a sub-input signal for the noise reduction processing.
A signal sending unit 440, configured to send the voice signal output through the noise reduction processing as an uplink voice of the call.
Referring to fig. 9, a speech processing apparatus 500 provided by the present application is operated on an electronic device, where the apparatus 500 includes: a voice signal acquiring unit 510, a sound intensity acquiring unit 520, a voice signal comparing unit 530, a noise reduction processing unit 540, and a signal transmitting unit 550.
A voice signal obtaining unit 510, configured to, when the electronic device is in a call state, obtain, by the electronic device, a first voice signal and a second voice signal that are received within the same time period, where the first voice signal is collected and sent by a first earphone, and the second voice signal is collected and sent by a second earphone.
A sound intensity obtaining unit 520, configured to obtain a first sound intensity of a voice content carried by the first voice signal.
The sound intensity obtaining unit 520 is further configured to obtain a second sound intensity of the voice content carried by the second voice signal.
A voice signal comparing unit 530, configured to compare the first sound intensity and the second sound intensity.
A voice signal comparing unit 530, configured to discard the first voice signal and the second voice signal if the difference between the first sound intensity and the second sound intensity is not smaller than the set threshold.
The voice signal comparing unit 530 is further configured to compare the quality of the first voice signal and the quality of the second voice signal if the difference between the first sound intensity and the second sound intensity is smaller than a set threshold.
The noise reduction processing unit 540 is configured to use one of the first voice signal and the second voice signal with better quality as a main input signal for noise reduction processing, and use one of the voice signals with poorer quality as a sub-input signal for the noise reduction processing.
A signal sending unit 550, configured to send the voice signal output through the noise reduction processing as the uplink voice of the call.
Referring to fig. 10, a speech processing apparatus 600 provided by the present application is operated on an electronic device, the apparatus 600 includes: the device comprises a voice signal acquisition unit 610, a signal source determination unit 620, a signal judgment unit 630, a voice signal comparison unit 640, a noise reduction processing unit 650 and a signal sending unit 660.
The voice signal acquiring unit 610 is configured to, when the electronic device is in a call state, acquire a first voice signal and a second voice signal received in the same time period by the electronic device, where the first voice signal is acquired and sent by a first earphone, and the second voice signal is acquired and sent by a second earphone.
A signal source determining unit 620, configured to obtain a source direction of the voice signal acquired by the first earphone based on the voice signal of the first direction and the voice signal of the second direction of the first earphone.
The signal source determining unit 620 is further configured to obtain a source direction of the voice signal acquired by the second earphone based on the voice signal of the second earphone in the first direction and the voice signal of the second direction.
The signal determining unit 630 is configured to determine whether the source direction of the voice signal collected by the first earphone and the source direction of the voice signal collected by the second earphone both conform to a set condition.
The signal determining unit 630 is further configured to discard the first voice signal and the second voice signal if the set condition is not met.
The voice signal comparing unit 640 is configured to compare the quality of the first voice signal and the quality of the second voice signal if the set condition is met.
A noise reduction processing unit 650, configured to use one of the first voice signal and the second voice signal with better quality as a main input signal for noise reduction processing, and use one of the voice signals with poorer quality as an auxiliary input signal for the noise reduction processing;
a signal sending unit 660, configured to send the voice signal output through the noise reduction processing as the uplink voice of the call.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present application, the coupling between the modules may be electrical, mechanical or other type of coupling. In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
To sum up, the present application provides a voice processing method and apparatus, and an electronic device. When the electronic equipment is in a call state, it acquires a first voice signal collected and sent by a first earphone and a second voice signal collected and sent by a second earphone, both received in the same time period. The quality of the first voice signal is first compared with the quality of the second voice signal; the signal of better quality is then used as the main input signal of the noise reduction processing and the signal of poorer quality as the auxiliary input signal, and the voice signal output by the noise reduction processing is sent as the uplink voice of the call. Therefore, during a call carried out through the earphones, the electronic equipment can compare the quality of the voice signals with the same content collected by the first earphone and the second earphone, so that in the subsequent noise reduction processing the signal of higher quality serves as the main input. This reduces the noise in earphone-based calls, improves the clarity of the uplink voice, and thereby improves the user experience.
An electronic device provided by the present application will be described below with reference to fig. 11.
Referring to fig. 11, based on the foregoing speech processing method and apparatus, another electronic device 200 capable of executing the speech processing method is further provided in the embodiment of the present application. The electronic device 200 includes one or more processors 102 (only one shown), a memory 104, a wireless module 106, an audio module 108, and a gesture detection module 110 coupled to each other. The memory 104 stores programs that can execute the content of the foregoing embodiments, and the processor 102 can execute the programs stored in the memory 104.
Processor 102 may include one or more processing cores, among other things. The processor 102 interfaces with various components throughout the electronic device 100 using various interfaces and circuitry to perform various functions of the electronic device 100 and process data by executing or executing instructions, programs, code sets, or instruction sets stored in the memory 104 and invoking data stored in the memory 104. Alternatively, the processor 102 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 102 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly processes an operating system, a user interface, a target application program and the like; the GPU is used for rendering and drawing display content; the modem is used to handle wireless communications. It is understood that the modem may not be integrated into the processor 102, but may be implemented by a communication chip.
The memory 104 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 104 may be used to store instructions, programs, code sets, or instruction sets. The memory 104 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the method embodiments described above, and the like. The data storage area may also store data created by the electronic device 200 in use (such as the aforementioned text file), and the like.
The wireless module 106 is configured to receive and transmit electromagnetic waves and to convert between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices, for example, an audio playing device. The wireless module 106 may include various existing circuit elements for performing these functions, such as an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a Subscriber Identity Module (SIM) card, memory, and so forth. The wireless module 106 may communicate with various networks, such as the internet, an intranet, or a wireless network, or communicate with other electronic devices via a wireless network. The wireless network may comprise a cellular telephone network, a wireless local area network, or a metropolitan area network. The wireless network described above may use various communication standards, protocols, and technologies, including but not limited to WLAN protocols and Bluetooth protocols, and may even include protocols that have not yet been developed.
Further, the audio module 108 may be used to process audio output by applications running in the electronic device 200. The gesture detection module 110 may include an acceleration sensor, a gyroscope, and the like to detect a gesture or acceleration of the electronic device 200.
The electronic device 200 may be the electronic device 100 described above. The first earphone 110 or the second earphone 120 may be the same as described above.
Referring to fig. 12, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable medium 800 has stored therein a program code that can be called by a processor to execute the method described in the above-described method embodiments.
The computer-readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Alternatively, the computer-readable storage medium 800 includes a non-volatile computer-readable storage medium. The computer readable storage medium 800 has storage space for program code 810 to perform any of the method steps of the method described above. The program code can be read from or written to one or more computer program products. The program code 810 may be compressed, for example, in a suitable form.
For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not necessarily depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (6)

1. A speech processing method, applied to an electronic device, the method comprising:
when the electronic equipment is in a conversation state, the electronic equipment acquires a first voice signal and a second voice signal which are received in the same time period, wherein the first voice signal is acquired and sent by a first earphone, and the second voice signal is acquired and sent by a second earphone;
acquiring a first sound intensity of a voice content carried by the first voice signal;
acquiring a second sound intensity of the voice content carried by the second voice signal;
comparing the first sound intensity with the second sound intensity;
if the difference between the first sound intensity and the second sound intensity is smaller than a set threshold, determining that the sound source of both the first voice signal and the second voice signal is the user, and comparing the quality of the first voice signal and the quality of the second voice signal, wherein the first voice signal and the second voice signal each comprise a voice signal in a first direction and a voice signal in a second direction, the voice signal in the first direction is collected by microphones of the first earphone and the second earphone facing the first direction, and the voice signal in the second direction is collected by microphones of the first earphone and the second earphone facing the second direction; the step of comparing the quality of the first voice signal and the quality of the second voice signal further comprises: obtaining a source direction of the voice signal collected by the first earphone based on the voice signal in the first direction and the voice signal in the second direction of the first earphone; obtaining a source direction of the voice signal collected by the second earphone based on the voice signal in the first direction and the voice signal in the second direction of the second earphone; judging whether the source direction of the voice signal collected by the first earphone and the source direction of the voice signal collected by the second earphone both meet a set condition; and if both meet the set condition, executing the step of comparing the quality of the first voice signal with the quality of the second voice signal; the step of comparing the quality of the first voice signal and the second voice signal comprises: acquiring sound parameters of the voice content carried by each of the first voice signal and the second voice signal; and comparing the quality of the first voice signal and the second voice signal based on the sound parameters; if the sound parameter comprises sound continuity, comparing the sound continuity of the first voice signal with the sound continuity of the second voice signal, and taking the voice signal with higher sound continuity as the voice signal with better quality, wherein fewer break points in the middle of the voice content mean higher sound continuity; if the difference between the first sound intensity and the second sound intensity is not smaller than the set threshold, judging that the first voice signal and the second voice signal carry a larger amount of sound emitted by other sound sources, and discarding the first voice signal and the second voice signal;
taking the voice signal with better quality in the first voice signal and the second voice signal as a main input signal of noise reduction processing, and taking the voice signal with poorer quality as an auxiliary input signal of the noise reduction processing;
and sending the voice signal output by the noise reduction processing as the uplink voice of the call.
2. The method according to claim 1, wherein the set condition comprises:
the source direction of the voice signal collected by the first earphone is opposite to the source direction of the voice signal collected by the second earphone.
3. The method of claim 1, further comprising:
if the source direction of the voice signal collected by the first earphone and the source direction of the voice signal collected by the second earphone do not both meet the set condition, discarding the first voice signal and the second voice signal.
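Again for illustration only, the 'set condition' of claims 2 and 3, namely that the two earphones observe the source from opposite directions, might be checked along the following lines. The energy-based coarse direction decision and all names below are assumptions; the claims do not prescribe how the source direction is derived from the two directional pickups.

import numpy as np

def coarse_source_direction(first_dir_signal: np.ndarray, second_dir_signal: np.ndarray) -> int:
    """Coarse per-earphone direction estimate: +1 if the first-direction pickup
    carries more energy than the second-direction pickup, -1 otherwise."""
    energy_first = float(np.sum(np.square(first_dir_signal)))
    energy_second = float(np.sum(np.square(second_dir_signal)))
    return 1 if energy_first >= energy_second else -1

def meets_set_condition(first_earphone_dir: int, second_earphone_dir: int) -> bool:
    """Claim 2's condition: the source directions seen by the two earphones are opposite."""
    return first_earphone_dir == -second_earphone_dir

When the condition fails, both signals are dropped, which corresponds to the discarding step of claim 3.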
4. A voice processing apparatus, applied to an electronic device, the apparatus comprising:
a voice signal acquisition unit, configured to acquire, when the electronic device is in a call state, a first voice signal and a second voice signal received in the same time period, wherein the first voice signal is collected and sent by a first earphone and the second voice signal is collected and sent by a second earphone;
a sound intensity acquisition unit, configured to acquire a first sound intensity of the voice content carried by the first voice signal;
the sound intensity acquisition unit is further configured to acquire a second sound intensity of the voice content carried by the second voice signal;
a voice signal comparison unit, configured to compare the first sound intensity with the second sound intensity;
the voice signal comparison unit is further configured to: if the difference between the first sound intensity and the second sound intensity is smaller than a set threshold, determine that the sound source of both the first voice signal and the second voice signal is the user, and compare the quality of the first voice signal with the quality of the second voice signal, wherein the first voice signal and the second voice signal each comprise a voice signal in a first direction and a voice signal in a second direction, the voice signal in the first direction is collected by microphones of the first earphone and the second earphone facing the first direction, and the voice signal in the second direction is collected by microphones of the first earphone and the second earphone facing the second direction; before the quality of the first voice signal and the quality of the second voice signal are compared, obtain a source direction of the voice signal collected by the first earphone based on the voice signal in the first direction and the voice signal in the second direction of the first earphone, obtain a source direction of the voice signal collected by the second earphone based on the voice signal in the first direction and the voice signal in the second direction of the second earphone, judge whether the source direction of the voice signal collected by the first earphone and the source direction of the voice signal collected by the second earphone both meet a set condition, and if both meet the set condition, compare the quality of the first voice signal with the quality of the second voice signal; the voice signal comparison unit is specifically configured to acquire sound parameters of the voice contents respectively carried by the first voice signal and the second voice signal, and compare the quality of the first voice signal with the quality of the second voice signal based on the sound parameters, wherein, if the sound parameters comprise sound continuity, the sound continuity of the first voice signal is compared with the sound continuity of the second voice signal, and the voice signal with the higher sound continuity is taken as the voice signal with the better quality, the sound continuity being higher the fewer break points there are in the voice content; and if the difference between the first sound intensity and the second sound intensity is not smaller than the set threshold, determine that the first voice signal and the second voice signal carry a large amount of sound emitted by other sound sources, and discard the first voice signal and the second voice signal;
a noise reduction processing unit, configured to take the one of the first voice signal and the second voice signal with the better quality as a main input signal of noise reduction processing, and to take the one with the poorer quality as an auxiliary input signal of the noise reduction processing;
and a signal sending unit, configured to send the voice signal output by the noise reduction processing as the uplink voice of the call.
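The main-input/auxiliary-input arrangement recited for the noise reduction processing unit matches the classic two-input adaptive noise cancellation layout. As a non-normative sketch under that assumption, a normalized LMS filter is one common way to realize it; the patent does not name a specific algorithm, and the filter length and step size below are illustrative.

import numpy as np

def adaptive_noise_cancel(main_input: np.ndarray, aux_input: np.ndarray,
                          taps: int = 32, mu: float = 0.1, eps: float = 1e-8) -> np.ndarray:
    """Subtract the component of main_input that is predictable from aux_input,
    using a normalized LMS adaptive filter, and return the enhanced signal."""
    w = np.zeros(taps)                                             # adaptive filter coefficients
    out = np.asarray(main_input, dtype=float).copy()
    for n in range(taps, len(main_input)):
        x = np.asarray(aux_input[n - taps:n], dtype=float)[::-1]   # newest auxiliary samples first
        noise_estimate = float(np.dot(w, x))
        e = float(main_input[n]) - noise_estimate                  # error = enhanced output sample
        out[n] = e
        w += (mu / (float(np.dot(x, x)) + eps)) * e * x            # NLMS coefficient update
    return out

Because the auxiliary channel also carries the user's voice, a practical implementation would limit adaptation to frames where speech is absent; that refinement is omitted here for brevity.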
5. An electronic device, comprising one or more processors and a memory;
wherein one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method of any one of claims 1 to 3.
6. A computer-readable storage medium having program code stored therein, wherein the program code, when executed by a processor, performs the method of any one of claims 1 to 3.
CN201811163039.XA 2018-09-30 2018-09-30 Voice processing method and device and electronic equipment Active CN109151211B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811163039.XA CN109151211B (en) 2018-09-30 2018-09-30 Voice processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811163039.XA CN109151211B (en) 2018-09-30 2018-09-30 Voice processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN109151211A (en) 2019-01-04
CN109151211B (en) 2022-01-11

Family

ID=64810666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811163039.XA Active CN109151211B (en) 2018-09-30 2018-09-30 Voice processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN109151211B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110166879B (en) 2019-06-28 2020-11-13 歌尔科技有限公司 Voice acquisition control method and device and TWS earphone
CN110475178B (en) * 2019-09-11 2020-11-24 歌尔股份有限公司 Wireless earphone noise reduction method and device, wireless earphone and storage medium
CN110827845B (en) * 2019-11-18 2022-04-22 西安闻泰电子科技有限公司 Recording method, device, equipment and storage medium
CN111402915A (en) * 2020-03-23 2020-07-10 联想(北京)有限公司 Signal processing method, device and system
CN112235676B (en) * 2020-09-30 2023-04-14 上海闻泰电子科技有限公司 Audio system with noise reduction function, control method and storage medium
CN113573188A (en) * 2021-05-17 2021-10-29 深圳市瑞禾高科通讯电子有限公司 Screen noise earphone
CN113411710A (en) * 2021-06-29 2021-09-17 紫优科技(深圳)有限公司 Voice noise reduction method and device based on double-earphone cooperation
CN115811570B (en) * 2023-01-27 2023-08-18 荣耀终端有限公司 IMS call voice quality testing method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2942918A1 (en) * 2013-02-04 2015-11-11 Huawei Technologies Co., Ltd. Method and device for transmitting streaming media data
CN107742523A (en) * 2017-11-16 2018-02-27 广东欧珀移动通信有限公司 Audio signal processing method, device and mobile terminal
CN108447500A (en) * 2018-04-27 2018-08-24 深圳市沃特沃德股份有限公司 The method and apparatus of speech enhan-cement
CN108521621A (en) * 2018-03-30 2018-09-11 广东欧珀移动通信有限公司 Signal processing method, device, terminal, earphone and readable storage medium storing program for executing

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105120059B (en) * 2015-07-07 2019-03-26 惠州Tcl移动通信有限公司 Mobile terminal and its method that earphone call noise reduction is controlled according to breathing power
CN105635866A (en) * 2016-02-26 2016-06-01 深圳市优科威科技有限公司 Method and device for removing noise by adopting multiple microphones
CN106101898B (en) * 2016-07-18 2019-04-23 歌尔股份有限公司 Noise cancelling headphone and its noise-reduction method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2942918A1 (en) * 2013-02-04 2015-11-11 Huawei Technologies Co., Ltd. Method and device for transmitting streaming media data
CN107742523A (en) * 2017-11-16 2018-02-27 广东欧珀移动通信有限公司 Audio signal processing method, device and mobile terminal
CN108521621A (en) * 2018-03-30 2018-09-11 广东欧珀移动通信有限公司 Signal processing method, device, terminal, earphone and readable storage medium storing program for executing
CN108447500A (en) * 2018-04-27 2018-08-24 深圳市沃特沃德股份有限公司 The method and apparatus of speech enhan-cement

Also Published As

Publication number Publication date
CN109151211A (en) 2019-01-04

Similar Documents

Publication Publication Date Title
CN109151211B (en) Voice processing method and device and electronic equipment
CN109246671B (en) Data transmission method, device and system
CN109413555B (en) Audio output method and device, earphone and electronic equipment
CN108076226B (en) Method for adjusting call quality, mobile terminal and storage medium
CN109257500B (en) Audio playing method and device, electronic equipment and storage medium
CN109151212B (en) Equipment control method and device and electronic equipment
CN107742523B (en) Voice signal processing method and device and mobile terminal
CN109040446B (en) Call processing method and related product
CN109107158B (en) Sound effect processing method and device, electronic equipment and computer readable storage medium
CN110677768A (en) Wireless earphone control method and device, wireless earphone and storage medium
CN109379491A (en) Audio frequency playing method, device, electronic equipment and computer-readable medium
CN108810860B (en) Audio transmission method, terminal equipment and main earphone
CN109151194A (en) Data transmission method, device, electronic equipment and storage medium
CN108156291A (en) Speech signal collection method, apparatus, electronic equipment and readable storage medium storing program for executing
CN110808041B (en) Voice recognition method, intelligent projector and related product
CN109257732B (en) Equipment control method and device and electronic equipment
CN107172313A (en) Improve method, device, mobile terminal and the storage medium of hand-free call quality
CN112013949A (en) Earphone wearing state determining method and device and earphone
CN107682553B (en) Call signal sending method and device, mobile terminal and storage medium
CN109413556B (en) Equipment positioning method and device, earphone device and electronic equipment
CN108055605B (en) Neck wire Bluetooth earphone and application method thereof
CN109102816A (en) Coding control method, device and electronic equipment
CN113301525A (en) Call control method and device, electronic controller and vehicle
CN108154886A (en) Noise suppressing method and device, electronic device and computer readable storage medium
CN109298846B (en) Audio transmission method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant