CN111988704B

CN111988704B - Sound signal processing method, device and storage medium

Info

Publication number: CN111988704B
Application number: CN201910424866.8A
Authority: CN
Inventors: 杨依珍; 马宁; 陈宇
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2019-05-21
Filing date: 2019-05-21
Publication date: 2021-10-22
Anticipated expiration: 2039-05-21
Also published as: CN111988704A

Abstract

The disclosure relates to a sound signal processing method, a sound signal processing device and a storage medium, and belongs to the technical field of terminals. The method performs signal separation on the sound signal collected at the local terminal to obtain a noise signal that represents the ambient noise of the local terminal, and performs signal separation on the received sound signal to obtain the human voice signal of the opposite terminal of the call. The gain parameter applied during amplification can therefore be adjusted adaptively based on the actual noise signal and human voice signal, so that the user of the local terminal hears a higher-quality voice signal through the receiver, which improves both the clarity of the voice signal and the call quality.

Description

Sound signal processing method, device and storage medium

Technical Field

The present disclosure relates to the field of terminal technologies, and in particular, to a method and an apparatus for processing a sound signal, and a storage medium.

Background

With the development of terminal technology, users place ever higher demands on mobile terminals. For the most basic function of a mobile terminal, the call function, improving call quality is an important research direction for improving the performance of mobile terminals.

At present, some mobile terminals adopt a dual-microphone noise reduction technology to improve call quality. In this technology, in addition to the call microphone at the bottom, the mobile terminal has a small hole at the top for collecting ambient noise, which serves as a second microphone, so that signals can be collected simultaneously by the bottom and top microphones. The two signals are input into the microphone processor at the same time, and a differential amplifier in the microphone processor subtracts one signal from the other and amplifies the result, so that the ambient noise signal is effectively eliminated and the clarity of the human voice signal is greatly improved.

However, with the above dual-microphone noise reduction technique, once the mouth is some distance away from the microphone, the collected human voice signal and the ambient noise signal differ little. Subsequent processing then fails to improve the clarity of the voice signal, and subtracting the wrong signals may even reduce the call quality.

Disclosure of Invention

The present disclosure provides a sound signal processing method, apparatus and storage medium, which can improve the clarity of a human voice signal and thereby improve call quality.

According to an aspect provided by the present disclosure, there is provided a sound signal processing method applied to a first terminal, the method including:

in the process of communication with a second terminal, performing signal separation on a first sound signal collected by a microphone of the first terminal to obtain a noise signal of the first sound signal;

when a second sound signal sent by the second terminal is received, performing signal separation on the second sound signal to obtain a human voice signal of the second sound signal;

determining a target gain parameter according to the noise signal of the first sound signal, the human voice signal of the second sound signal and a target volume;

and amplifying the human voice signal of the second sound signal according to the target gain parameter.

In one possible implementation manner, the signal separation of the first sound signal collected by the microphone of the first terminal to obtain the noise signal of the first sound signal includes:

inputting the first sound signal into a signal separation model, and identifying a noise signal in the first sound signal through the signal separation model according to the sound characteristics of the human voice signal and the sound characteristics of the noise signal;

the signal separation of the second sound signal to obtain the human voice signal of the second sound signal includes:

inputting the second sound signal into the signal separation model, and identifying the human voice signal in the second sound signal through the signal separation model according to the sound characteristics of the human voice signal and the sound characteristics of the noise signal.

In one possible implementation, the determining a target gain parameter according to the noise signal of the first sound signal, the human voice signal of the second sound signal, and the target volume includes:

determining a first volume difference between a human voice signal of the second sound signal and a noise signal of the first sound signal;

determining a second volume difference between the target volume and the first volume difference;

and determining the target gain parameter according to the second volume difference.

In one possible implementation, the determining a first volume difference between a human voice signal of the second sound signal and a noise signal of the first sound signal includes either of the following:

determining a first average volume of a noise signal of the first sound signal over a first time period;

determining a second average volume of a human voice signal of the second sound signal over the first time period;

taking a volume difference between the first average volume and the second average volume as the first volume difference;

or,

determining a maximum volume of a noise signal of the first sound signal over a second time period;

determining a minimum volume of a human voice signal of the second sound signal over the second time period;

taking a volume difference between the maximum volume and the minimum volume as the first volume difference.

In one possible implementation, the target volume is determined based on a user's hearing habits of the first terminal.

In one possible implementation, the determining of the target volume includes any one of the following:

acquiring the volume used by a user of the first terminal in any audio playing application, and analyzing the acquired volume to obtain the target volume; or

acquiring the distance between the ear of the user of the first terminal and the first terminal during historical calls and the volume set during those calls, and analyzing the acquired distance and volume to obtain the target volume.

In one possible implementation, the first terminal includes at least two microphones.

In another aspect provided by the embodiments of the present disclosure, there is provided a sound signal processing apparatus applied to a first terminal, the apparatus including:

the signal separation module is configured to perform signal separation on a first sound signal collected by a microphone of the first terminal in a conversation process with a second terminal to obtain a noise signal of the first sound signal;

the signal separation module is further configured to perform signal separation on a second sound signal sent by the second terminal when the second sound signal is received, so as to obtain a human voice signal of the second sound signal;

a parameter determination module configured to determine a target gain parameter according to a noise signal of the first sound signal, a human voice signal of the second sound signal, and a target volume;

and the amplifying module is configured to amplify the human voice signal of the second sound signal according to the target gain parameter.

In one possible implementation, the signal separation module is configured to input the first sound signal into a signal separation model, and identify a noise signal in the first sound signal through the signal separation model according to the sound characteristics of the human voice signal and the sound characteristics of the noise signal;

the signal separation module is further configured to input the second sound signal into the signal separation model, and identify the human voice signal in the second sound signal through the signal separation model according to the sound characteristics of the human voice signal and the sound characteristics of the noise signal.

In one possible implementation, the parameter determination module includes:

a first determination unit configured to determine a first volume difference between a human voice signal of the second sound signal and a noise signal of the first sound signal;

a second determination unit configured to determine a second volume difference between the target volume and the first volume difference;

a third determination unit configured to determine the target gain parameter according to the second volume difference.

In one possible implementation, the first determining unit is configured to determine a first average volume of a noise signal of the first sound signal over a first time period; determine a second average volume of a human voice signal of the second sound signal over the first time period; and take a volume difference between the first average volume and the second average volume as the first volume difference.

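In one possible implementation, the first determining unit is configured to determine a maximum volume of a noise signal of the first sound signal within a second time period; determine a minimum volume of a human voice signal of the second sound signal within the second time period; and take a volume difference between the maximum volume and the minimum volume as the first volume difference.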

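In one possible implementation, the apparatus further includes a target volume determination module configured to perform any one of the following steps:

acquiring the volume used by a user of the first terminal in any audio playing application, and analyzing the acquired volume to obtain the target volume;

acquiring the distance between the ear of the user of the first terminal and the first terminal during historical calls and the volume set during those calls, and analyzing the acquired distance and volume to obtain the target volume.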

In another aspect provided by the embodiments of the present disclosure, there is provided a sound signal processing apparatus including:

one or more processors;

volatile or non-volatile memory for storing instructions executable by the one or more processors;

wherein the one or more processors are configured to execute the instructions to implement the sound signal processing method as described above.

In another aspect provided by embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions stored in the storage medium, when executed by a processor of a terminal, enable the terminal to perform the sound signal processing method as described above.

The technical solution provided by the embodiments of the present disclosure can have the following beneficial effects. Signal separation is performed on the sound signal of the local terminal to obtain a noise signal that represents the ambient noise of the local terminal, and signal separation is performed on the received sound signal to obtain the human voice signal of the opposite terminal of the call. The gain parameter applied during amplification can therefore be adjusted adaptively based on the actual noise signal and human voice signal, so that the user of the local terminal hears a higher-quality voice signal through the receiver, which improves both the clarity of the voice signal and the call quality.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

Fig. 1 is a flow chart illustrating a method of sound signal processing according to an exemplary embodiment.

Fig. 2 is a flow chart illustrating a method of sound signal processing according to an exemplary embodiment.

Fig. 3 is a diagram illustrating a structure of a terminal according to an exemplary embodiment.

Fig. 4 is a block diagram illustrating a sound signal processing apparatus according to an exemplary embodiment.

Fig. 5 is a block diagram illustrating a sound signal processing apparatus 500 according to an exemplary embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

Fig. 1 is a flowchart illustrating a sound signal processing method according to an exemplary embodiment. The method can be used in any terminal. As shown in Fig. 1, the embodiment is described by taking a first terminal, which is one of the terminals in a call, as an example, and includes the following steps.

In step 101, during a call with a second terminal, signal separation is performed on a first sound signal collected by a microphone of the first terminal to obtain a noise signal of the first sound signal.

In step 102, when a second sound signal sent by the second terminal is received, signal separation is performed on the second sound signal to obtain a human voice signal of the second sound signal.

In step 103, a target gain parameter is determined according to the noise signal of the first sound signal, the human voice signal of the second sound signal, and a target volume.

In step 104, the human voice signal of the second sound signal is amplified according to the target gain parameter.

The method provided by the embodiment of the present disclosure performs signal separation on the sound signal of the local terminal to obtain a noise signal that represents the ambient noise of the local terminal, and performs signal separation on the received sound signal to obtain the human voice signal of the opposite end of the call. The gain parameter applied during amplification can therefore be adjusted adaptively based on the actual noise signal and human voice signal, so that the user of the local terminal hears a higher-quality voice signal through the receiver, which improves both the clarity of the voice signal and the call quality.
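Steps 101 to 104 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `separate` callable is a hypothetical stand-in for the signal separation model, volumes are assumed to be RMS levels in dB relative to full scale, and the gain in dB is assumed to equal the second volume difference, which the disclosure does not state explicitly.

```python
import math
from typing import Callable, List, Sequence, Tuple

Samples = Sequence[float]
# Hypothetical separator interface: returns (human_voice, noise) for a block.
Separator = Callable[[Samples], Tuple[Samples, Samples]]


def level_db(samples: Samples, floor_db: float = -120.0) -> float:
    """RMS level in dB relative to full scale (1.0); silence maps to floor_db."""
    if not samples:
        return floor_db
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(rms))


def process_block(mic_block: Samples, far_block: Samples,
                  separate: Separator, target_db: float) -> List[float]:
    # Step 101: separate the locally captured signal and keep only the noise.
    _, local_noise = separate(mic_block)
    # Step 102: separate the received signal and keep only the human voice.
    far_voice, _ = separate(far_block)
    # Step 103: first difference (voice minus noise), then second difference
    # (target minus first difference), mapped to a linear gain (assumption).
    first_diff = level_db(far_voice) - level_db(local_noise)
    second_diff = target_db - first_diff
    gain = 10.0 ** (second_diff / 20.0)
    # Step 104: amplify the far-end human voice with the target gain.
    return [s * gain for s in far_voice]
```

In a real terminal the separator would be the trained model and the block size, level measure and gain mapping would be tuned to the audio pipeline; the sketch only fixes the order of the four steps.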

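In one possible implementation, the determining of the target volume includes any one of the following: acquiring the volume used by a user of the first terminal in any audio playing application, and analyzing the acquired volume to obtain the target volume; or acquiring the distance between the ear of the user of the first terminal and the first terminal during historical calls and the volume set during those calls, and analyzing the acquired distance and volume to obtain the target volume.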

Fig. 2 is a flowchart illustrating a sound signal processing method according to an exemplary embodiment, and as shown in fig. 2, the sound signal processing method is still described by taking a processing procedure of the first terminal as an example, and includes the following steps.

In step 201, during a call with a second terminal, a first terminal inputs a first sound signal collected by a microphone of the first terminal into a signal separation model, and identifies a noise signal and a human voice signal in the first sound signal through the signal separation model according to the sound characteristics of the human voice signal and the sound characteristics of the noise signal.

In step 201, the first sound signal may be signal-separated by a signal separation model obtained through pre-training, so as to obtain a noise signal and a human voice signal of the first sound signal.

It should be noted that the signal separation model may be obtained by deep learning on a plurality of sound signals that contain both human voice signals and noise signals, so that the model learns the sound characteristics that distinguish a human voice from environmental noise. Specifically, the model may be obtained by training a convolutional neural network. During training, voiceprint features of the human voice signal and of the noise signal may be extracted as training features. The trained model can cope with environmental noise of any level, even noise that is louder than the human voice.

In step 202, when receiving a second sound signal sent by the second terminal, the first terminal inputs the second sound signal into the signal separation model, and identifies a human voice signal and a noise signal in the second sound signal through the signal separation model according to the sound characteristics of the human voice signal and the sound characteristics of the noise signal.

Step 202 is the process of separating the sound signal received from the opposite communication terminal to obtain the human voice signal and the noise signal of that sound signal.

The signal separation model applied in step 202 and the signal separation model in step 201 may be the same model. They may also be different models. For example, the model in step 201 may be continuously updated with the human voice signal of the user of the first terminal collected during use, so as to improve the accuracy of voice and noise separation for that user, while the model applied in step 202 may be a general-purpose signal separation model, so as to avoid inaccurate identification caused by a model that is overly biased toward one speaker.

In step 203, the first terminal determines a first volume difference between the human voice signal of the second sound signal and the noise signal of the first sound signal.

The first volume difference may represent the portion of the human voice signal that is effectively audible to the user. The noise signal directly affects how that portion of the human voice signal is perceived, so determining the volume difference removes the influence of the noise signal on auditory perception.

The first volume difference determination process of step 203 may include any one of the following implementations:

In a first implementation, the first volume difference is determined based on average volumes over a period of time, which yields a smoothed difference value that reflects the overall difference between the two signals.

In one possible implementation, the process may include: determining a first average volume of a noise signal of the first sound signal over a first time period; determining a second average volume of a human voice signal of the second sound signal over the first time period; taking a volume difference between the first average volume and the second average volume as the first volume difference. The first time period may be any time period of the call process, for example, a fixed time span with the current time point as the ending time point.
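The average-based variant can be sketched as follows. The function names, the choice of RMS level in dB as the "volume", the averaging of per-frame dB values, and the direction of the subtraction (voice minus noise) are all assumptions made for illustration; the disclosure only states that the difference between the two averages is taken.

```python
import math
from typing import Sequence

FLOOR_DB = -120.0  # assumed level reported for digital silence


def frame_level_db(frame: Sequence[float]) -> float:
    """RMS level of one frame in dB relative to full scale (1.0)."""
    if not frame:
        return FLOOR_DB
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    if rms <= 0.0:
        return FLOOR_DB
    return max(FLOOR_DB, 20.0 * math.log10(rms))


def average_volume_db(frames: Sequence[Sequence[float]]) -> float:
    """Mean of the per-frame levels over the first time period."""
    if not frames:
        raise ValueError("at least one frame is required")
    return sum(frame_level_db(f) for f in frames) / len(frames)


def first_volume_difference_average(noise_frames, voice_frames) -> float:
    """Average far-end voice level minus average local noise level, in dB."""
    return average_volume_db(voice_frames) - average_volume_db(noise_frames)
```

The frames passed in would cover the same first time period, for example the most recent fixed-length window ending at the current time.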

A second implementation determines the first volume difference based on the maximum and minimum volumes over a period of time, thereby achieving a difference that can cover the worst case to ensure a better auditory effect.

In one possible implementation, the process may include: determining a maximum volume of a noise signal of the first sound signal over a second time period; determining the minimum volume of the human voice signal of the second sound signal in the second time period; taking a volume difference between the maximum volume and the minimum volume as the first volume difference. The second time period may be any time period of the call process, for example, a fixed time span with the current time point as the ending time point.
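The worst-case variant can be sketched in the same spirit. The inputs are assumed to be per-frame levels in dB measured over the same second time period, and the subtraction direction (minimum voice minus maximum noise) is an assumption chosen so that a small or negative result indicates a voice that is hard to hear.

```python
from typing import Sequence


def first_volume_difference_worst_case(noise_levels_db: Sequence[float],
                                       voice_levels_db: Sequence[float]) -> float:
    """Quietest far-end voice level minus loudest local noise level, in dB."""
    if not noise_levels_db or not voice_levels_db:
        raise ValueError("both level sequences must be non-empty")
    return min(voice_levels_db) - max(noise_levels_db)
```

For instance, with noise levels of -40, -30 and -35 dB and voice levels of -20, -12 and -25 dB, the function returns 5 dB, because the quietest voice frame (-25 dB) is compared against the loudest noise frame (-30 dB).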

In step 204, the first terminal determines a second volume difference between the target volume and the first volume difference.

In the embodiment of the present disclosure, the target volume refers to a volume adapted to the hearing habits of the user of the first terminal. That is, the target volume is determined based on the user's hearing habits of the first terminal.

The hearing habits of the user may include the volume the user habitually sets when using any audio playing application, or the listening volume of the user during calls. Accordingly, the process of determining the target volume may include any one of the following implementations:

in a first implementation manner, the volume of the user of the first terminal when using any audio playing application is obtained, and the target volume is obtained by analyzing according to the obtained volume.

The first terminal may record the playback volume while any audio playing application is playing audio, and build a model from the recorded volumes to determine the target volume. For example, the target volume may be the average of the recorded volumes, or a volume that follows the trend of the recorded volume changes.
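Both examples in this paragraph can be sketched briefly. The plain mean follows the text directly; the trend-following estimate uses an exponentially weighted moving average, which is only one possible reading of "a volume that follows the trend" and is an assumption, as are the function names and the `alpha` parameter.

```python
from statistics import fmean
from typing import Sequence


def target_volume_mean(recorded: Sequence[float]) -> float:
    """Target volume as the average of the recorded playback volumes."""
    if not recorded:
        raise ValueError("no recorded volumes")
    return fmean(recorded)


def target_volume_trend(recorded: Sequence[float], alpha: float = 0.3) -> float:
    """Trend-following estimate: recent recordings weigh more (assumed EWMA)."""
    if not recorded:
        raise ValueError("no recorded volumes")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    estimate = float(recorded[0])
    for volume in recorded[1:]:
        estimate = alpha * volume + (1.0 - alpha) * estimate
    return estimate
```

A larger `alpha` makes the estimate react faster to recent changes in the user's volume setting, while a smaller value keeps it closer to the long-term habit.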

In a second implementation, the distance between the user's ear and the first terminal during historical calls and the volume used during those calls are obtained, and the target volume is obtained by analyzing the obtained distance and volume.

During historical calls, the first terminal may detect the distance between the user's ear and the terminal and obtain the receiver volume, and then build a model from the recorded distances and volumes to determine the target volume. For example, curve fitting may be performed on the distances and volumes to determine the correspondence between distance and volume, and the target volume may then be determined from that correspondence according to the current distance between the first terminal and the ear. In one implementation, if the first terminal is currently pressed against the user's ear, the target volume corresponding to that state may be obtained from the correspondence.
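The curve-fitting example can be sketched with an ordinary least-squares line. The disclosure does not say which curve is fitted, so the linear model, the function names, and the convention that a distance of 0 means the terminal is pressed against the ear are assumptions.

```python
from typing import Sequence, Tuple


def fit_line(distances: Sequence[float],
             volumes: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of volume = slope * distance + intercept."""
    n = len(distances)
    if n != len(volumes) or n < 2:
        raise ValueError("need at least two (distance, volume) pairs")
    mean_x = sum(distances) / n
    mean_y = sum(volumes) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    if sxx == 0.0:
        raise ValueError("distances must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, volumes))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def target_volume_for_distance(distance: float, slope: float,
                               intercept: float) -> float:
    """Look up the target volume for the current ear-to-terminal distance."""
    return slope * max(0.0, distance) + intercept
```

With historical pairs (0, 40), (1, 50) and (2, 60), the fit gives a slope of 10 and an intercept of 40, so a terminal held against the ear maps to a target volume of 40.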

In step 205, the first terminal determines the target gain parameter according to the second volume difference.

The noise signal of the first sound signal reflects the ambient noise of the environment in which the first terminal is currently making the call, and this ambient noise may affect what the user hears. The call quality can therefore be measured by the volume difference between the human voice signal of the opposite communication terminal and the noise signal of the local terminal. To make the call quality better match the actual needs of the user of the first terminal, the gain parameter to be used when amplifying the second sound signal can be determined according to the target volume, which represents the user's hearing habits. The target gain parameter is a parameter of the amplifier, namely the gain required to achieve the target call quality.
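The disclosure does not give the mapping from the second volume difference to the gain parameter. One simple possibility, shown here purely as an assumption, is to treat all quantities as dB values, use the second volume difference directly as the gain in dB, and clamp it to a safe range. The function name and the clamp limits are hypothetical.

```python
from typing import Tuple


def target_gain(target_db: float, first_diff_db: float,
                min_gain_db: float = -12.0,
                max_gain_db: float = 24.0) -> Tuple[float, float]:
    """Return (gain in dB, linear amplitude factor) for the far-end voice."""
    # Second volume difference: target volume minus first volume difference.
    second_diff_db = target_db - first_diff_db
    # Assumed mapping: the gain equals the second difference, within limits.
    gain_db = max(min_gain_db, min(max_gain_db, second_diff_db))
    return gain_db, 10.0 ** (gain_db / 20.0)
```

Clamping keeps an extreme noise measurement from driving the receiver to a damaging or inaudible level; a product implementation would pick the limits from the hardware's safe output range.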

Steps 203 to 205 above are the process by which the first terminal determines the target gain parameter according to the noise signal of the first sound signal, the human voice signal of the second sound signal and the target volume. Throughout the call, this determination can be repeated at a preset interval, so that the gain parameter tracks changes in the ambient noise and the received sound signal in real time and the auditory effect remains consistent.
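Repeating the gain determination at a preset interval can be sketched with a small helper that recomputes the gain only when the interval has elapsed. The class name, the injected timestamp, and the `compute_gain` callback are hypothetical; the callback stands for whatever performs steps 203 to 205.

```python
from typing import Callable, Optional


class GainScheduler:
    """Re-evaluates the gain at most once per preset interval."""

    def __init__(self, interval_s: float, initial_gain: float = 1.0) -> None:
        if interval_s <= 0.0:
            raise ValueError("interval_s must be positive")
        self.interval_s = interval_s
        self.gain = initial_gain
        self._last_update: Optional[float] = None

    def current_gain(self, now_s: float,
                     compute_gain: Callable[[], float]) -> float:
        # Recompute on the first call and whenever the interval has elapsed.
        due = (self._last_update is None
               or now_s - self._last_update >= self.interval_s)
        if due:
            self.gain = compute_gain()
            self._last_update = now_s
        return self.gain
```

Passing the time in as an argument keeps the helper easy to test; in a call pipeline it would typically be fed from a monotonic clock once per audio block.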

In step 206, the first terminal amplifies the human voice signal of the second sound signal according to the target gain parameter.
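The amplification in step 206 can be sketched as a per-sample multiplication. Samples are assumed to be floats normalized to the range [-1.0, 1.0], and the hard clipping at the end is an added safeguard that the disclosure does not describe.

```python
from typing import List, Sequence


def amplify(samples: Sequence[float], gain: float,
            limit: float = 1.0) -> List[float]:
    """Scale each sample by the linear gain and clip to [-limit, limit]."""
    return [max(-limit, min(limit, s * gain)) for s in samples]
```

For example, `amplify([0.1, -0.6, 0.5], 2.0)` scales the first sample to 0.2 and clips the other two to -1.0 and 1.0.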

In step 207, the first terminal plays the amplified second sound signal.

After the first terminal performs amplification processing on the human voice signal, the amplified signal may be transmitted to a receiver of the terminal, including but not limited to a speaker SPK and an earphone (e.g., REV, headset), so as to implement adaptive noise reduction according to the magnitude of the ambient noise.

The method provided in the embodiment of the present disclosure performs signal separation on the sound signal of the local terminal to obtain a noise signal that represents the ambient noise of the local terminal, and performs signal separation on the received sound signal to obtain the human voice signal of the opposite end of the call. The gain parameter applied during amplification can therefore be adjusted adaptively based on the actual noise signal and human voice signal, so that the user of the local terminal hears a higher-quality voice signal through the receiver, which improves the clarity of the voice signal and thus the call quality. Furthermore, the embodiment of the present disclosure can automatically determine the user's comfortable volume range, that is, the target volume, according to the user's hearing habits, thereby enabling intelligent adjustment based on the target volume and further improving the user's listening experience.

The above process can be further understood with reference to the brief schematic in Fig. 3. As shown in Fig. 3, the terminal may include an antenna module, an application processor, an audio codec module (e.g., CODEC), a microphone (MIC) module, a receiver module, and a power management module. The terminal may also include other components, which is not limited in this disclosure.

The terminal collects the local sound signal through the MIC module; the collected sound signal includes a noise signal and a human voice signal. The collected sound signal is transmitted to the application processor, which passes it to the audio codec module for preliminary processing, and the audio codec module performs the signal separation of step 201 to obtain the noise signal of the environment where the local terminal is located.

The terminal receives the sound signal from the opposite communication terminal through the antenna module; the received sound signal likewise includes the human voice signal and the noise signal of the opposite communication terminal. The antenna module transmits the received sound signal to the application processor, which passes it to the audio codec module for signal separation to obtain the human voice signal of the opposite communication terminal. The gain parameter is then determined based on the target volume, the noise signal of the local terminal, and so on. The human voice signal of the opposite communication terminal is processed accordingly, and the processed human voice signal is transmitted to the receiver for playback, which completes the adaptive noise reduction process.

It should be noted that the terminal may include two or more microphones. One of the microphones may be a main microphone and the others may be noise reduction microphones. The main microphone may be located near the bottom of the terminal, and the other microphones may be located near the top of the terminal or distributed in other locations.

In the above process, after performing signal separation on the first sound signal, the first terminal may filter out the noise signal and send only the human voice signal of the first sound signal to the second terminal, thereby improving the clarity of the voice during the call. Likewise, after performing signal separation on the second sound signal, the first terminal may directly filter out the separated noise signal to obtain a human voice signal with high clarity.

Fig. 4 is a block diagram illustrating a sound signal processing apparatus according to an exemplary embodiment. Referring to Fig. 4, the apparatus includes a signal separation module 401, a parameter determination module 402, and an amplification module 403.

A signal separation module 401, configured to perform signal separation on a first sound signal collected by a microphone of a first terminal during a call with a second terminal, so as to obtain a noise signal of the first sound signal;

the signal separation module 401 is further configured to, when receiving a second sound signal sent by the second terminal, perform signal separation on the second sound signal to obtain a human voice signal of the second sound signal;

a parameter determination module 402 configured to determine a target gain parameter according to a noise signal of the first sound signal, a human voice signal of the second sound signal, and a target volume;

an amplifying module 403 configured to amplify the human voice signal of the second sound signal according to the target gain parameter.

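In one possible implementation, the parameter determination module includes: a first determination unit configured to determine a first volume difference between a human voice signal of the second sound signal and a noise signal of the first sound signal; a second determination unit configured to determine a second volume difference between the target volume and the first volume difference; and a third determination unit configured to determine the target gain parameter according to the second volume difference.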

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

The apparatus provided by the embodiment of the present disclosure performs signal separation on the sound signal of the local terminal to obtain a noise signal that represents the ambient noise of the local terminal, and performs signal separation on the received sound signal to obtain the human voice signal of the opposite end of the call. The gain parameter applied during amplification can therefore be adjusted adaptively based on the actual noise signal and human voice signal, so that the user of the local terminal hears a higher-quality voice signal through the receiver, which improves both the clarity of the voice signal and the call quality.

Fig. 5 is a block diagram illustrating a sound signal processing apparatus 500 according to an exemplary embodiment. For example, the apparatus 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to Fig. 5, the apparatus 500 may include one or more of the following components: processing component 502, memory 504, power supply component 506, multimedia component 508, audio component 510, I/O (Input/Output) interface 512, sensor component 514, and communication component 516.

The processing component 502 generally controls the overall operation of the apparatus 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 502 may include one or more processors 520 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.

The memory 504 is configured to store various types of data to support operations at the apparatus 500. Examples of such data include instructions for any application or method operating on the apparatus 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as an SRAM (Static Random Access Memory), an EEPROM (Electrically-Erasable Programmable Read-Only Memory), an EPROM (Erasable Programmable Read-Only Memory), a PROM (Programmable Read-Only Memory), a ROM (Read-Only Memory), a magnetic memory, a flash memory, a magnetic disk, or an optical disk.

The power component 506 provides power to the various components of the apparatus 500. The power component 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 500.

The multimedia component 508 includes a screen that provides an output interface between the apparatus 500 and the user. In some embodiments, the screen may include an LCD (Liquid Crystal Display) and a TP (Touch Panel). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 500 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a microphone (MIC) configured to receive external audio signals when the apparatus 500 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, the audio component 510 further includes a speaker for outputting audio signals.

The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 514 includes one or more sensors for providing status assessments of various aspects of the apparatus 500. For example, the sensor component 514 may detect an open/closed state of the apparatus 500 and the relative positioning of components, such as a display and keypad of the apparatus 500. The sensor component 514 may also detect a change in position of the apparatus 500 or a component of the apparatus 500, the presence or absence of user contact with the apparatus 500, orientation or acceleration/deceleration of the apparatus 500, and a change in temperature of the apparatus 500. The sensor component 514 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 514 may also include a light sensor, such as a CMOS (Complementary Metal Oxide Semiconductor) or CCD (Charge-Coupled Device) image sensor, for use in imaging applications. In some embodiments, the sensor component 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 516 is configured to facilitate communication between the apparatus 500 and other devices in a wired or wireless manner. The apparatus 500 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes an NFC (Near Field Communication) module to facilitate short-range communications.

In an exemplary embodiment, the apparatus 500 may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), controllers, microcontrollers, microprocessors or other electronic components for performing the above-described sound signal processing methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 504 comprising instructions, the instructions being executable by the processor 520 of the apparatus 500 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.

The disclosed embodiments also provide a computer-readable storage medium storing instructions that, when executed by a processor of a first terminal, enable the first terminal to perform a sound signal processing method, the method including: during a call with a second terminal, performing signal separation on a first sound signal collected by a microphone of the first terminal to obtain a noise signal of the first sound signal; when a second sound signal sent by the second terminal is received, performing signal separation on the second sound signal to obtain a human voice signal of the second sound signal; determining a target gain parameter according to the noise signal of the first sound signal, the human voice signal of the second sound signal and the target volume; and amplifying the human voice signal of the second sound signal according to the target gain parameter.
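As an illustration only, the gain determination and amplification steps of the method summarized above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the RMS-in-dB volume measure, the interpretation of the target volume as a desired level of the voice above the noise (`target_margin_db`), and the `max_gain_db` cap are all hypothetical, and the signal separation steps are assumed to have already produced the noise and voice sample sequences.

```python
import math
from typing import List, Sequence


def average_volume_db(samples: Sequence[float], eps: float = 1e-12) -> float:
    """Average volume of a sample window as an RMS level in dB re full scale 1.0."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    # eps avoids log10(0) for an all-zero window.
    return 10.0 * math.log10(mean_square + eps)


def target_gain_db(noise: Sequence[float], voice: Sequence[float],
                   target_margin_db: float, max_gain_db: float = 20.0) -> float:
    """Hypothetical rule: raise the voice until it sits target_margin_db above the noise."""
    # First volume difference: far-end voice level minus local noise level,
    # both averaged over the same time period.
    first_volume_difference = average_volume_db(voice) - average_volume_db(noise)
    # Assumption: the gain makes up the shortfall against the target margin.
    gain = target_margin_db - first_volume_difference
    # Never attenuate, and cap the boost (the cap value is an assumption).
    return min(max(gain, 0.0), max_gain_db)


def amplify(voice: Sequence[float], gain_db: float) -> List[float]:
    """Apply the gain to the voice samples, hard-limiting to [-1.0, 1.0]."""
    factor = 10.0 ** (gain_db / 20.0)
    return [max(-1.0, min(1.0, s * factor)) for s in voice]


if __name__ == "__main__":
    noise = [0.1, -0.1] * 80    # stand-in for the separated local noise
    voice = [0.1, -0.1] * 80    # stand-in for the separated far-end voice
    gain = target_gain_db(noise, voice, target_margin_db=6.0)
    louder = amplify(voice, gain)
    print(round(gain, 3), round(louder[0], 4))
```

In this sketch the gain falls to zero once the voice already exceeds the noise by the target margin, which mirrors the adaptive behavior described in the text: a noisier local environment yields a larger gain, and a quiet one leaves the voice unchanged.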

The computer-readable storage medium provided by the embodiment of the disclosure performs signal separation on the sound signal of the local terminal to obtain a noise signal that represents the ambient noise at the local terminal, and performs signal separation on the sound signal from the opposite end of the call to obtain its human voice signal. The gain parameter applied during amplification can therefore be adaptively adjusted based on the actual conditions of the noise signal and the human voice signal, so that the user of the local terminal hears a higher-quality voice signal through the receiver, which improves the clarity of the voice signal and thus the call quality.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for processing a sound signal, the method being applied to a first terminal, the method comprising:

amplifying the human voice signal of the second sound signal according to the target gain parameter;

the determining a target gain parameter according to the noise signal of the first sound signal, the human voice signal of the second sound signal and the target volume comprises:

2. The method of claim 1, wherein the signal separation of the first sound signal collected by the microphone of the first terminal to obtain the noise signal of the first sound signal comprises:

3. The method of claim 1, wherein the determining a first volume difference between the human voice signal of the second sound signal and the noise signal of the first sound signal comprises:

4. The method of claim 1, wherein the determining a first volume difference between the human voice signal of the second sound signal and the noise signal of the first sound signal comprises:

5. The method according to any of claims 1 to 4, wherein the target volume is determined based on the hearing habits of the user of the first terminal.

6. The method of claim 5, wherein the determining of the target volume comprises:

7. The method of claim 1, wherein the first terminal comprises at least two microphones.

8. An apparatus for processing a sound signal, applied to a first terminal, the apparatus comprising:

an amplifying module configured to amplify the human voice signal of the second sound signal according to the target gain parameter;

wherein the parameter determination module comprises:

9. The apparatus of claim 8, wherein the signal separation module is configured to input the first sound signal into a signal separation model, and identify a noise signal in the first sound signal according to the sound characteristics of the human sound signal and the sound characteristics of the noise signal through the signal separation model;

10. The apparatus according to claim 8, wherein the first determining unit is configured to determine a first average volume of a noise signal of the first sound signal over a first time period; determining a second average volume of a human voice signal of the second sound signal over the first time period; taking a volume difference between the first average volume and the second average volume as the first volume difference.

11. The apparatus according to claim 8, wherein the first determination unit is configured to determine a maximum volume of a noise signal of the first sound signal in a second time period;

12. The apparatus according to any of claims 8 to 11, wherein the target volume is determined based on the hearing habits of the user of the first terminal.

13. The apparatus of claim 12, further comprising a target volume determination module configured to perform any of the following:

14. An acoustic signal processing apparatus, comprising:

one or more processors;

wherein the one or more processors are configured to execute the instructions to implement the sound signal processing method of any one of claims 1 to 7.

15. A computer-readable storage medium, wherein instructions, when executed by a processor of a first terminal, enable the first terminal to perform a sound signal processing method according to any one of claims 1 to 7.