CN107910013B - Voice signal output processing method and device - Google Patents

Voice signal output processing method and device

Info

Publication number
CN107910013B
CN107910013B (application number CN201711104384.1A)
Authority
CN
China
Prior art keywords
signal
voice
amplitude
noise
background noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711104384.1A
Other languages
Chinese (zh)
Other versions
CN107910013A (en)
Inventor
杨宗业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201711104384.1A priority Critical patent/CN107910013B/en
Publication of CN107910013A publication Critical patent/CN107910013A/en
Application granted granted Critical
Publication of CN107910013B publication Critical patent/CN107910013B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering

Abstract

The invention is suitable for the technical field of signal processing and provides a voice signal output processing method and device, the method comprising the following steps: recognizing a voice signal and a background noise signal from a sound signal acquired in real time; acquiring an amplitude difference value between the voice signal and the background noise signal; and performing noise reduction processing on the sound signal based on the amplitude difference value. Because the processing is selected according to the amplitude difference, the handling of the voice signal is more targeted, which avoids the low signal-to-noise ratio that results when the volume is controlled through AGC alone, i.e., when processing is based only on the level of the voice signal.

Description

Voice signal output processing method and device
Technical Field
The invention belongs to the technical field of signal processing, and particularly relates to a voice signal output processing method and device.
Background
A user may place a mobile phone in hands-free mode to hold a call while driving. In the prior art, the voice signal picked up by the mobile phone is processed by Automatic Gain Control (AGC), which adaptively adjusts the gain: when the voice signal is large the gain is reduced, and when the voice signal is small the gain is increased, so that the amplitude of the output voice signal is automatically kept within a small range. However, AGC amplifies the noise together with the voice when it raises the signal, so the transmitted speech has a poor signal-to-noise ratio and the call experience is degraded.
Disclosure of Invention
In view of this, embodiments of the present invention provide a voice signal output processing method and device, to solve the problem in the prior art that adjusting a voice signal with AGC alone yields a poor signal-to-noise ratio.
A first aspect of an embodiment of the present invention provides a method for processing an output of a speech signal, including:
recognizing a voice signal and a background noise signal from a sound signal acquired in real time;
acquiring an amplitude difference value of the voice signal and the background noise signal;
and performing noise reduction processing on the sound signal based on the amplitude difference value.
A second aspect of an embodiment of the present invention provides an apparatus for processing an output of a speech signal, including:
the audio signal acquisition unit is used for identifying a voice signal and a background noise signal from a sound signal acquired in real time;
the amplitude difference value calculation unit is used for acquiring the amplitude difference value of the voice signal and the background noise signal;
and the processing unit is used for carrying out noise reduction processing on the sound signal based on the amplitude difference value.
A third aspect of the present application provides a terminal device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the voice signal output processing method provided in the first aspect of the present application.
A fourth aspect of the present application provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method for processing the output of a speech signal provided in the first aspect of the present application.
A fifth aspect of the present application provides a computer program product comprising a computer program which, when executed by one or more processors, performs the steps of the method of output processing of a speech signal as provided by the first aspect of the present application.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects: a voice signal and a background noise signal are recognized from a sound signal acquired in real time; an amplitude difference value between the voice signal and the background noise signal is acquired; and noise reduction processing is performed on the sound signal based on the amplitude difference value. Selecting the processing according to the amplitude difference makes the handling of the voice signal more targeted and avoids the low signal-to-noise ratio caused by controlling the volume through AGC alone, i.e., by processing based only on the level of the voice signal.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart illustrating an implementation of a method for processing an output of a speech signal according to an embodiment of the present invention;
fig. 2 is a schematic flow chart illustrating an implementation of a speech signal output processing method according to a second embodiment of the present invention;
fig. 3 is a schematic diagram of an output processing apparatus for a speech signal according to a third embodiment of the present invention;
fig. 4 is a schematic diagram of a terminal device according to a fourth embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Example one
Fig. 1 is a schematic flow chart of an implementation of the voice signal output processing method provided by the first embodiment of the present invention. The method may be applied to an electronic device with a sound receiving element, such as a mobile phone, a notebook computer, a tablet computer, a vehicle-mounted system or a wearable electronic device. As shown in the figure, the method may include the following steps:
Step S101: recognizing a voice signal and a background noise signal from the sound signal acquired in real time.
In this embodiment, a user may turn on hands-free mode to hold a call while driving. In hands-free mode the mobile phone is usually placed on a phone holder in the car and is therefore relatively far from the user, so in addition to the user's voice the phone also picks up the background noise produced while the car is running, such as tire noise from the friction between the tires and the road surface, air-conditioning noise from the air-conditioning fan, and wind noise from the air rushing past the gaps and corners of the car body. This background noise is steady-state noise with small amplitude variation and a high repetition rate; it generally persists throughout the call, and it can significantly degrade the quality of the voice signal when the device applies AGC adjustment to the sound signal.
In this embodiment, the sound signal first needs to be identified according to the signal characteristics of the voice signal and the background noise signal. Speech and background noise can be recognized by storing a human voice model and a noise model in advance. The models contain speech characteristics of the sound, such as frequency, zero-crossing rate, short-term average energy and short-term average amplitude. For example, after the sound signal is sampled, it is matched against the human voice model: if the sound signal contains all the features in the human voice model, a person is currently speaking; if it cannot be matched to the human voice model, the user's voice may be too quiet or the background noise too loud for a voice to be recognized from the currently acquired sound, and the mobile phone may then issue an error prompt to the user, for example the message "the user's voice cannot be acquired". Similarly, the acquired sound signal may be matched against pre-stored tire noise, air-conditioning noise and wind noise models, so that the type of noise contained in the current sound signal can be determined.
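As an illustration of the feature-based recognition described above, here is a minimal Python sketch. The frame length, sample rate and decision thresholds are illustrative assumptions (in the patent they would come from the pre-stored human voice and noise models), and the samples are assumed to be normalized to [-1, 1]:

    import numpy as np

    def frame_features(frame):
        """Zero-crossing rate, short-term average energy and short-term average amplitude of one frame."""
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)   # fraction of sample pairs that change sign
        energy = np.mean(frame ** 2)                          # short-term average energy
        avg_amp = np.mean(np.abs(frame))                      # short-term average amplitude
        return zcr, energy, avg_amp

    def classify_frames(samples, sr=16000, frame_ms=20):
        """Label each frame 'voice' or 'noise'. The thresholds are illustrative stand-ins
        for decisions that would really be made with the pre-stored voice/noise models."""
        n = int(sr * frame_ms / 1000)
        labels = []
        for start in range(0, len(samples) - n + 1, n):
            zcr, energy, _ = frame_features(samples[start:start + n])
            # Voiced speech: relatively high energy, moderate zero-crossing rate.
            # Steady background noise: lower, slowly varying level.
            labels.append('voice' if energy > 1e-3 and zcr < 0.3 else 'noise')
        return labels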
Step S102: acquiring the amplitude difference value of the voice signal and the background noise signal.
In this embodiment, after the sound signal has been recognized in step S101 as containing a human voice and background noise according to the models, the human voice and the background noise can be extracted according to the features of those models, and the average amplitude of the voice signal representing the human voice and the average amplitude of the background noise signal in the sound signal collected over a period of time can be calculated from the recorded sound waveform.
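Continuing the sketch above (classify_frames and its frame parameters are the hypothetical helpers from the previous block), the amplitude difference of step S102 could be computed like this:

    import numpy as np

    def amplitude_difference(samples, labels, sr=16000, frame_ms=20):
        """|average amplitude over voice frames| - |average amplitude over noise frames|,
        using the per-frame labels produced by classify_frames above."""
        n = int(sr * frame_ms / 1000)
        voice_amp, noise_amp = [], []
        for i, label in enumerate(labels):
            avg = np.mean(np.abs(samples[i * n:(i + 1) * n]))
            (voice_amp if label == 'voice' else noise_amp).append(avg)
        if not voice_amp or not noise_amp:
            return None   # no voice or no noise was recognized in this stretch of signal
        return abs(float(np.mean(voice_amp))) - abs(float(np.mean(noise_amp)))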
Step S103: performing noise reduction processing on the sound signal based on the amplitude difference value.
In this embodiment, the absolute values of the voice signal amplitude and the background noise signal amplitude obtained in step S102 are taken first, and the difference between the absolute value of the voice signal amplitude and the absolute value of the background noise signal amplitude is then calculated. A preset processing method is selected according to this difference, which makes the processing of the voice signal more targeted and avoids the low signal-to-noise ratio caused by processing the sound signal with AGC alone, based only on the level of the voice signal.
Optionally, before the voice signal and the background noise signal are recognized, the method includes: pre-filtering the sound signal.
In this embodiment, it is considered that the sound signal may be disturbed by random noise, such as Gaussian noise. If a sound signal carrying random noise is fed directly into the recognition step, the interference may cause recognition errors. Therefore, in this embodiment, pre-filtering is performed before the sound signal is recognized. Because random noise in the environment is uncorrelated and shows up as high-frequency components of the signal, the acquired sound signal can be low-pass filtered in the frequency domain so that the high-frequency part that clearly belongs to the noise is removed, which improves the accuracy of the subsequent voice and noise recognition.
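A minimal pre-filtering sketch follows, using a Butterworth low-pass filter; the 16 kHz sample rate, 4 kHz cut-off and filter order are illustrative assumptions, not values fixed by the patent:

    import numpy as np
    from scipy.signal import butter, filtfilt

    def prefilter(samples, sr=16000, cutoff_hz=4000.0, order=4):
        """Low-pass the acquired sound signal to suppress uncorrelated high-frequency
        random noise before voice/noise recognition."""
        b, a = butter(order, cutoff_hz / (sr / 2), btype='low')
        return filtfilt(b, a, samples.astype(np.float64))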
Example two
Based on the first embodiment, specifically, the performing of noise reduction processing on the sound signal based on the amplitude difference includes: if the amplitude difference value is a positive value and is greater than or equal to a first threshold value, performing noise reduction processing on the sound signal, wherein the intensity of the noise reduction processing is in direct proportion to the amplitude difference value.
In this embodiment, when the amplitude of the user's voice signal is greater than the amplitude of the noise signal, the amplitude difference is a positive value. The difference is then compared with the first threshold, and a corresponding noise reduction method is adopted according to the comparison result.
When the difference is greater than or equal to the first threshold, the amplitude of the noise signal is significantly smaller than the amplitude of the voice signal. This indicates that the current voice signal quality is good and not obviously affected by the noise, so the sound signal can be denoised with a conventional noise reduction algorithm, such as the amplitude spectral subtraction, harmonic enhancement or noise cancellation methods commonly used in the field; the choice of noise reduction algorithm is not limited here, and its specific use can be found in the prior art. In this embodiment, the intensity of the noise reduction is selected according to the magnitude of the difference. Generally speaking, the stronger the noise reduction, the better the noise removal, but the more severely the user's voice signal will be distorted. Therefore, when the difference is larger, the voice signal is less affected by the noise and the aliasing between noise and voice is not serious, so the intensity of the noise reduction can be increased; when the difference is smaller, the noise has a certain influence on the voice signal, i.e., there is some aliasing between noise and voice, so the intensity of the noise reduction is reduced accordingly. Dynamically adjusting the noise reduction intensity according to the difference between the voice amplitude and the noise amplitude improves the sound quality after noise reduction.
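To illustrate scaling the noise reduction intensity with the amplitude difference, the sketch below uses plain magnitude spectral subtraction, one of the conventional algorithms mentioned above; the mapping from the amplitude difference to the over-subtraction factor and the spectral floor are assumptions, not values from the patent:

    import numpy as np

    def spectral_subtract(samples, noise_estimate, amp_diff, frame=512, hop=256):
        """Frame-wise magnitude spectral subtraction; a larger voice-minus-noise amplitude
        difference 'amp_diff' maps to a stronger subtraction factor. 'noise_estimate' is a
        stretch of noise-only samples at least one frame long."""
        alpha = 1.0 + min(10.0 * amp_diff, 2.0)              # assumed mapping: stronger denoising for larger diff
        window = np.hanning(frame)
        noise_mag = np.abs(np.fft.rfft(noise_estimate[:frame] * window))
        out = np.zeros(len(samples))
        for start in range(0, len(samples) - frame + 1, hop):
            spec = np.fft.rfft(samples[start:start + frame] * window)
            mag = np.maximum(np.abs(spec) - alpha * noise_mag, 0.05 * np.abs(spec))   # spectral floor
            out[start:start + frame] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame)
        return out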
Optionally, as shown in fig. 2, if the amplitude difference is a positive value, and the amplitude difference is smaller than or equal to the second threshold, and the amplitude of the speech signal is greater than the third threshold, the method includes:
step S201, amplifying the sound signal by a preset gain to obtain a first intermediate signal.
Step S202, noise reduction processing is carried out on the first intermediate signal to obtain a second intermediate signal.
Step S203, attenuating the second intermediate signal according to the preset gain to obtain the noise-reduced sound signal.
In this embodiment, when the difference S is a positive value and is not greater than a second threshold T, the current noise condition is relatively serious; the second threshold T may be equal to the first threshold, or it may be smaller than the first threshold. If noise reduction were applied directly to the current sound signal, the resulting voice signal would not be ideal. Therefore, in this embodiment the sound signal is first amplified by a preset gain A, that is, both the voice signal and the noise signal in the sound signal are amplified by the gain A, to obtain a first intermediate signal. The difference between the voice amplitude and the noise amplitude in the first intermediate signal is amplified by the gain A as well, so noise reduction performed after the amplification affects the user's normal voice signal much less. The value of the gain A can be calculated from the difference S and the second threshold T, specifically: A is greater than or equal to T/S, so that the difference S' after the gain adjustment can exceed the second threshold T. The first intermediate signal is then subjected to noise reduction processing to obtain a second intermediate signal, and the second intermediate signal is attenuated by the same gain A to obtain the noise-reduced sound signal. To guarantee the noise reduction effect, this embodiment requires the amplitude of the voice signal to be greater than the third threshold; the third threshold can be determined according to the performance of the mobile phone and the noise reduction algorithm used with it. Correspondingly, the gain A cannot exceed a maximum value, so that the amplitude of the amplified sound signal does not exceed the clipping point: a signal amplified beyond the clipping point cannot be recorded, and the original signal cannot be recovered even if it is later attenuated by the same gain.
In this embodiment, when the amplitude difference is a positive value, the amplitude difference is smaller than or equal to the second threshold, and the amplitude of the voice signal is greater than the third threshold, the sound signal is amplified by a preset gain before noise reduction, and the noise-reduced signal is then attenuated by the same gain. The signal level therefore stays unchanged while the signal-to-noise ratio of the voice signal is improved, which improves the user experience.
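A sketch of this amplify-denoise-attenuate branch is shown below. It reuses the spectral_subtract sketch above as the denoiser; the clipping point and the way the gain cap is enforced are illustrative assumptions:

    import numpy as np

    def boost_denoise_attenuate(samples, noise_estimate, diff_s, threshold_t, clip_point=1.0):
        """Branch for 0 < S <= T with sufficient voice amplitude:
        amplify by A (A >= T/S, capped below the clipping point), denoise, attenuate by A."""
        a = threshold_t / diff_s                                          # A >= T / S
        a = min(a, clip_point / (np.max(np.abs(samples)) + 1e-12))        # clipping cap takes priority
        first_intermediate = samples * a                                  # step S201
        second_intermediate = spectral_subtract(first_intermediate,       # step S202
                                                noise_estimate * a,
                                                diff_s * a)
        return second_intermediate / a                                    # step S203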
Optionally, the human voice model includes a voice model of a specific user. While driving, there may be other passengers speaking in addition to the user who is on the hands-free call, so when a voice signal is recognized with the voice models, the voice of the specific user is recognized first according to that user's voice model, and it is then determined whether the sound also contains human voices other than the specific user's. If there are voices other than the specific user's and their signal amplitude is greater than the specific user's sound signal amplitude, the specific user's speech may not be enhanced by the noise reduction method in this embodiment. In this case, a prompt may be sent to the user indicating that the other people are currently speaking too loudly, which may affect the call quality.
Optionally, the noise reduction method disclosed above may be combined with conventional Automatic Gain Control (AGC). Specifically, after the sound signal is recognized as containing the user's voice signal and background noise, the acquired sound signal is first processed by AGC so that its amplitude varies within a small range; the amplitude of the voice signal and the amplitude of the background noise signal are then obtained, and the difference between them is calculated. When the difference is a positive value and is greater than or equal to the first threshold, noise reduction processing is performed on the sound signal, with an intensity proportional to the difference; when the difference is a positive value and is smaller than the first threshold, the sound signal is amplified by a preset gain, denoised, and then attenuated by the same gain. Combining the conventional AGC method with the noise reduction method of this embodiment further improves the voice noise reduction effect.
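Putting the sketches together, a dispatch for this combined AGC-plus-noise-reduction variant might look as follows; agc() is a hypothetical placeholder for a conventional AGC stage, the threshold is an assumed input, and here the second threshold T is taken equal to the first threshold, as the text above allows:

    def process_call_audio(samples, noise_estimate, first_threshold):
        """End-to-end sketch of the AGC-plus-noise-reduction combination described above."""
        samples = agc(samples)                                  # hypothetical conventional AGC stage
        labels = classify_frames(samples)                       # recognition sketch above
        diff = amplitude_difference(samples, labels)            # amplitude difference sketch above
        if diff is None or diff <= 0:
            return samples, "prompt_user"                       # embodiment three: prompt the user instead
        if diff >= first_threshold:
            return spectral_subtract(samples, noise_estimate, diff), "denoised"
        return boost_denoise_attenuate(samples, noise_estimate, diff, first_threshold), "boost_denoised"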
Example three
Based on the first embodiment, specifically, the performing of noise reduction processing on the sound signal based on the amplitude difference includes: if the amplitude difference value is not a positive value, outputting prompt information, wherein the prompt information is used for prompting the user to move closer to the microphone or to speak louder.
In this embodiment, when the difference S between the amplitude of the user's voice signal and the amplitude of the noise signal is 0, i.e., the two amplitudes are equal, or when the difference S is a negative value, i.e., the amplitude of the voice signal is smaller than that of the noise signal, the user's voice is covered by the noise (since the background noise is additive). In this case, applying noise reduction would also process the user's normal voice signal and severely distort the user's voice. Therefore, when the difference is a non-positive value, a prompt may be issued to the user through the audio output unit of the mobile phone, for example prompting the user to speak closer to the phone or to speak louder because the user's voice is too quiet or the surrounding noise is too loud.
Optionally, the content of the cell phone prompt may be adjusted by detecting the distance between the user and the cell phone.
In this embodiment, as in the first embodiment, the voice signal containing the human voice and the noise signal are identified according to the models, and the noise signal is further classified as tire noise, wind noise or air-conditioning noise. If the difference between the voice amplitude and the noise amplitude is a non-positive value and the distance between the user and the phone is smaller than a preset value, the user is already close to the phone and moving closer will not solve the call noise problem. In that case, besides prompting the user, the source of the noise can also be indicated so that the user can address the problem directly. For example, if recognition shows that the air-conditioning noise amplitude is larger than the user's voice amplitude, the user is reminded that the air-conditioning noise is too loud and advised to lower the air-conditioning fan speed; if the wind noise amplitude is larger than the user's voice amplitude, the user is reminded that the wind noise is too loud and advised to reduce the vehicle speed. By identifying the noise source when moving closer to the phone cannot solve the problem, this embodiment reminds the user of the main source of the current noise so that a corresponding remedy can be taken, which makes it easier for the user to reduce the noise in the voice accurately and improves the user experience.
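A small sketch of the prompt selection described in this embodiment; the noise-type labels, the distance threshold and the message strings are illustrative assumptions:

    def choose_prompt(diff, distance_m, noise_amplitudes, voice_amplitude, near_threshold_m=0.5):
        """Pick a prompt when the voice-minus-noise amplitude difference is non-positive."""
        if diff > 0:
            return None                                          # no prompt needed
        if distance_m >= near_threshold_m:
            return "Please move closer to the phone or speak louder."
        # Already close to the phone: point at the dominant noise source instead.
        dominant = max(noise_amplitudes, key=noise_amplitudes.get)
        if noise_amplitudes[dominant] > voice_amplitude:
            if dominant == "air_conditioning":
                return "Air-conditioning noise is too loud; consider lowering the fan speed."
            if dominant == "wind":
                return "Wind noise is too loud; consider reducing the vehicle speed."
            if dominant == "tire":
                return "Tire noise is too loud; consider reducing the vehicle speed."
        return "Background noise is too loud for a clear call."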
Example four
Fig. 3 shows a constituent structure of the speech signal output processing apparatus provided in the present embodiment, and for convenience of explanation, only the portions related to the present embodiment are shown.
In this embodiment, the apparatus is used to implement the method for processing the output of the voice signal in the embodiment of fig. 1, and may be a software unit, a hardware unit or a unit combining software and hardware that is built in the mobile terminal. The mobile terminal includes but is not limited to a smart phone, a tablet computer, a learning machine or a smart car device.
As shown in fig. 3, the speech signal output processing apparatus 3 includes:
an audio signal acquisition unit 301 for recognizing a speech signal and a background noise signal from a sound signal acquired in real time;
an amplitude difference calculation unit 302, configured to obtain an amplitude difference between the speech signal and the background noise signal;
a processing unit 303, configured to perform noise reduction processing on the sound signal based on the amplitude difference.
Optionally, the apparatus for processing the output of the voice signal further includes:
a prompting unit, configured to output prompt information if the amplitude difference value is a non-positive value, wherein the prompt information is used for prompting the user to move closer to the microphone or to speak louder.
Optionally, the processing unit further includes:
the first processing subunit is configured to, if the amplitude difference is a positive value, and the amplitude difference is smaller than or equal to a second threshold, and the amplitude of the voice signal is greater than a third threshold, amplify the voice signal by a preset gain, so as to obtain a first intermediate signal;
carrying out noise reduction processing on the first intermediate signal to obtain a second intermediate signal;
and attenuating the second intermediate signal according to the preset gain to obtain the sound signal subjected to noise reduction processing.
Optionally, the processing unit further includes:
a second processing subunit, configured to perform noise reduction processing on the sound signal if the amplitude difference value is a positive value and is greater than or equal to a first threshold, wherein the intensity of the noise reduction processing is in direct proportion to the amplitude difference value.
Optionally, the apparatus for processing an output of a voice signal further includes:
a pre-processing unit for pre-filtering the sound signal before recognizing the speech signal and the background noise signal.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 4, the terminal device 4 of this embodiment includes: a processor 40, a memory 41 and a computer program 42 stored in said memory 41 and executable on said processor 40. The processor 40, when executing the computer program 42, implements the steps in the above-described respective speech signal output processing method embodiments, such as the steps 101 to 103 shown in fig. 1. Alternatively, the processor 40, when executing the computer program 42, implements the functions of the units in the device embodiments described above, such as the functions of the units 301 to 303 shown in fig. 3.
Illustratively, the computer program 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program 42 in the terminal device 4. For example, the computer program 42 may be divided into a synchronization module, a summary module, an acquisition module and a return module (modules in a virtual device), each module having its own specific functions.
the terminal device 4 may be a computing device with a voice input/output function, such as a notebook, a palm computer, a mobile phone, a tablet computer, and a navigator. The terminal device may include, but is not limited to, a processor 40, a memory 41. Those skilled in the art will appreciate that fig. 4 is merely an example of a terminal device 4 and does not constitute a limitation of terminal device 4 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The processor 40 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. The memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used for storing the computer program and other programs and data required by the terminal device. The memory 41 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods of the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (7)

1. A method for processing an output of a speech signal, comprising:
recognizing a speech signal and a background noise signal from a real-time acquired sound signal, comprising: recognizing the voice and the background noise by pre-storing a human voice model and a noise model, and extracting the human voice and the background noise according to the characteristics of the models; wherein, the human voice model contains the voice characteristics of human voice, including: frequency, zero crossing rate, short-term average energy, short-term average amplitude;
acquiring the amplitude difference value of the voice signal and the background noise signal, including: calculating the average amplitude of a voice signal representing human voice and the average amplitude of a background noise signal in the collected voice signals in a period of time according to the recorded voice waveform, calculating the absolute value of the obtained average amplitude of the voice signal and the average amplitude of the background noise signal, and then calculating the difference value between the absolute value of the average amplitude of the voice signal and the absolute value of the average amplitude of the background noise signal;
performing noise reduction processing on the sound signal based on the amplitude difference value, including:
if the amplitude difference S is a positive value and is greater than or equal to a first threshold, performing noise reduction processing on the sound signal, wherein the intensity of the noise reduction processing is in direct proportion to the amplitude difference;
if the amplitude difference value is a positive value, the amplitude difference value is less than or equal to a second threshold value T, and the amplitude of the voice signal is greater than a third threshold value,
amplifying the sound signal by a preset gain A to obtain a first intermediate signal; wherein A is more than or equal to T/S;
carrying out noise reduction processing on the first intermediate signal to obtain a second intermediate signal;
and attenuating the second intermediate signal according to the preset gain to obtain the sound signal subjected to noise reduction processing.
2. The method for processing an output of a speech signal according to claim 1, wherein said performing noise reduction processing on the sound signal based on the amplitude difference value comprises:
and if the amplitude difference value is not a positive value, outputting prompt information, wherein the prompt information is used for prompting a user to approach a microphone for communication or increasing the speaking volume.
3. The method of output processing of a speech signal according to claim 1, before recognizing the speech signal and the background noise signal, comprising:
pre-filtering the sound signal.
4. An output processing apparatus for a speech signal, comprising:
an audio signal acquisition unit for recognizing a speech signal and a background noise signal from a sound signal acquired in real time, comprising: recognizing the voice and the background noise by pre-storing a human voice model and a noise model, and extracting the human voice and the background noise according to the characteristics of the models; wherein, the human voice model contains the voice characteristics of human voice, including: frequency, zero crossing rate, short-term average energy, short-term average amplitude;
the amplitude difference calculation unit is configured to obtain an amplitude difference between the speech signal and the background noise signal, and includes: calculating the average amplitude of a voice signal representing human voice and the average amplitude of a background noise signal in the collected voice signals in a period of time according to the recorded voice waveform, calculating the absolute value of the obtained average amplitude of the voice signal and the average amplitude of the background noise signal, and then calculating the difference value between the absolute value of the average amplitude of the voice signal and the absolute value of the average amplitude of the background noise signal;
a processing unit, configured to perform noise reduction processing on the sound signal based on the amplitude difference, including:
if the amplitude difference S is a positive value and is greater than or equal to a first threshold, performing noise reduction processing on the sound signal, wherein the intensity of the noise reduction processing is in direct proportion to the amplitude difference;
if the amplitude difference value is a positive value, the amplitude difference value is less than or equal to a second threshold value T, and the amplitude of the voice signal is greater than a third threshold value,
amplifying the sound signal by a preset gain A to obtain a first intermediate signal; wherein A is more than or equal to T/S;
carrying out noise reduction processing on the first intermediate signal to obtain a second intermediate signal;
and attenuating the second intermediate signal according to the preset gain to obtain the sound signal subjected to noise reduction processing.
5. The apparatus for output processing of a speech signal according to claim 4, wherein said processing unit comprises:
and the prompting unit is used for outputting prompting information if the amplitude difference value is a non-positive value, wherein the prompting information is used for prompting a user to approach a microphone for conversation or increasing the speaking volume.
6. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 3 when executing the computer program.
7. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 3.
CN201711104384.1A 2017-11-10 2017-11-10 Voice signal output processing method and device Active CN107910013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711104384.1A CN107910013B (en) 2017-11-10 2017-11-10 Voice signal output processing method and device


Publications (2)

Publication Number Publication Date
CN107910013A CN107910013A (en) 2018-04-13
CN107910013B 2021-09-24

Family

ID=61844674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711104384.1A Active CN107910013B (en) 2017-11-10 2017-11-10 Voice signal output processing method and device

Country Status (1)

Country Link
CN (1) CN107910013B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108831500B (en) * 2018-05-29 2023-04-28 平安科技(深圳)有限公司 Speech enhancement method, device, computer equipment and storage medium
CN109102800A (en) * 2018-07-26 2018-12-28 广州酷狗计算机科技有限公司 A kind of method and apparatus that the determining lyrics show data
CN110164423B (en) 2018-08-06 2023-01-20 腾讯科技(深圳)有限公司 Azimuth angle estimation method, azimuth angle estimation equipment and storage medium
CN109637543A (en) * 2018-12-12 2019-04-16 平安科技(深圳)有限公司 The voice data processing method and device of sound card
CN111383647B (en) * 2018-12-28 2022-10-25 展讯通信(上海)有限公司 Voice signal processing method and device and readable storage medium
CN109639904B (en) * 2019-01-25 2021-02-02 努比亚技术有限公司 Mobile phone mode adjusting method, system and computer storage medium
CN111768794A (en) * 2019-03-15 2020-10-13 上海博泰悦臻网络技术服务有限公司 Voice noise reduction method, voice noise reduction system, equipment and storage medium
CN111796790B (en) * 2019-04-09 2023-09-08 深圳市冠旭电子股份有限公司 Sound effect adjusting method and device, readable storage medium and terminal equipment
CN110097884B (en) * 2019-06-11 2022-05-17 大众问问(北京)信息科技有限公司 Voice interaction method and device
CN112911441A (en) * 2021-01-18 2021-06-04 上海闻泰信息技术有限公司 Noise reduction method, apparatus, audio device, and computer-readable storage medium
CN116168719A (en) * 2022-12-26 2023-05-26 杭州爱听科技有限公司 Sound gain adjusting method and system based on context analysis

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101272131A (en) * 2007-03-13 2008-09-24 瑞昱半导体股份有限公司 Programmable gain amplifier with noise cancellation
CN101976566A (en) * 2010-07-09 2011-02-16 瑞声声学科技(深圳)有限公司 Voice enhancement method and device using same
US8321215B2 (en) * 2009-11-23 2012-11-27 Cambridge Silicon Radio Limited Method and apparatus for improving intelligibility of audible speech represented by a speech signal
US8364477B2 (en) * 2005-05-25 2013-01-29 Motorola Mobility Llc Method and apparatus for increasing speech intelligibility in noisy environments
CN104064185A (en) * 2013-03-18 2014-09-24 联想(北京)有限公司 Information processing method and system and electronic device
CN104103278A (en) * 2013-04-02 2014-10-15 北京千橡网景科技发展有限公司 Real time voice denoising method and device
CN105845151A (en) * 2016-05-30 2016-08-10 百度在线网络技术(北京)有限公司 Audio gain adjustment method and audio gain adjustment device applied to speech recognition front-end
CN106782586A (en) * 2016-11-14 2017-05-31 阔地教育科技有限公司 A kind of acoustic signal processing method and device
CN107092461A (en) * 2017-06-01 2017-08-25 深圳天珑无线科技有限公司 The way of recording, device and computer-readable recording medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7099821B2 (en) * 2003-09-12 2006-08-29 Softmax, Inc. Separation of target acoustic signals in a multi-transducer arrangement
US8949120B1 (en) * 2006-05-25 2015-02-03 Audience, Inc. Adaptive noise cancelation
CN101859568B (en) * 2009-04-10 2012-05-30 比亚迪股份有限公司 Method and device for eliminating voice background noise
CN104376848B (en) * 2013-08-12 2018-03-23 展讯通信(上海)有限公司 Audio signal processing method and device
CN104810024A (en) * 2014-01-28 2015-07-29 上海力声特医学科技有限公司 Double-path microphone speech noise reduction treatment method and system
CN106898360B (en) * 2017-04-06 2023-08-08 北京地平线信息技术有限公司 Audio signal processing method and device and electronic equipment


Also Published As

Publication number Publication date
CN107910013A (en) 2018-04-13

Similar Documents

Publication Publication Date Title
CN107910013B (en) Voice signal output processing method and device
CN110459234B (en) Vehicle-mounted voice recognition method and system
EP3698360A1 (en) Noise reduction using machine learning
US20140114665A1 (en) Keyword voice activation in vehicles
CN108335694B (en) Far-field environment noise processing method, device, equipment and storage medium
US10553236B1 (en) Multichannel noise cancellation using frequency domain spectrum masking
CN108305637B (en) Earphone voice processing method, terminal equipment and storage medium
CN105810203B (en) Apparatus and method for eliminating noise, voice recognition apparatus and vehicle equipped with the same
CN110556125B (en) Feature extraction method and device based on voice signal and computer storage medium
CN106251856A (en) A kind of environment noise based on mobile terminal eliminates system and method
US20140244245A1 (en) Method for soundproofing an audio signal by an algorithm with a variable spectral gain and a dynamically modulatable hardness
CN104505099A (en) Method and equipment for removing known interference in voice signal
CN110970051A (en) Voice data acquisition method, terminal and readable storage medium
EP2752848A1 (en) Method and apparatus for generating a noise reduced audio signal using a microphone array
CN110503973B (en) Audio signal transient noise suppression method, system and storage medium
CN109360578B (en) Echo cancellation method of audio device, audio device and readable storage medium
CN104867498A (en) Mobile communication terminal and voice enhancement method and module thereof
US20220301574A1 (en) Systems, methods, apparatus, and storage medium for processing a signal
CN113362845B (en) Method, apparatus, device, storage medium and program product for noise reduction of sound data
US20230320903A1 (en) Ear-worn device and reproduction method
CN108615534A (en) Far field voice de-noising method and system, terminal and computer readable storage medium
CN114302286A (en) Method, device and equipment for reducing noise of call voice and storage medium
CN111048096B (en) Voice signal processing method and device and terminal
CN114255779A (en) Audio noise reduction method for VR device, electronic device and storage medium
CN112312258B (en) Intelligent earphone with hearing protection and hearing compensation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Changan town in Guangdong province Dongguan 523860 usha Beach Road No. 18

Applicant after: OPPO Guangdong Mobile Communications Co.,Ltd.

Address before: Changan town in Guangdong province Dongguan 523860 usha Beach Road No. 18

Applicant before: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.

GR01 Patent grant