CN108766456A - Speech processing method and device - Google Patents


Info

Publication number
CN108766456A
Authority
CN
China
Prior art keywords
signal
way
echo
near end
output signal
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810496822.1A
Other languages
Chinese (zh)
Other versions
CN108766456B (en)
Inventor
周舒然
李志飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Volkswagen China Investment Co Ltd
Mobvoi Innovation Technology Co Ltd
Original Assignee
Chumen Wenwen Information Technology Co Ltd
Application filed by Chumen Wenwen Information Technology Co Ltd
Priority to CN201810496822.1A (patent/CN108766456B/en)
Publication of CN108766456A
Priority to PCT/CN2019/087301 (patent/WO2019223603A1/en)
Application granted
Publication of CN108766456B
Current legal status: Active


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
  • Telephone Function (AREA)

Abstract

The present invention provides a speech processing method and device. The method includes: obtaining at least one near-end signal through a microphone array; performing echo cancellation on the at least one near-end signal to obtain at least one residual echo signal; performing beamforming on the at least one near-end signal and on the at least one residual echo signal respectively; performing nonlinear echo suppression on the beamformed near-end signal(s) and residual echo signal(s) to obtain a nonlinear echo suppression output signal; and performing noise reduction and gain processing on the nonlinear echo suppression output signal. The solution provided by the invention can therefore improve the signal-to-noise ratio.

Description

Speech processing method and device
Technical field
Embodiments of the present invention relate to the field of speech processing technology, and in particular to a speech processing method and device.
Background
Intelligent voice technology is applied more and more widely, and smart voice devices use it to interact with users. The speech signal received by a smart voice device may include a near-end signal and a reference signal. After the voice terminal plays the received reference signal through its loudspeaker, that signal forms an echo.
At present, speech processing is usually applied to reduce echo. However, while residual echo is being reduced, the near-end speech is usually distorted, so the sound is no longer smooth and becomes harsh. Existing approaches therefore yield a relatively low signal-to-noise ratio.
Summary of the invention
In view of this, embodiments of the present invention provide a speech processing method and device whose main purpose is to improve the signal-to-noise ratio.
In a first aspect, an embodiment of the present invention provides a speech processing method, which includes:
obtaining at least one near-end signal through a microphone array;
performing echo cancellation on the at least one near-end signal to obtain at least one residual echo signal;
performing beamforming on the at least one near-end signal and on the at least one residual echo signal respectively;
performing nonlinear echo suppression on the beamformed at least one near-end signal and the beamformed at least one residual echo signal to obtain a nonlinear echo suppression output signal;
performing noise reduction and gain processing on the nonlinear echo suppression output signal.
Optionally,
performing noise reduction and gain processing on the nonlinear echo suppression output signal includes:
determining the signal-to-noise ratio of the nonlinear echo suppression output signal;
determining, by formula (1), the noise-reduced output signal corresponding to the nonlinear echo suppression output signal;
where T denotes the noise-reduced output signal, P denotes the nonlinear echo suppression output signal, and S denotes the signal-to-noise ratio.
Optionally,
performing noise reduction and gain processing on the nonlinear echo suppression output signal includes:
determining at least one frequency bin of the noise-reduced output signal;
determining, according to formula (2), a first gain value for each frequency bin;
where N_i denotes the first gain value of the i-th frequency bin, H_i denotes the peak value of the i-th frequency bin, K1 denotes a first constant, and K2 denotes a second constant;
determining, using the first gain value of each frequency bin, the gain output signal of that frequency bin.
Optionally,
performing echo cancellation on the at least one near-end signal to obtain at least one residual echo signal includes:
filtering a reference signal with at least one preset filter to obtain an estimated echo signal;
for each near-end signal: removing the estimated echo signal from the near-end signal to obtain the residual echo signal corresponding to that near-end signal.
Optionally,
performing beamforming on the at least one near-end signal and the at least one residual echo signal respectively includes:
performing delay adjustment on the at least one near-end signal and on the at least one residual echo signal respectively;
performing beamforming on the delay-adjusted near-end signal(s) to obtain one beamformed near-end signal;
performing beamforming on the delay-adjusted residual echo signal(s) to obtain one beamformed residual echo signal.
Optionally,
obtaining at least one near-end signal through a microphone array includes:
localizing the near-end speech with the microphone array;
performing beamforming on the localized near-end speech;
obtaining the at least one near-end signal from the beamformed near-end speech.
In a second aspect, an embodiment of the present invention provides a speech processing device, which includes:
an acquisition module, configured to obtain at least one near-end signal through a microphone array;
an echo cancellation module, configured to perform echo cancellation on the at least one near-end signal to obtain at least one residual echo signal;
a beamforming module, configured to perform beamforming on the at least one near-end signal and on the at least one residual echo signal respectively;
a nonlinear echo suppression module, configured to perform nonlinear echo suppression on the beamformed at least one near-end signal and at least one residual echo signal to obtain a nonlinear echo suppression output signal;
a processing module, configured to perform noise reduction and gain processing on the nonlinear echo suppression output signal.
Optionally,
the processing module includes a noise reduction submodule;
the noise reduction submodule is configured to determine the signal-to-noise ratio of the nonlinear echo suppression output signal and to determine, by formula (1), the noise-reduced output signal corresponding to the nonlinear echo suppression output signal;
where T denotes the noise-reduced output signal, P denotes the nonlinear echo suppression output signal, and S denotes the signal-to-noise ratio.
Optionally,
the processing module includes a gain submodule;
the gain submodule is configured to determine at least one frequency bin of the noise-reduced output signal; to determine, according to formula (2), the first gain value of each frequency bin; and to determine, using the first gain value of each frequency bin, the gain output signal of that frequency bin;
where N_i denotes the first gain value of the i-th frequency bin, H_i denotes the peak value of the i-th frequency bin, K1 denotes a first constant, and K2 denotes a second constant.
In a third aspect, an embodiment of the present invention provides a storage medium that includes a stored program; when the program runs, it controls the device on which the storage medium resides to execute any one of the speech processing methods described above.
In a fourth aspect, an embodiment of the present invention provides an electronic device that includes a processor, a memory, and a bus. The processor and the memory communicate with each other through the bus, and the processor is configured to call program instructions in the memory to execute any one of the speech processing methods described above.
Embodiments of the present invention provide a speech processing method and device. One or more near-end signals are received through a microphone array, and echo cancellation is performed on each near-end signal to obtain one or more residual echo signals. Beamforming is then performed separately on the near-end signals and on the residual echo signals, and nonlinear echo suppression is performed on the beamformed near-end signal and residual echo signal to obtain a nonlinear echo suppression output signal. Finally, noise reduction and gain processing are performed on the nonlinear echo suppression output signal. In other words, once the microphone array obtains the near-end signals, they undergo echo cancellation, beamforming, nonlinear echo suppression, and noise reduction and gain processing, which suppresses echo while keeping the sound as undistorted as possible. The solution provided by the embodiments of the present invention can therefore improve the signal-to-noise ratio.
The above is only an overview of the technical solution of the present invention. To make the technical means of the invention clearer so that it can be implemented according to the specification, and to make the above and other objects, features, and advantages of the invention more apparent, specific embodiments are set out below.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a speech processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a speech processing method according to another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a speech processing device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a speech processing device according to another embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments, the disclosure can be implemented in various forms and should not be limited by the embodiments set out here. Rather, these embodiments are provided so that the invention is understood more thoroughly and so that the scope of the disclosure is fully conveyed to those skilled in the art.
As shown in Fig. 1, an embodiment of the present invention provides a speech processing method, which includes:
Step 101: obtain at least one near-end signal through a microphone array;
Step 102: perform echo cancellation on the at least one near-end signal to obtain at least one residual echo signal;
Step 103: perform beamforming on the at least one near-end signal and on the at least one residual echo signal respectively;
Step 104: perform nonlinear echo suppression on the beamformed at least one near-end signal and at least one residual echo signal to obtain a nonlinear echo suppression output signal;
Step 105: perform noise reduction and gain processing on the nonlinear echo suppression output signal.
In the embodiment shown in Fig. 1, one or more near-end signals are received through the microphone array, and echo cancellation is performed on each of them to obtain one or more residual echo signals. Beamforming is then performed separately on the near-end signals and on the residual echo signals, and nonlinear echo suppression is performed on the beamformed near-end signal and residual echo signal to obtain a nonlinear echo suppression output signal. Finally, noise reduction and gain processing are performed on that output signal. Thus, once the microphone array obtains the near-end signals, they undergo echo cancellation, beamforming, nonlinear echo suppression, and noise reduction and gain processing, which suppresses echo while keeping the sound as undistorted as possible. The solution provided by the embodiments of the present invention can therefore improve the signal-to-noise ratio.
In one embodiment of the present invention, obtaining at least one near-end signal through a microphone array in step 101 of the flowchart in Fig. 1 may include:
localizing the near-end speech with the microphone array;
performing beamforming on the localized near-end speech;
obtaining the at least one near-end signal from the beamformed near-end speech.
In this embodiment, the number and layout of the microphones in the array can be chosen according to business needs; for example, the array may contain 4 microphones.
In this embodiment, each microphone in the array may be omnidirectional. To capture the near-end signal better, the near-end speech must be localized whenever it is present. Beamforming is then performed according to the localized near-end speech, so that speech from the determined direction is amplified and speech from other directions is suppressed. The near-end signals are then obtained through the individual microphones from the beamformed near-end speech.
In this embodiment, when the array contains N microphones, N near-end signals are obtained.
According to the above embodiment, when near-end speech is present, it is first localized with the microphone array, the localized near-end speech is beamformed, and one or more near-end signals are obtained from the beamformed near-end speech. Because beamforming applies direction-dependent gain and suppression to the near-end speech, the near-end signals obtained contain relatively little noise.
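The patent does not say how the near-end speech is localized. One common building block, shown here only as a hedged sketch, is to estimate each microphone's time difference of arrival (TDOA) relative to a reference microphone by maximizing the cross-correlation over integer sample lags; those per-microphone delays are what encode the talker's direction. The function names `estimate_delay` and `estimate_array_delays`, the integer-lag search, and the choice of channel 0 as reference are all assumptions for illustration.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the integer lag (in samples) by which `sig` trails `ref`.

    Searches lags in [-max_lag, max_lag] and keeps the one that maximizes
    the cross-correlation sum(ref[i] * sig[i + lag]).
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(sig):
                score += r * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def estimate_array_delays(channels, max_lag):
    """Delay of every microphone channel relative to channel 0 (assumed reference)."""
    ref = channels[0]
    return [estimate_delay(ref, ch, max_lag) for ch in channels]
```

For example, a short pulse that reaches a second microphone two samples later yields a delay of 2 for that channel. Practical systems often use GCC-PHAT or sub-sample interpolation instead; the brute-force search is kept here only for readability.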
In one embodiment of the present invention, performing echo cancellation on the at least one near-end signal to obtain at least one residual echo signal in step 102 of the flowchart in Fig. 1 may include:
filtering a reference signal with at least one preset filter to obtain an estimated echo signal;
for each near-end signal: removing the estimated echo signal from the near-end signal to obtain the corresponding residual echo signal.
In this embodiment, the type and number of filters can be chosen according to business needs. With a single filter, that filter processes the reference signal using its own filtering method to produce the estimated echo signal. With two or more filters, each filter processes the reference signal, a preferred filter is selected according to the filtering results, and the estimated echo signal is taken from the preferred filter.
In this embodiment, the residual echo signal of any near-end channel is determined in the same way: removing the estimated echo signal from the near-end signal yields the residual echo signal.
According to the above embodiment, the reference signal is filtered to obtain an estimated echo signal, and the estimated echo is removed from each near-end signal to obtain that channel's residual echo signal. Because the estimated echo is removed from every near-end channel, echo is reduced.
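The patent leaves the filter type open. A normalized least-mean-squares (NLMS) adaptive filter is a common choice for this kind of acoustic echo cancellation, so the sketch below uses one as a stand-in: it filters the reference signal to estimate the echo and returns the residual d[n] - y[n]. Running one filter per near-end channel, and picking the "preferred" filter as the candidate with the lowest residual energy, are assumptions; `nlms_cancel` and `cancel_with_best_filter` are hypothetical names.

```python
def nlms_cancel(reference, near_end, taps=16, mu=0.5, eps=1e-8):
    """Estimate the echo of `reference` inside `near_end` with an NLMS filter
    and return the residual (near-end sample minus estimated echo sample)."""
    w = [0.0] * taps        # adaptive filter coefficients
    x_buf = [0.0] * taps    # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, near_end):
        x_buf = [x] + x_buf[:-1]
        y = sum(wk * xk for wk, xk in zip(w, x_buf))   # estimated echo sample
        e = d - y                                       # residual echo sample
        norm = eps + sum(xk * xk for xk in x_buf)       # input power normalization
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x_buf)]
        residual.append(e)
    return residual


def cancel_with_best_filter(reference, near_end, step_sizes=(0.1, 0.5, 1.0)):
    """Run several candidate filters (here: different step sizes) and keep the
    output with the lowest residual energy (hypothetical selection rule)."""
    candidates = [nlms_cancel(reference, near_end, mu=mu) for mu in step_sizes]
    return min(candidates, key=lambda r: sum(v * v for v in r))
```

Applied to each microphone channel in turn, this produces one residual echo signal per near-end signal. The filter length must cover the echo path delay; 16 taps is only a toy value.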
In one embodiment of the present invention, after the step in the previous embodiment of removing the estimated echo signal from the near-end signal to obtain the corresponding residual echo signal, the method may further include:
determining a second gain value corresponding to the near-end signal;
performing echo compression on the residual echo signal using the second gain value.
In this embodiment, the second gain value can be determined as follows, taking one near-end channel as an example. Determine at least one frequency bin of that near-end signal, and determine, from the channel's near-end signal and residual echo signal, a correlation coefficient for each frequency bin. For each frequency bin, the bin gain is set to that bin's correlation coefficient. Overload processing and smoothing are applied to the resulting bin gains to obtain the second gain value. The residual echo signal is then compressed using the second gain value, yielding the echo-compressed output.
According to the above embodiment, because the residual echo signal is compressed using the second gain value of the near-end signal, echo can be reduced as far as possible.
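A minimal sketch of the second-gain computation described above, under several assumptions: the near-end and residual signals are already available as per-frame spectra (lists of complex bin values, e.g. from an STFT computed elsewhere); the "correlation coefficient" is taken as the magnitude of the normalized cross-correlation across frames; "overload processing" is read as clamping into [g_min, 1]; and smoothing is a three-bin moving average. None of these specifics come from the patent, and all function names are hypothetical.

```python
import math


def bin_correlation(near_frames, resid_frames, k):
    """|sum D E*| / sqrt(sum |D|^2 * sum |E|^2) at bin k, taken across frames."""
    cross = sum(d[k] * e[k].conjugate() for d, e in zip(near_frames, resid_frames))
    p_d = sum(abs(d[k]) ** 2 for d in near_frames)
    p_e = sum(abs(e[k]) ** 2 for e in resid_frames)
    if p_d == 0 or p_e == 0:
        return 0.0
    return abs(cross) / math.sqrt(p_d * p_e)


def second_gain(near_frames, resid_frames, g_min=0.1):
    """Per-bin second gain: correlation, clamped, then smoothed across bins."""
    n_bins = len(near_frames[0])
    raw = [bin_correlation(near_frames, resid_frames, k) for k in range(n_bins)]
    # "Overload processing", assumed here to mean clamping into [g_min, 1].
    clamped = [min(1.0, max(g_min, g)) for g in raw]
    # Smoothing across neighbouring bins (3-bin moving average, assumed).
    smoothed = []
    for k in range(n_bins):
        window = clamped[max(0, k - 1): k + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed


def compress_residual(resid_frames, gains):
    """Echo compression: scale every bin of every residual frame by its gain."""
    return [[e[k] * gains[k] for k in range(len(gains))] for e in resid_frames]
```

The intuition is that where the residual is still highly correlated with the microphone signal, little echo was removed and the bin is kept; where correlation is low, the bin is attenuated.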
In one embodiment of the present invention, performing beamforming on the at least one near-end signal and the at least one residual echo signal respectively in step 103 of the flowchart in Fig. 1 may include:
performing delay adjustment on the at least one near-end signal and on the at least one residual echo signal respectively;
performing beamforming on the delay-adjusted near-end signal(s) to obtain one beamformed near-end signal;
performing beamforming on the delay-adjusted residual echo signal(s) to obtain one beamformed residual echo signal.
In this embodiment, because the near-end channels are captured at different times, each near-end signal must be delay-adjusted so that all channels are time-aligned. Beamforming the delay-adjusted near-end signals produces one beamformed near-end signal, which is clearer than the individual channels.
In this embodiment, because the residual echo channels are likewise captured at different times, each residual echo signal must be delay-adjusted so that all channels are time-aligned. Beamforming the delay-adjusted residual echo signals produces one beamformed residual echo signal, which is likewise clearer.
According to the above embodiment, each near-end signal and each residual echo signal is delay-adjusted, and the near-end signals and the residual echo signals are each beamformed, yielding one beamformed near-end signal and one beamformed residual echo signal. The resulting beamformed signals are therefore clearer.
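The delay adjustment followed by beamforming described above matches the classic delay-and-sum beamformer, which is used here as an assumed stand-in since the patent does not name its beamformer. In this sketch each channel is shifted by an integer delay (positive means the channel lags the reference microphone) and the aligned channels are averaged; the same routine would be applied once to the near-end channels and once to the residual echo channels. `align` and `delay_and_sum` are hypothetical names.

```python
def align(channel, delay):
    """Shift `channel` by an integer `delay` so it lines up with the reference.

    delay > 0: the channel lags, so drop its first `delay` samples and pad the end.
    delay < 0: the channel leads, so prepend zeros and truncate the end.
    """
    n = len(channel)
    if delay >= 0:
        return list(channel[delay:]) + [0.0] * delay
    return [0.0] * (-delay) + list(channel[: n + delay])


def delay_and_sum(channels, delays):
    """Align every channel by its delay, then average sample by sample."""
    aligned = [align(ch, d) for ch, d in zip(channels, delays)]
    m = len(aligned)
    return [sum(samples) / m for samples in zip(*aligned)]
```

Averaging aligned channels reinforces sound from the steered direction while uncorrelated noise partially cancels, which is why the beamformed output is cleaner than any single channel. Fractional delays and per-channel weights are common refinements.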
In one embodiment of the present invention, performing nonlinear echo suppression on the beamformed at least one near-end signal and at least one residual echo signal to obtain a nonlinear echo suppression output signal in step 104 of the flowchart in Fig. 1 may include:
performing nonlinear echo suppression on the one beamformed near-end signal and the one beamformed residual echo signal to obtain the nonlinear echo suppression output signal.
According to the above embodiment, nonlinear echo suppression of the beamformed near-end signal and the beamformed residual echo signal yields the nonlinear echo suppression output signal, which reduces the echo further.
In one embodiment of the present invention, performing noise reduction and gain processing on the nonlinear echo suppression output signal in step 105 of the flowchart in Fig. 1 may include:
determining the signal-to-noise ratio of the nonlinear echo suppression output signal;
determining, by formula (1), the noise-reduced output signal corresponding to the nonlinear echo suppression output signal;
where T denotes the noise-reduced output signal, P denotes the nonlinear echo suppression output signal, and S denotes the signal-to-noise ratio.
In this embodiment, the signal-to-noise ratio of the nonlinear echo suppression output signal can be determined as follows: determine the total energy of the beamformed near-end signal and the beamformed residual echo signal, and take the quotient of this total energy and the energy of the beamformed residual echo signal. That quotient is the signal-to-noise ratio.
According to the above embodiment, because noise reduction of the nonlinear echo suppression output signal is driven by the signal-to-noise ratio, noise can be reduced as far as possible.
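The signal-to-noise ratio described above can be sketched as follows. The text is ambiguous about "total energy"; this sketch assumes it means the sum of the energies of the beamformed near-end signal and the beamformed residual echo signal, divided by the energy of the beamformed residual echo signal. The names `energy` and `snr_from_beams` and the small `eps` guard against division by zero are assumptions.

```python
def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)


def snr_from_beams(bf_near, bf_residual, eps=1e-12):
    """SNR read as (E_near + E_residual) / E_residual (assumed interpretation)."""
    total = energy(bf_near) + energy(bf_residual)
    return total / (energy(bf_residual) + eps)
```

For instance, a beamformed near-end frame with energy 2 and a beamformed residual frame with energy 1 give a ratio of about 3 under this reading. In practice such a ratio is usually computed per frame (or per frequency bin) rather than over a whole recording.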
In one embodiment of the present invention, performing noise reduction and gain processing on the nonlinear echo suppression output signal in step 105 of the flowchart in Fig. 1 may include:
determining at least one frequency bin of the noise-reduced output signal;
determining, according to formula (2), the first gain value of each frequency bin;
where N_i denotes the first gain value of the i-th frequency bin, H_i denotes the peak value of the i-th frequency bin, K1 denotes a first constant, and K2 denotes a second constant;
determining, using the first gain value of each frequency bin, the gain output signal of that frequency bin.
In this embodiment, K1 and K2 are the two endpoints of a preset interval, i.e. the interval is [K1, K2]. Their values can be chosen according to business needs, for example K1 = 0.9 and K2 = 0.99.
In this embodiment, formula (2) shows that when the peak value of a frequency bin lies outside the set interval and is smaller than every value in it, a larger gain value is determined so as to boost that bin's nonlinear echo suppression output signal. When the peak value lies outside the interval and is larger than every value in it, a smaller gain value is determined so that, after the gain is applied to that bin's nonlinear echo suppression output signal, the resulting peak falls inside the interval, which keeps the gain output signal undistorted.
It is targeted due to inhibiting the corresponding each frequency point of output signal to carry out nonlinear echo according to above-described embodiment Carry out gain, therefore noise is relatively high.
The speech processing method is illustrated below using a microphone array that contains 4 microphones. As shown in Fig. 2, the method includes:
Step 201: Localize the near-end speech with the microphone array.
Step 202: Perform beamforming on the localized near-end speech.
In this step, beamforming is performed according to the localized near-end speech, so that speech from the determined direction is amplified and speech from other directions is suppressed.
Step 203: Obtain at least one near-end signal from the beamformed near-end speech using the microphone array.
In this step, 4 near-end signals are obtained with the 4 microphones.
Step 204: Filter the reference signal with at least one preset filter to obtain an estimated echo signal.
In this step, the filter processes the reference signal using its own filtering method to produce the estimated echo signal. Removing the estimated echo signal from a near-end signal yields a residual echo signal.
Step 205: For each near-end signal, remove the estimated echo signal from the near-end signal to obtain the corresponding residual echo signal; determine the second gain value of the near-end signal, and perform echo compression on the residual echo signal using the second gain value.
Step 206: Perform delay adjustment on each near-end signal and each residual echo signal.
In this step, because the near-end channels are captured at different times, each near-end signal must be delay-adjusted so that all channels are time-aligned.
Step 207: Beamform the delay-adjusted near-end signals to obtain one beamformed near-end signal.
Step 208: Beamform the delay-adjusted residual echo signals to obtain one beamformed residual echo signal.
Step 209: Perform nonlinear echo suppression on the beamformed near-end signal and the beamformed residual echo signal to obtain a nonlinear echo suppression output signal.
Step 210: Determine the signal-to-noise ratio of the nonlinear echo suppression output signal.
In this step, the total energy of the beamformed near-end signal and the beamformed residual echo signal is determined, and the quotient of this total energy and the energy of the beamformed residual echo signal is computed. This quotient is the signal-to-noise ratio.
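Step 210 defines the signal-to-noise ratio as a quotient of energies. A minimal sketch, assuming that "total energy" means the sum of the energies of the beamformed near-end signal and the beamformed residual echo signal over one block of samples, i.e. S = (E_near + E_residual) / E_residual. The text does not specify the block length or whether S is expressed in decibels; the function names and the small `eps` guard are illustrative assumptions.

```python
from typing import Sequence


def energy(x: Sequence[float]) -> float:
    """Sum of squared samples over the block."""
    return sum(v * v for v in x)


def snr_step_210(near_bf: Sequence[float], residual_bf: Sequence[float], eps: float = 1e-12) -> float:
    """S = (E_near + E_residual) / E_residual, under the assumed reading of 'total energy'."""
    e_residual = energy(residual_bf)
    total = energy(near_bf) + e_residual
    return total / (e_residual + eps)   # eps avoids division by zero for an all-zero residual block
```

For instance, a near-end block [1, 1] (energy 2) and a residual block [1, 0] (energy 1) give S = (2 + 1) / 1 = 3.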
Step 211: Determine, from the signal-to-noise ratio, the noise-reduced output signal corresponding to the nonlinear echo suppression output signal.
In this step, the noise-reduced output signal corresponding to the nonlinear echo suppression output signal is determined using formula (1).
Step 212: Determine at least one frequency bin of the noise-reduced nonlinear echo suppression output signal.
Step 213: Determine the first gain value of each frequency bin.
In this step, the first gain value of each frequency bin is determined according to formula (2).
In this step, when the peak value of a frequency bin lies outside the set interval and is smaller than every value in the interval, a larger gain value can be chosen so that the nonlinear echo suppression output signal is boosted at that bin. When the peak value of a frequency bin lies outside the set interval and is larger than every value in the interval, a smaller gain value can be chosen, so that after the gain is applied at that bin the resulting peak lies within the interval, which keeps the gain output signal free of distortion.
Step 214: Using the first gain value of each frequency bin, determine the gain output signal of each frequency bin.
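Formula (2), which gives the first gain value N_i from the bin peak H_i and the constants K1 and K2, is not reproduced in this text, so the sketch below does not implement it. It only illustrates the behaviour described for Steps 213 and 214, under stated assumptions: the "set interval" is taken to be [low, high]; the peak H_i of a bin is taken to be that bin's magnitude in the current frame; a bin whose peak lies below the interval is boosted up to `low` (capped by a hypothetical `max_gain` so that near-silent bins are not amplified without limit); and a bin whose peak lies above the interval is attenuated so that its peak lands exactly on `high`. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def first_gain(peak: float, low: float, high: float, max_gain: float = 4.0) -> float:
    """Hypothetical per-bin gain that moves `peak` into [low, high] (a stand-in, not formula (2))."""
    if peak <= 0.0:
        return 1.0                        # empty bin: leave untouched
    if peak < low:
        return min(low / peak, max_gain)  # boost weak bins, but not without limit
    if peak > high:
        return high / peak                # attenuate so the gained peak equals `high`
    return 1.0                            # already inside the interval


def apply_bin_gains(spectrum: Sequence[complex], low: float, high: float
                    ) -> Tuple[List[float], List[complex]]:
    """Return the per-bin gains and the resulting gain output spectrum."""
    gains = [first_gain(abs(x), low, high) for x in spectrum]
    return gains, [g * x for g, x in zip(gains, spectrum)]
```

With the interval [1, 2], bins of magnitude 0.5, 4.0 and 1.5 receive gains 2.0, 0.5 and 1.0, so all three gained peaks fall inside the interval.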
As shown in Figure 3, an embodiment of the present invention provides a voice processing apparatus, comprising:
an acquisition module 301, configured to acquire at least one near-end signal through a microphone array;
an echo cancellation module 302, configured to perform echo cancellation on the at least one near-end signal to obtain at least one residual echo signal;
a beamforming module 303, configured to perform beamforming on the at least one near-end signal and on the at least one residual echo signal, respectively;
a nonlinear echo suppression module 304, configured to perform nonlinear echo suppression on the beamformed at least one near-end signal and the beamformed at least one residual echo signal to obtain a nonlinear echo suppression output signal;
a processing module 305, configured to perform noise reduction and gain processing on the nonlinear echo suppression output signal.
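The apparatus of Figure 3 is a chain of five modules. The sketch below wires hypothetical stand-ins for modules 301 to 305 in the order the text describes, to make the data flow explicit: per-channel echo cancellation, then one beamforming pass over the near-end channels and one over the residual echo channels, then nonlinear suppression of the two beamformed signals, then noise reduction and gain. The class and field names are illustrative, not taken from the patent; each stage is a plain callable, and the reference signal needed for echo cancellation is assumed to be bound inside `cancel_echo`.

```python
from dataclasses import dataclass
from typing import Callable, List

Signal = List[float]


@dataclass
class VoiceProcessor:
    """Hypothetical wiring of modules 301-305; every stage is a pluggable callable."""
    acquire: Callable[[], List[Signal]]           # acquisition module 301
    cancel_echo: Callable[[Signal], Signal]       # echo cancellation module 302, applied per channel
    beamform: Callable[[List[Signal]], Signal]    # beamforming module 303
    suppress: Callable[[Signal, Signal], Signal]  # nonlinear echo suppression module 304
    post_process: Callable[[Signal], Signal]      # processing module 305 (noise reduction + gain)

    def run(self) -> Signal:
        near = self.acquire()
        residual = [self.cancel_echo(ch) for ch in near]
        near_bf = self.beamform(near)             # one beamformed near-end signal
        residual_bf = self.beamform(residual)     # one beamformed residual echo signal
        return self.post_process(self.suppress(near_bf, residual_bf))
```

Keeping each module behind a callable lets individual stages be swapped or tested in isolation without changing the order of operations.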
In the embodiment shown in Figure 3, once the microphone array acquires the near-end signals, they undergo echo cancellation, beamforming, nonlinear echo suppression, and noise reduction and gain processing. Echo is suppressed while distortion of the sound is kept to a minimum, so the embodiment improves the signal-to-noise ratio.
In an embodiment of the invention, as shown in Figure 4, the processing module 305 may include a noise reduction submodule 3051, configured to determine the signal-to-noise ratio of the nonlinear echo suppression output signal and to determine, by formula (1), the noise-reduced output signal corresponding to the nonlinear echo suppression output signal;
where T denotes the noise-reduced output signal, P denotes the nonlinear echo suppression output signal, and S denotes the signal-to-noise ratio.
In an embodiment of the invention, as shown in Figure 4, the processing module 305 may include a gain submodule 3052, configured to determine at least one frequency bin of the noise-reduced output signal; to determine, according to formula (2), the first gain value of each frequency bin; and to determine, using the first gain value of each frequency bin, the gain output signal of that bin;
where N_i denotes the first gain value of the i-th frequency bin, H_i denotes the peak value of the i-th frequency bin, K1 denotes a first constant, and K2 denotes a second constant.
In an embodiment of the invention, the echo cancellation module 302 is configured to filter the reference signal with at least one preset filter to obtain an estimated echo signal and, for each near-end signal, to cancel the estimated echo signal from that near-end signal to obtain the corresponding residual echo signal.
In an embodiment of the invention, the echo cancellation module 302 is further configured to determine the second gain value corresponding to the near-end signal and to apply echo compression to the residual echo signal using the second gain value.
In an embodiment of the invention, the beamforming module 303 is configured to apply delay adjustment to the at least one near-end signal and to the at least one residual echo signal, respectively; to beamform the delay-adjusted near-end signals to obtain one beamformed near-end signal; and to beamform the delay-adjusted residual echo signals to obtain one beamformed residual echo signal.
In an embodiment of the invention, the nonlinear echo suppression module 304 is configured to perform nonlinear echo suppression on the beamformed near-end signal and the beamformed residual echo signal to obtain the nonlinear echo suppression output signal.
In an embodiment of the invention, the acquisition module 301 is configured to determine the direction of the near-end speech using the microphone array, to beamform the oriented near-end speech, and to obtain the at least one near-end signal from the beamformed near-end speech.
One embodiment of the invention provides a storage medium comprising a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to perform any of the voice processing methods described above.
One embodiment of the invention provides an electronic device which, as shown in Figure 5, comprises a processor 401, a memory 402, and a bus 403. The processor 401 and the memory 402 communicate with each other via the bus 403, and the processor 401 is configured to call program instructions in the memory 402 to perform any of the voice processing methods described above.
Because the information exchange between the units of the above apparatus and their execution processes are based on the same concept as the method embodiments of the present invention, see the description of the method embodiments for details; they are not repeated here.
The embodiments of the present invention provide at least the following benefits:
1. One or more near-end signals are received through the microphone array, and echo cancellation is applied to each of them to obtain one or more residual echo signals. Beamforming is then applied separately to the near-end signals and to the residual echo signals, and nonlinear echo suppression is applied to the beamformed near-end and residual echo signals to obtain a nonlinear echo suppression output signal. Finally, noise reduction and gain processing are applied to that output signal. Because the near-end signals acquired by the microphone array pass through echo cancellation, beamforming, nonlinear echo suppression, and noise reduction and gain processing, echo is suppressed while distortion of the sound is kept to a minimum. The solution therefore improves the signal-to-noise ratio.
2. When near-end speech is present, the microphone array first determines the direction of the near-end speech, beamforming is applied to the oriented speech, and one or more near-end signals are obtained from the beamformed speech. Because beamforming applies direction-dependent gain and suppression to the near-end speech signal, the resulting near-end signals contain relatively little noise.
3. The reference signal is filtered to obtain an estimated echo signal, which is then removed from each near-end signal to obtain that signal's residual echo signal. Removing the estimated echo from every near-end signal reduces echo.
4. Echo compression is applied to the residual echo signal using the second gain value of the corresponding near-end signal, which reduces echo as far as possible.
5. Delay adjustment is applied to each near-end signal and each residual echo signal, and each set is then beamformed, yielding one beamformed near-end signal and one beamformed residual echo signal. The resulting beamformed signals are therefore clearer.
6. Nonlinear echo suppression is applied to the beamformed near-end signal and the beamformed residual echo signal to obtain the nonlinear echo suppression output signal, which reduces echo further.
7. Because noise reduction of the nonlinear echo suppression output signal is driven by the signal-to-noise ratio, noise can be reduced as far as possible.
8. Because gain is applied specifically to each frequency bin of the nonlinear echo suppression output signal, the signal-to-noise ratio is relatively high.
It should be noted that herein, such as first and second etc relational terms are used merely to an entity Or operation is distinguished with another entity or operation, is existed without necessarily requiring or implying between these entities or operation Any actual relationship or order.Moreover, the terms "include", "comprise" or its any other variant be intended to it is non- It is exclusive to include, so that the process, method, article or equipment including a series of elements includes not only those elements, But also include other elements that are not explicitly listed, or further include solid by this process, method, article or equipment Some elements.In the absence of more restrictions, the element limited by sentence " including one ", is not arranged Except there is also other identical factors in the process, method, article or apparatus that includes the element.
One of ordinary skill in the art will appreciate that:Realize that all or part of step of above method embodiment can pass through The relevant hardware of program instruction is completed, and program above-mentioned can be stored in computer-readable storage medium, the program When being executed, step including the steps of the foregoing method embodiments is executed;And storage medium above-mentioned includes:ROM, RAM, magnetic disc or light In the various media that can store program code such as disk.
Finally, it should be noted that:The foregoing is merely presently preferred embodiments of the present invention, is merely to illustrate the skill of the present invention Art scheme, is not intended to limit the scope of the present invention.Any modification for being made all within the spirits and principles of the present invention, Equivalent replacement, improvement etc., are included within the scope of protection of the present invention.

Claims (10)

1. A voice processing method, characterized by comprising:
acquiring at least one near-end signal through a microphone array;
performing echo cancellation on the at least one near-end signal to obtain at least one residual echo signal;
performing beamforming on the at least one near-end signal and on the at least one residual echo signal, respectively;
performing nonlinear echo suppression on the beamformed at least one near-end signal and the beamformed at least one residual echo signal to obtain a nonlinear echo suppression output signal;
performing noise reduction and gain processing on the nonlinear echo suppression output signal.
2. The voice processing method according to claim 1, characterized in that
performing noise reduction and gain processing on the nonlinear echo suppression output signal comprises:
determining the signal-to-noise ratio of the nonlinear echo suppression output signal;
determining, by a first formula, the noise-reduced output signal corresponding to the nonlinear echo suppression output signal;
the first formula being:
where T denotes the noise-reduced output signal; P denotes the nonlinear echo suppression output signal; and S denotes the signal-to-noise ratio.
3. The voice processing method according to claim 2, characterized in that
performing noise reduction and gain processing on the nonlinear echo suppression output signal comprises:
determining at least one frequency bin of the noise-reduced output signal;
determining, according to a second formula, the first gain value of each frequency bin;
the second formula being:
where N_i denotes the first gain value of the i-th frequency bin; H_i denotes the peak value of the i-th frequency bin; K1 denotes a first constant; and K2 denotes a second constant;
determining, using the first gain value of each frequency bin, the gain output signal of each frequency bin.
4. The voice processing method according to any one of claims 1 to 3, characterized in that
performing echo cancellation on the at least one near-end signal to obtain at least one residual echo signal comprises:
filtering a reference signal with at least one preset filter to obtain an estimated echo signal;
for each near-end signal: cancelling the estimated echo signal from the near-end signal to obtain the residual echo signal corresponding to the near-end signal.
5. The voice processing method according to any one of claims 1 to 3, characterized in that
performing beamforming on the at least one near-end signal and on the at least one residual echo signal, respectively, comprises:
applying delay adjustment to the at least one near-end signal and to the at least one residual echo signal, respectively;
beamforming the delay-adjusted at least one near-end signal to obtain one beamformed near-end signal;
beamforming the delay-adjusted at least one residual echo signal to obtain one beamformed residual echo signal;
and/or
acquiring at least one near-end signal through a microphone array comprises:
determining the direction of near-end speech using the microphone array;
beamforming the oriented near-end speech;
obtaining the at least one near-end signal from the beamformed near-end speech.
6. A voice processing apparatus, characterized by comprising:
an acquisition module, configured to acquire at least one near-end signal through a microphone array;
an echo cancellation module, configured to perform echo cancellation on the at least one near-end signal to obtain at least one residual echo signal;
a beamforming module, configured to perform beamforming on the at least one near-end signal and on the at least one residual echo signal, respectively;
a nonlinear echo suppression module, configured to perform nonlinear echo suppression on the beamformed at least one near-end signal and the beamformed at least one residual echo signal to obtain a nonlinear echo suppression output signal;
a processing module, configured to perform noise reduction and gain processing on the nonlinear echo suppression output signal.
7. The voice processing apparatus according to claim 6, characterized in that
the processing module comprises a noise reduction submodule;
the noise reduction submodule is configured to determine the signal-to-noise ratio of the nonlinear echo suppression output signal and to determine, by a first formula, the noise-reduced output signal corresponding to the nonlinear echo suppression output signal;
the first formula being:
where T denotes the noise-reduced output signal; P denotes the nonlinear echo suppression output signal; and S denotes the signal-to-noise ratio.
8. The voice processing apparatus according to claim 7, characterized in that
the processing module comprises a gain submodule;
the gain submodule is configured to determine at least one frequency bin of the noise-reduced output signal; to determine, according to a second formula, the first gain value of each frequency bin; and to determine, using the first gain value of each frequency bin, the gain output signal of each frequency bin;
the second formula being:
where N_i denotes the first gain value of the i-th frequency bin; H_i denotes the peak value of the i-th frequency bin; K1 denotes a first constant; and K2 denotes a second constant.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to perform the voice processing method according to any one of claims 1 to 5.
10. An electronic device, characterized in that the electronic device comprises a processor, a memory, and a bus; the processor and the memory communicate with each other via the bus; and the processor is configured to call program instructions in the memory to perform the voice processing method according to any one of claims 1 to 5.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810496822.1A CN108766456B (en) 2018-05-22 2018-05-22 Voice processing method and device
PCT/CN2019/087301 WO2019223603A1 (en) 2018-05-22 2019-05-16 Voice processing method and apparatus and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810496822.1A CN108766456B (en) 2018-05-22 2018-05-22 Voice processing method and device

Publications (2)

Publication Number Publication Date
CN108766456A CN108766456A (en) 2018-11-06
CN108766456B CN108766456B (en) 2020-01-07

Family

ID=64007626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810496822.1A Active CN108766456B (en) 2018-05-22 2018-05-22 Voice processing method and device

Country Status (1)

Country Link
CN (1) CN108766456B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101903948A (en) * 2007-12-19 2010-12-01 高通股份有限公司 Systems, methods, and apparatus for multi-microphone based speech enhancement
US20110293103A1 (en) * 2010-06-01 2011-12-01 Qualcomm Incorporated Systems, methods, devices, apparatus, and computer program products for audio equalization
US20120207325A1 (en) * 2011-02-10 2012-08-16 Dolby Laboratories Licensing Corporation Multi-Channel Wind Noise Suppression System and Method
US20130034241A1 (en) * 2011-06-11 2013-02-07 Clearone Communications, Inc. Methods and apparatuses for multiple configurations of beamforming microphone arrays
CN102957819A (en) * 2011-09-30 2013-03-06 斯凯普公司 Audio signal processing signals
CN104429100A (en) * 2012-07-02 2015-03-18 高通股份有限公司 Systems and methods for surround sound echo reduction
CN105144674A (en) * 2013-05-03 2015-12-09 高通股份有限公司 Multi-channel echo cancellation and noise suppression

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019223603A1 (en) * 2018-05-22 2019-11-28 出门问问信息科技有限公司 Voice processing method and apparatus and electronic device
CN109920405A (en) * 2019-03-05 2019-06-21 百度在线网络技术(北京)有限公司 Multi-path voice recognition methods, device, equipment and readable storage medium storing program for executing
CN109901113A (en) * 2019-03-13 2019-06-18 出门问问信息科技有限公司 A kind of voice signal localization method, apparatus and system based on complex environment
CN110097891A (en) * 2019-04-22 2019-08-06 广州视源电子科技股份有限公司 A kind of microphone signal processing method, device, equipment and storage medium
CN110310655A (en) * 2019-04-22 2019-10-08 广州视源电子科技股份有限公司 Microphone signal processing method, device, equipment and storage medium
CN110310655B (en) * 2019-04-22 2021-10-22 广州视源电子科技股份有限公司 Microphone signal processing method, device, equipment and storage medium
CN110335618A (en) * 2019-06-06 2019-10-15 福建星网智慧软件有限公司 A kind of method and computer equipment improving non-linear inhibition
CN110335618B (en) * 2019-06-06 2021-07-30 福建星网智慧软件有限公司 Method for improving nonlinear echo suppression and computer equipment
WO2021203603A1 (en) * 2020-04-10 2021-10-14 南京拓灵智能科技有限公司 Howling suppression method and apparatus, and electronic device
CN111524532A (en) * 2020-04-29 2020-08-11 展讯通信(上海)有限公司 Echo suppression method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN108766456B (en) 2020-01-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230620

Address after: 210034 floor 8, building D11, Hongfeng Science Park, Nanjing Economic and Technological Development Zone, Jiangsu Province

Patentee after: New Technology Co.,Ltd.

Patentee after: VOLKSWAGEN (CHINA) INVESTMENT Co.,Ltd.

Address before: 100094 1001, 10th floor, office building a, 19 Zhongguancun Street, Haidian District, Beijing

Patentee before: MOBVOI INFORMATION TECHNOLOGY Co.,Ltd.
