WO2015085946A1 - Voice signal processing method, apparatus and server - Google Patents


Info

Publication number
WO2015085946A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
sub
weight
voice
voice signal
Application number
PCT/CN2014/093656
Inventor
马跃
胡建强
张帆
刘丽
成家雄
宋思超
Original Assignee
广州华多网络科技有限公司 (Guangzhou Huaduo Network Technology Co., Ltd.)
Application filed by 广州华多网络科技有限公司 (Guangzhou Huaduo Network Technology Co., Ltd.)
Publication of WO2015085946A1 (published in Chinese)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities

Definitions

  • the embodiments of the present invention relate to the field of communications technologies, and in particular, to a voice signal processing method, apparatus, and server.
  • the voice signals of multiple channels are generally simply superimposed.
  • Embodiments provide a voice signal processing method, apparatus, and server.
  • the technical solution is as follows:
  • a method for processing a voice signal comprising:
  • obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the multiple channels and the first voice signals of the multiple channels includes:
  • wherein the predetermined loudness sum is the loudness sum of the same sub-signal across the first speech signals of the multiple channels, excluding the sub-signals whose second weight is set to 0;
  • adjusting the corresponding sub-signal of the original voice signal according to the third weight of each sub-signal in the first voice signal includes:
  • the third weight of the sub-signal is multiplied by the amplitude of the sub-signal in the original speech signal to obtain an adjusted sub-signal.
  • acquiring the third weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal includes:
  • obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the multiple channels and the first voice signals of the multiple channels includes:
  • for each sub-signal, the first weight of the sub-signal is multiplied by the amplitude of the sub-signal in the original speech signal to obtain an adjusted sub-signal.
  • obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the multiple channels and the first voice signals of the multiple channels includes:
  • wherein the predetermined loudness sum is the loudness sum of the same sub-signal across the first speech signals of the multiple channels, excluding the sub-signals whose second weight is set to 0;
  • obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the multiple channels and the first voice signals of the multiple channels includes:
  • the adjusted sub-signal is obtained according to the fourth weight of each sub-signal of the first speech signal multiplied by the amplitude of the sub-signal in the original speech signal;
  • the method further includes:
  • when the amplitude of the processed speech signal is greater than a preset threshold, performing nonlinear mapping on the processed speech signal to obtain an output speech signal.
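The text does not specify the nonlinear mapping. Purely for illustration, the sketch below assumes a common choice: a soft limiter. Samples at or below the threshold pass through unchanged. Larger magnitudes are compressed with a tanh curve, so they approach a ceiling such as 16-bit full scale but never reach it. The function name `soft_limit` and the `ceiling` parameter are hypothetical.

```python
import math

def soft_limit(samples, threshold, ceiling):
    """Pass samples with |x| <= threshold unchanged; squash larger ones below ceiling.

    Assumed mapping (not specified in the source): a tanh knee above the threshold.
    """
    if not 0 < threshold < ceiling:
        raise ValueError("expected 0 < threshold < ceiling")
    headroom = ceiling - threshold
    out = []
    for x in samples:
        magnitude = abs(x)
        if magnitude <= threshold:
            out.append(x)  # linear region: unchanged
        else:
            # tanh has slope 1 at 0, so the curve leaves the threshold with slope 1
            mapped = threshold + headroom * math.tanh((magnitude - threshold) / headroom)
            out.append(math.copysign(mapped, x))  # restore the sign
    return out
```

For example, `soft_limit(mixed, 24000.0, 32767.0)` would keep a superimposed 16-bit signal from clipping. The threshold of 24000 is an arbitrary illustration, not a value from the source. The tanh knee is used because its slope at the threshold is 1, so the compressed region joins the unchanged region without a corner.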
  • a voice signal processing apparatus comprising:
  • an original voice signal acquiring module configured to acquire original voice signals of multiple channels, where the original voice signals are digital voice signals;
  • a filtering module configured to filter the original voice signal of each channel to obtain a first voice signal of each channel, where the frequency of the first voice signal belongs to a preset frequency range;
  • a loudness obtaining module configured to acquire, for the first voice signal of each channel, the loudness of each sub-signal of the first voice signal;
  • a weight obtaining module configured to acquire a first weight of each sub-signal in the first voice signal according to the loudness of that sub-signal and the loudness sum of the same sub-signal over the multiple channels;
  • the voice signal processing module is configured to obtain the processed voice signal according to the first weight of each of the first voice signals of the plurality of channels and the first voice signal of the plurality of channels.
  • the voice signal processing module includes:
  • a specified threshold determining unit configured to determine a specified threshold according to the maximum value of the first weights of the multiple channels;
  • a weight obtaining unit configured to, for the first voice signal of each channel, set to 0 the second weight of each sub-signal whose first weight is less than the specified threshold, and obtain the second weight of each sub-signal whose first weight is not less than the specified threshold according to the loudness of that sub-signal and the loudness sum of the same sub-signal across the first speech signals of the multiple channels, excluding the sub-signals whose second weight is set to 0;
  • the weight obtaining unit is further configured to acquire a third weight of each sub-signal in the first voice signal according to a second weight of each sub-signal in the first voice signal for the first voice signal of each channel;
  • the voice signal processing module further includes: an adjusting unit, configured to adjust a corresponding sub-signal of the original voice signal according to a third weight of each of the sub-signals in the first voice signal for an original voice signal of each channel;
  • the voice signal processing unit is configured to superimpose each of the adjusted sub-signals in the plurality of channels to obtain a processed voice signal.
  • the adjusting unit is further configured to, for each segment of the sub-signal, multiply the third weight of the sub-signal with the amplitude of the sub-signal in the original speech signal to obtain an adjusted sub-signal.
  • the weight obtaining unit is further configured to, for the first voice signal of each channel, smooth the weight of each sub-signal according to the second weight of that sub-signal, to obtain a third weight of each sub-signal in the first speech signal.
  • the voice signal processing module includes:
  • a first adjusting unit configured to, for each sub-signal in the first speech signals of the multiple channels, multiply the first weight of the sub-signal by the amplitude of the sub-signal in the original speech signal to obtain an adjusted sub-signal;
  • the first processing unit is configured to superimpose each of the adjusted sub-signals of the plurality of channels to obtain a processed speech signal.
  • the voice signal processing module includes:
  • a specified threshold determining unit configured to determine a specified threshold according to the maximum value of the plurality of first weights;
  • a second weighting unit configured to, for the first voice signal of each channel, set to 0 the second weight of each sub-signal whose first weight is less than the specified threshold, and obtain the second weight of each sub-signal whose first weight is not less than the specified threshold according to the loudness of that sub-signal and a predetermined loudness sum, wherein the predetermined loudness sum is the loudness sum of the same sub-signal across the first voice signals of the multiple channels, excluding the sub-signals whose second weight is set to 0;
  • a second adjusting unit configured to, for each sub-signal in the first speech signals of the multiple channels, multiply the second weight of the sub-signal by the amplitude of the sub-signal in the original speech signal to obtain an adjusted sub-signal;
  • the second processing unit is configured to superimpose each of the adjusted sub-signals of the plurality of channels to obtain a processed speech signal.
  • the voice signal processing module includes:
  • a fourth weight unit configured to, for the first voice signal of each channel, smooth the weight of each sub-signal according to the first weight of that sub-signal, to obtain a fourth weight of each sub-signal in the first voice signal;
  • a fourth adjusting unit configured to, for the original voice signal of each channel, multiply the fourth weight of each sub-signal in the first voice signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
  • a fourth processing unit configured to superimpose each of the adjusted sub-signals of the plurality of channels to obtain a processed speech signal.
  • the device further includes:
  • a voice signal output module configured to perform nonlinear mapping on the processed voice signal to obtain an output voice signal when the amplitude of the processed voice signal is greater than a preset threshold.
  • a server comprising: a processor and a memory, the processor being coupled to the memory,
  • the processor is configured to acquire original voice signals of multiple channels, where the original voice signals are digital voice signals;
  • the processor is further configured to filter the original voice signal of each channel to obtain a first voice signal of each channel, where the frequency of the first voice signal belongs to a preset frequency range;
  • the processor is further configured to acquire a loudness of each sub-signal in the first voice signal for a first voice signal of each channel;
  • the processor is further configured to acquire a first weight of each sub-signal in the first voice signal according to a loudness of each sub-signal of the first voice signal and a loudness sum of the same sub-signal of the multiple channels ;
  • the processor is further configured to obtain a processed speech signal according to the first weight of each sub-signal in the first speech signals of the multiple channels and the first speech signals of the multiple channels.
  • the processor is further configured to determine a specified threshold according to the maximum value of the plurality of first weights;
  • the processor is further configured to, for the first voice signal of each channel, set to 0 the second weight of each sub-signal whose first weight is less than the specified threshold, and obtain the second weight of each sub-signal whose first weight is not less than the specified threshold according to the loudness of that sub-signal and a predetermined loudness sum, wherein the predetermined loudness sum is the loudness sum of the same sub-signal across the first voice signals of the multiple channels, excluding the sub-signals whose second weight is set to 0;
  • the processor is further configured to acquire a third weight of each sub-signal in the first voice signal according to a second weight of each sub-signal in the first voice signal for the first voice signal of each channel;
  • the processor is further configured to, for the original voice signal of each channel, adjust the corresponding sub-signal of the original voice signal according to the third weight of each sub-signal in the first voice signal;
  • the processor is further configured to superimpose each of the adjusted sub-signals of the plurality of channels to obtain a processed speech signal.
  • the processor is further configured to, for each sub-signal, multiply the third weight of the sub-signal by the amplitude of the sub-signal in the original speech signal to obtain an adjusted sub-signal.
  • the processor is further configured to, for the first voice signal of each channel, smooth the weight of each sub-signal according to the second weight of that sub-signal, to obtain a third weight of each sub-signal in the first speech signal.
  • the processor is further configured to, for each sub-signal in the first voice signals of the multiple channels, multiply the first weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal.
  • the processor is further configured to superimpose each of the adjusted sub-signals of the plurality of channels to obtain a processed speech signal.
  • the processor is further configured to determine a specified threshold according to the maximum value of the plurality of first weights;
  • the processor is further configured to, for the first voice signal of each channel, set to 0 the second weight of each sub-signal whose first weight is less than the specified threshold, and obtain the second weight of each sub-signal whose first weight is not less than the specified threshold according to the loudness of that sub-signal and a predetermined loudness sum, wherein the predetermined loudness sum is the loudness sum of the same sub-signal across the first voice signals of the multiple channels, excluding the sub-signals whose second weight is set to 0;
  • the processor is further configured to: for each of the first speech signals of the plurality of channels, the second weight of the sub-signal and the amplitude of the sub-signal in the original speech signal Multiply, get the adjusted sub-signal.
  • the processor is further configured to superimpose each of the adjusted sub-signals of the plurality of channels to obtain a processed speech signal.
  • obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the multiple channels and the first voice signals of the multiple channels includes:
  • the processor is further configured to, for the first voice signal of each channel, smooth the weight of each sub-signal according to the first weight of that sub-signal, to obtain a fourth weight of each sub-signal in the first voice signal;
  • the processor is further configured to, for the original voice signal of each channel, multiply the fourth weight of each sub-signal in the first voice signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
  • the processor is further configured to superimpose the adjusted sub-signals of the multiple channels to obtain the processed speech signal.
  • the processor is further configured to perform non-linear mapping on the processed voice signal to obtain an output voice signal when the amplitude of the processed voice signal is greater than a preset threshold.
  • the digital voice signals of the multiple channels are filtered to remove components that do not contain normal human speech, yielding the first voice signal of each channel, and the first speech signals of the multiple channels are then processed according to the loudness of each sub-signal to obtain the processed speech signal;
  • this effectively removes low-loudness useless signals from the speech signal, reduces noise in the processed speech, and improves its intelligibility, making it easier for the user to identify useful signals in the speech signal.
  • FIG. 1 is a flowchart of a voice signal processing method according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of another voice signal processing method according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of another voice signal processing method according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of another voice signal processing method according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of another voice signal processing method according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a voice signal processing apparatus according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a method for processing a voice signal according to an embodiment of the present invention.
  • the embodiment is exemplified by taking an execution entity as a server, and the method includes:
  • the method provided by this embodiment filters the digital voice signals of the multiple channels to remove signals that do not contain normal human speech, obtaining the first voice signal of each channel, and processes the first speech signals of the multiple channels according to the loudness of each sub-signal to obtain the processed speech signal. This effectively removes low-loudness useless signals, reduces noise in the processed speech, improves its intelligibility, and makes it easier to identify useful signals in the processed speech signal.
  • FIG. 2 is a flowchart of another voice signal processing method according to an embodiment of the present invention.
  • the embodiment is exemplified by taking an execution entity as a server, and the method includes:
  • when a user conducts voice communication with multiple contacts through an instant messaging application, or within a group of an instant messaging application, the server may receive voice signals from multiple users during the same time period; the server treats each user's voice signal as the original voice signal of one channel.
  • the server receives the original voice signals sent over the multiple channels; each original voice signal is transmitted frame by frame, that is, it consists of multiple temporally consecutive frames.
  • the original voice signal is a digital speech signal.
  • the digital voice signals of the plurality of channels include not only voice signals required by the user, but also a large number of useless signals, such as noise.
  • the server needs to filter out the useful signal from the original speech signals of the plurality of channels, and the useful signal may be a speech signal in a frequency range belonging to a person's normal utterance.
  • the preset frequency range may be set by a technician at the time of development, or may be adjusted by the user in the process of use, which is not limited by the embodiment of the present invention.
  • the preset frequency range may specifically be 100 Hz to 4 kHz, or another frequency range.
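The text does not name a filter design. One possible way to keep roughly the 100 Hz to 4 kHz band is to cascade a second-order high-pass section at 100 Hz with a second-order low-pass section at 4 kHz. The sketch below builds both sections from the biquad formulas in the Audio EQ Cookbook (Robert Bristow-Johnson). The Butterworth Q of 1/√2, the 16 kHz sample rate used in the check, and every function name are assumptions, not details from the source.

```python
import math

def biquad_coeffs(kind, f0, fs, q=1 / math.sqrt(2)):
    """Second-order section per the Audio EQ Cookbook, normalised so a0 == 1."""
    w0 = 2 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    else:
        raise ValueError(f"unsupported filter kind: {kind}")
    a0 = 1 + alpha
    # feedback coefficients a1/a0 and a2/a0
    return [c / a0 for c in b], [-2 * cos_w0 / a0, (1 - alpha) / a0]

def biquad(x, b, a):
    """Direct Form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def speech_band_filter(x, fs, low_hz=100.0, high_hz=4000.0):
    """Keep roughly the low_hz..high_hz band: high-pass first, then low-pass."""
    b, a = biquad_coeffs("highpass", low_hz, fs)
    y = biquad(x, b, a)
    b, a = biquad_coeffs("lowpass", high_hz, fs)
    return biquad(y, b, a)
```

Each second-order section rolls off at only 12 dB per octave. A stricter speech band would need higher-order filters, for example several of these sections in cascade.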
  • the server may further distinguish the useful signal from the useless signal in the first voice signal by loudness, because the user's voice is generally louder than the background sound.
  • the server may calculate the loudness of each sub-signal in the first speech signal of each channel according to a preset loudness algorithm.
  • the preset loudness algorithm may be set by the technician at the time of development, or may be adjusted by the user in the process of use, which is not limited by the embodiment of the present invention.
  • the preset loudness algorithm may specifically be the Zwicker loudness model, or another loudness algorithm.
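Implementing the Zwicker model does not fit in a short example. The sketch below only shows where a per-sub-signal loudness value could come from, and it relies on two assumptions that the source does not state. First, each sub-signal is a fixed-length, non-overlapping slice of samples. Second, loudness is approximated with Stevens' power law (mean-square intensity raised to the power 0.3) rather than a calibrated loudness model. Both function names are hypothetical.

```python
def split_into_subsignals(samples, frame_len):
    """Cut one channel into consecutive, non-overlapping sub-signals (time slices)."""
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]

def loudness_proxy(subsignal, exponent=0.3):
    """Stand-in for a real loudness model (this is NOT the Zwicker model).

    Stevens' power law: perceived loudness grows roughly as intensity ** 0.3,
    with the mean square of the samples used as the intensity.
    """
    if not subsignal:
        return 0.0
    intensity = sum(s * s for s in subsignal) / len(subsignal)
    return intensity ** exponent
```

With 16 kHz audio, a 320-sample slice (20 ms) would be one plausible sub-signal length. The source does not specify one.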
  • the proportion that a sub-signal's loudness contributes to the loudness sum of the same sub-signal across all channels directly affects how recognizable that sub-signal is in the superimposed speech signal; therefore, in step 204 the server determines this proportion for each sub-signal of all the channels.
  • the same sub-signal refers to a sub-signal belonging to the same time slice in the time dimension in the first speech signal of the plurality of channels.
  • the server adds the loudness of the same sub-signal of the first speech signal of the plurality of channels to obtain the loudness sum of the same sub-signal of the multiple channels.
  • the server divides the loudness of each sub-signal in the first voice signal by the loudness sum of the same sub-signal over the multiple channels to obtain the first weight of each sub-signal in the first voice signal.
  • for example, the first speech signal of each channel includes 3 sub-signals (sub-signal 1, sub-signal 2, and sub-signal 3), with the loudness values shown in the following table:

        Loudness     Sub-signal 1   Sub-signal 2   Sub-signal 3
        Channel 1    1              3              4
        Channel 2    2              5              7

  • the first weights of sub-signals 1, 2, and 3 in channel 1 are therefore 1/(1+2) = 1/3, 3/(3+5) = 3/8, and 4/(4+7) = 4/11;
  • the first weights of sub-signals 1, 2, and 3 in channel 2 are 2/3, 5/8, and 7/11.
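The first-weight rule above divides a sub-signal's loudness by the loudness sum of the same time slice over all channels, which makes it easy to check against the worked example. The hypothetical `first_weights` helper below takes loudness values indexed as `[channel][sub_signal]`. It uses `Fraction` only so the result can be compared exactly with the ratios in the example; real loudness values would normally be floats.

```python
from fractions import Fraction

def first_weights(loudness):
    """loudness[c][k] is the loudness of sub-signal k in channel c.

    Returns w[c][k] = loudness[c][k] / (sum over channels of loudness[.][k]).
    """
    n_sub = len(loudness[0])
    # loudness sum of the same sub-signal (time slice) over all channels
    totals = [sum(channel[k] for channel in loudness) for k in range(n_sub)]
    return [
        [Fraction(channel[k]) / totals[k] if totals[k] else Fraction(0) for k in range(n_sub)]
        for channel in loudness
    ]
```

`first_weights([[1, 3, 4], [2, 5, 7]])` reproduces the example: 1/3, 3/8, 4/11 for channel 1 and 2/3, 5/8, 7/11 for channel 2.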
  • the amplitude can be used to represent the frequency or signal strength of the original speech signal, which varies according to the sampling parameters used in the analog to digital conversion.
  • the amplitude may be represented by other parameters, which are not limited by the embodiment of the present invention.
  • after the adjustment, the server superimposes the adjusted sub-signals belonging to the same time slice in the multiple channels to obtain the processed speech signal.
  • for example, the adjusted sub-signal 11 (sub-signal 1 of channel 1) is added to the adjusted sub-signal 21 (sub-signal 1 of channel 2) to obtain sub-signal 1 of the processed speech signal.
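The adjustment and superposition steps can be sketched as follows. The sketch assumes that each sub-signal is a list of sample amplitudes from the original voice signal and that sub-signals in the same time slice have equal length in every channel. `mix_weighted` is a hypothetical name. The same function applies whichever weight is used (first, second, third, or fourth).

```python
def mix_weighted(original, weights):
    """original[c][k]: samples of sub-signal k in channel c (original voice signal).
    weights[c][k]: weight applied to that sub-signal.

    Each sample is multiplied by its sub-signal's weight (adjustment), and the
    adjusted sub-signals of the same time slice are summed over channels
    (superposition). Returns the processed signal as a list of sub-signals.
    """
    n_channels, n_sub = len(original), len(original[0])
    mixed = []
    for k in range(n_sub):
        acc = [0.0] * len(original[0][k])
        for c in range(n_channels):
            w = weights[c][k]
            for i, sample in enumerate(original[c][k]):
                acc[i] += w * sample
        mixed.append(acc)
    return mixed
```

For instance, two channels with one sub-signal each, `[1.0, 2.0]` and `[3.0, 4.0]`, and weights 0.25 and 0.75 mix to `[[2.5, 3.5]]`.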
  • the method provided by this embodiment obtains the first weight from the loudness, adjusts the original voice signal according to the first weight, and then superimposes the results to obtain the processed voice signal; this effectively reduces useless signals in the voice signal and lowers the noise in the processed speech signal.
  • the voice signal processing method may include:
  • a sub-signal with a small first weight is usually a noise signal.
  • to filter out sub-signals with small first weights, the server needs to determine a specified threshold according to the plurality of first weights.
  • the specified threshold may be, for example, 0.1 times the maximum value of the plurality of first weights, or it may be expressed in another way, which is not limited in the embodiment of the present invention.
  • for the first voice signal of each channel, the server sets to 0 the second weight of each sub-signal whose first weight is less than the specified threshold, and obtains the second weight of each sub-signal whose first weight is not less than the specified threshold according to the loudness of that sub-signal and a predetermined loudness sum.
  • the predetermined loudness sum is a sum of loudness of the sub-signals other than the sub-signals in which the second weight is set to 0, among the same sub-signals of the first speech signals of the plurality of channels.
  • that is, the server sets to 0 the second weight of each sub-signal whose first weight is less than the specified threshold, and then calculates, for the same sub-signal across the first speech signals of the multiple channels, the loudness sum of the sub-signals other than those whose second weight has been set to 0.
  • the server may then divide the loudness of each sub-signal of the first speech signal whose first weight is not less than the specified threshold by the predetermined loudness sum to obtain its second weight.
  • for example, the maximum value of the first weights of the two channels is 2/3; if the specified threshold is 0.35, the first weight 1/3 of sub-signal 1 in channel 1 is less than the specified threshold, so the server sets the second weight of sub-signal 1 in channel 1 to 0.
  • the server then excludes the loudness of sub-signal 1 in channel 1, so the predetermined loudness sum for sub-signal 1 equals the loudness of sub-signal 1 in channel 2, and the second weight of sub-signal 1 in channel 2 is 2/2 = 1.
  • multiplying this second weight by the amplitude of sub-signal 1 of channel 2 in the original speech signal gives the adjusted sub-signal 21.
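Below is a minimal sketch of the second-weight computation. It assumes that loudness values and first weights are both stored as `[channel][sub_signal]` lists. `specified_threshold` implements only the example rule of 0.1 times the largest first weight, whereas the worked example passes a threshold of 0.35 directly. All function names are hypothetical.

```python
def specified_threshold(first, ratio=0.1):
    """Example rule from the text: a fixed fraction of the largest first weight."""
    return ratio * max(max(channel) for channel in first)

def second_weights(loudness, first, threshold):
    """Zero out sub-signals whose first weight is below the threshold, then
    renormalise the survivors by the predetermined loudness sum."""
    n_channels, n_sub = len(loudness), len(loudness[0])
    second = [[0.0] * n_sub for _ in range(n_channels)]
    for k in range(n_sub):
        # channels whose sub-signal k is kept (first weight >= threshold)
        kept = [c for c in range(n_channels) if first[c][k] >= threshold]
        total = sum(loudness[c][k] for c in kept)  # predetermined loudness sum
        for c in kept:
            second[c][k] = loudness[c][k] / total if total else 0.0
    return second
```

On the example loudness values (1, 3, 4 for channel 1 and 2, 5, 7 for channel 2) with a threshold of 0.35, the sketch sets the second weight of sub-signal 1 in channel 1 to 0 and gives sub-signal 1 in channel 2 a second weight of 1, matching the text.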
  • after the adjustment, the server superimposes the adjusted sub-signals belonging to the same time slice in the multiple channels to obtain the processed speech signal.
  • the adjusted sub-signal 11 is added to the adjusted sub-signal 21 to obtain a sub-signal 1 of the processed speech signal.
  • by calculating the second weight of each sub-signal, the voice signal processing method provided in this embodiment completely removes the sub-signals whose first weight is less than the specified threshold, further removing low-loudness useless signals from the voice signal and reducing the noise in the processed speech signal.
  • the first weight may be smoothed to prevent the processed sound signal from abruptly jumping between loud and quiet.
  • the voice signal processing method may include:
  • for the first voice signal of each channel, the weight of each sub-signal is smoothed according to its first weight to obtain the fourth weight of each sub-signal in the first voice signal.
  • the first weight of the sub-signal may be smoothed by the second-order low-pass filtering module in the server.
  • step 205b may include: for the mth sub-signal of the first speech signal of each channel, the server smooths the weight of the mth sub-signal according to the first weight of the mth sub-signal and the fourth weight of the (m-1)th sub-signal, to obtain the fourth weight of the mth sub-signal in the first speech signal.
  • the server then uses the fourth weight of the mth sub-signal as the initial value for the (m+1)th sub-signal in that channel and smooths the weight of the (m+1)th sub-signal according to its first weight to obtain its fourth weight. Iterating this process yields the fourth weight of every sub-signal in the first speech signal.
  • the smoothing blends a larger weight and a smaller weight into an intermediate value, which may be obtained by an algorithm such as interpolation.
  • the server may obtain the fourth weight of the first sub-signal by smoothing its weight according to the first weight of the first sub-signal and a preset initial value.
  • the fourth weight of the first sub-signal is then used as the initial value for the second sub-signal, and the fourth weight of the second sub-signal is obtained according to the first weight of the second sub-signal.
  • the preset initial value may be set by the technician at the time of development, or may be adjusted by the user in the process of use, which is not limited by the embodiment of the present invention.
  • for example, the first weight of sub-signal 1 in channel 2 is 2/3, the configuration parameters of the second-order low-pass filtering module in the server may be 0.7 and 0.3, and the preset initial value is 0.6.
  • the weight of sub-signal 1 in channel 2 may then be smoothed according to its first weight and the preset initial value as follows: the server multiplies the preset initial value by 0.7 and the first weight by 0.3, and adds the two products. The result, 0.7 × 0.6 + 0.3 × 2/3 = 0.62, is the fourth weight of sub-signal 1 in channel 2.
  • the server uses this fourth weight, 0.62, as the initial value for sub-signal 2 in channel 2 and smooths according to the first weight 5/8 of sub-signal 2, giving the fourth weight 0.7 × 0.62 + 0.3 × 5/8 = 0.6215.
  • the server uses the fourth weight 0.6215 of sub-signal 2 in channel 2 as the initial value for sub-signal 3 in channel 2.
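The recurrence in the worked example starts from 0.6 and computes each new value as 0.7 × the previous fourth weight + 0.3 × the current first weight. It can be written as shown below. This is a sketch of that arithmetic only. The parameter names are hypothetical, and the text does not describe the internal structure of the server's low-pass filtering module any further.

```python
def smooth_weights(first_weights, initial=0.6, prev_coeff=0.7, cur_coeff=0.3):
    """Fourth weights for one channel, following the worked example:
    fourth[m] = prev_coeff * fourth[m - 1] + cur_coeff * first[m],
    with fourth[-1] taken as the preset initial value."""
    fourth = []
    prev = initial
    for w in first_weights:
        prev = prev_coeff * prev + cur_coeff * w
        fourth.append(prev)
    return fourth
```

Applied to the channel 2 first weights 2/3, 5/8, and 7/11, it returns approximately 0.62 and 0.6215 for the first two sub-signals, as in the text, and about 0.626 for sub-signal 3.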
  • The server obtains each adjusted sub-signal by multiplying the fourth weight of each sub-signal in the first voice signal by the amplitude of the corresponding sub-signal in the original voice signal.
  • The server then superimposes the adjusted sub-signals that belong to the same time slice across the multiple channels to obtain the processed voice signal.
  • The voice signal processing method of this embodiment smooths the first weights into fourth weights. This prevents the volume of the processed voice signal from jumping abruptly between loud and quiet.
  • FIG. 5 is a flowchart of a voice signal processing method according to an embodiment of the present invention.
  • This embodiment uses a server as the execution entity by way of example. The method includes:
  • The server acquires original voice signals of multiple channels; the original voice signals are digital voice signals.
  • The server processes the original voice signals of multiple channels and may be, for example, a server of an instant messaging application or a conference server.
  • For example, a user may hold voice communication with multiple contacts through an instant messaging application, or within a group of that application. In either case, the server of the instant messaging application may receive original voice signals from multiple channels within the same time period.
  • The server receives the original voice signals sent over the multiple channels. An original voice signal is transmitted frame by frame, that is, it includes a plurality of temporally consecutive frames.
  • The original voice signal is a digital voice signal.
  • The server filters the original voice signal of each channel to obtain a first voice signal of each channel, whose frequency falls within a preset frequency range.
  • Here, the useful signal may be the voice signal within the frequency range of normal human speech.
  • Step 502 may specifically include: the server filters the digital signal of each channel according to the preset frequency range, removing the components whose frequency lies outside the preset frequency range, to obtain the digital voice signal within the preset frequency range.
  • The server uses the digital voice signal within the preset frequency range as the first voice signal.
  • The preset frequency range may be set by a technician during development or adjusted by the user during use, which the embodiment of the present invention does not limit.
  • The preset frequency range may specifically be 100 Hz to 4 kHz, or another frequency range.
  • Here, the preset frequency range is determined from the frequencies of normal human speech.
  • The preset frequency range may also be determined from the frequencies of other sounds. The embodiment of the present invention does not limit how the preset frequency range is determined.
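The text does not specify a filter design. As one possible illustration, the sketch below applies an idealized (brick-wall) band-pass filter to a single frame by zeroing every DFT bin outside 100 Hz to 4 kHz. The function name `bandpass_dft`, the frame-by-frame processing, and the naive O(n²) DFT are all assumptions chosen for readability. A production system would more likely use an FFT or an IIR/FIR filter.

```python
import cmath
import math


def bandpass_dft(samples, fs, low_hz=100.0, high_hz=4000.0):
    """Keep only the DFT bins of one frame whose frequency lies in [low_hz, high_hz].

    samples: real-valued samples of one frame; fs: sampling rate in Hz.
    """
    n = len(samples)
    # Forward DFT (O(n^2); acceptable for a short frame in a sketch)
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k represent the same physical frequency
        freq = min(k, n - k) * fs / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    # Inverse DFT; the mask is symmetric, so the result is real up to rounding
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

For example, consider a 160-sample frame at 8 kHz that mixes a 50 Hz hum with a 1000 Hz tone. Both frequencies fall exactly on DFT bins, so the output of `bandpass_dft` keeps the 1000 Hz tone and removes the hum.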
  • For the first voice signal of each channel, the server acquires the loudness of each sub-signal in the first voice signal.
  • Useful and useless signals can also be distinguished by loudness, because the user's voice is generally louder than the background sound. The server can therefore use loudness to identify the parts of the first voice signal that need to be removed.
  • Step 503 may specifically include: the server calculates the loudness of each sub-signal in the first voice signal of each channel according to a preset loudness algorithm.
  • The preset loudness algorithm may be set by a technician during development or adjusted by the user during use, which the embodiment of the present invention does not limit.
  • The preset loudness algorithm may specifically be the Zwicker loudness model or another loudness algorithm. The embodiment of the present invention uses the Zwicker loudness model, which is suitable for the human voice, as an example.
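The Zwicker model is too involved to show here. The sketch below is only a simplified stand-in and is not the Zwicker model. It splits a frame into fixed-length sub-signals and uses the RMS amplitude of each sub-signal as a loudness proxy. The helper names `split_segments`, `rms_loudness`, and `segment_loudness`, and the fixed segment length, are hypothetical.

```python
import math


def split_segments(samples, seg_len):
    """Split a frame into consecutive sub-signals of seg_len samples (the last may be shorter)."""
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]


def rms_loudness(segment):
    """Root-mean-square amplitude, used here only as a crude loudness proxy."""
    if not segment:
        return 0.0
    return math.sqrt(sum(s * s for s in segment) / len(segment))


def segment_loudness(samples, seg_len):
    """Loudness proxy of every sub-signal in one channel's first voice signal."""
    return [rms_loudness(seg) for seg in split_segments(samples, seg_len)]
```

A perceptual model weights frequency bands by how the ear responds to them, so its values would differ from this RMS proxy. The later steps only need one non-negative loudness number per sub-signal.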
  • The server acquires the first weight of each sub-signal in the first voice signal according to the loudness of that sub-signal and the loudness sum of the same sub-signal across the multiple channels.
  • Through step 504, the server can determine the share that each sub-signal takes across all channels.
  • The same sub-signal refers to the sub-signals of the first voice signals of the multiple channels that belong to the same time period.
  • The server adds the loudness values of the same sub-signal across the first voice signals of the multiple channels to obtain the loudness sum of that sub-signal.
  • The server divides the loudness of each sub-signal in the first voice signal by the loudness sum of the same sub-signal across the multiple channels to obtain the first weight of each sub-signal in the first voice signal.
  • For example, there are two channels, and the first voice signal of each channel includes 3 sub-signals: sub-signal 1, sub-signal 2, and sub-signal 3.
  • In channel 1, the first weight of sub-signal 1 is 1/3, the first weight of sub-signal 2 is 3/8, and the first weight of sub-signal 3 is 4/11.
  • In channel 2, the first weight of sub-signal 1 is 2/3, the first weight of sub-signal 2 is 5/8, and the first weight of sub-signal 3 is 7/11.
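The first-weight computation can be sketched as follows. The sketch assumes the loudness values are already available as a table indexed by channel and sub-signal. The loudness values in the usage lines are hypothetical, chosen only so that the resulting weights reproduce the example above. `fractions.Fraction` keeps the ratios exact.

```python
from fractions import Fraction


def first_weights(loudness):
    """loudness[c][m] is the loudness of sub-signal m in channel c.

    Returns weights[c][m] = loudness[c][m] / (sum of loudness of sub-signal m over all channels).
    """
    n_seg = len(loudness[0])
    totals = [sum(channel[m] for channel in loudness) for m in range(n_seg)]
    return [
        [Fraction(channel[m]) / totals[m] if totals[m] else Fraction(0) for m in range(n_seg)]
        for channel in loudness
    ]


# Hypothetical loudness values for channels 1 and 2 (three sub-signals each)
weights = first_weights([[1, 3, 4], [2, 5, 7]])
print(weights)
```

Channel 1 yields 1/3, 3/8, 4/11 and channel 2 yields 2/3, 5/8, 7/11, so the first weights of one sub-signal always sum to 1 across the channels.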
  • The server determines a specified threshold according to the maximum value of the multiple first weights.
  • A sub-signal with a small first weight is usually a noise signal.
  • To filter out sub-signals with small first weights, the server needs to determine the specified threshold from the multiple first weights.
  • Step 505 may specifically be: the server compares the first weights of all sub-signals in the first voice signals to obtain the maximum of the multiple first weights. It then determines the specified threshold from three factors: the voice signal weight that the human ear can clearly distinguish, that maximum, and the channel environment.
  • For example, the specified threshold may be 0.1 times the maximum of the multiple first weights. The specified threshold may also be expressed in other ways, which the embodiment of the present invention does not limit.
  • For the first voice signal of each channel, the server sets the second weight of each sub-signal whose first weight is less than the specified threshold to 0.
  • For each sub-signal whose first weight is not less than the specified threshold, the server acquires the second weight according to the loudness of that sub-signal in the first voice signal and the predetermined loudness sum.
  • The predetermined loudness sum is the loudness sum of the same sub-signal across the first voice signals of the multiple channels, excluding the sub-signals whose second weight has already been set to 0.
  • Specifically, the server sets the second weight of each sub-signal whose first weight is less than the specified threshold to 0. It then calculates the loudness sum of the same sub-signal across the first voice signals of the multiple channels, excluding the sub-signals whose second weight has been set to 0, to obtain the predetermined loudness sum.
  • The server divides the loudness of each sub-signal of the first voice signal by the predetermined loudness sum to obtain the second weight of each sub-signal whose first weight is not less than the specified threshold.
  • Alternatively, the server sets the loudness of each sub-signal whose first weight is less than the specified threshold to 0. It then obtains the second weight of each sub-signal in the first voice signal from the loudness of each sub-signal and the loudness sum of the same sub-signal across the multiple channels.
  • In this case, the second weight of a sub-signal whose first weight is less than the specified threshold is also 0.
  • Continuing the example, the maximum first weight over the two channels is 2/3. If the specified threshold is 0.35, the first weight 1/3 of sub-signal 1 in channel 1 is less than the specified threshold, so the server sets the second weight of sub-signal 1 in channel 1 to 0.
  • The server first removes the loudness of sub-signal 1 in channel 1 and then calculates the loudness sum of sub-signal 1, which equals the loudness of sub-signal 1 in channel 2. The second weight of sub-signal 1 in channel 2 is therefore 1.
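The threshold and second-weight steps can be sketched as follows. The sketch reuses hypothetical loudness values chosen to reproduce the example's first weights. `threshold_from_max` defaults to the "0.1 × maximum" example from the text, whereas the worked example above passes a threshold of 0.35 directly. All function names are hypothetical.

```python
from fractions import Fraction


def first_weights(loudness):
    """Share of each sub-signal's loudness in the cross-channel loudness sum."""
    n_seg = len(loudness[0])
    totals = [sum(ch[m] for ch in loudness) for m in range(n_seg)]
    return [[Fraction(ch[m]) / totals[m] if totals[m] else Fraction(0)
             for m in range(n_seg)] for ch in loudness]


def threshold_from_max(weights, ratio=Fraction(1, 10)):
    """Specified threshold as a fraction of the largest first weight."""
    return ratio * max(w for ch in weights for w in ch)


def second_weights(loudness, weights, threshold):
    """Zero out sub-signals below the threshold, then renormalize the rest.

    For each sub-signal index m, the predetermined loudness sum only includes
    channels whose first weight is not less than the threshold.
    """
    n_ch, n_seg = len(loudness), len(loudness[0])
    result = [[Fraction(0)] * n_seg for _ in range(n_ch)]
    for m in range(n_seg):
        kept = [c for c in range(n_ch) if weights[c][m] >= threshold]
        predetermined_sum = sum(loudness[c][m] for c in kept)
        for c in kept:
            if predetermined_sum:
                result[c][m] = Fraction(loudness[c][m]) / predetermined_sum
    return result


loudness = [[1, 3, 4], [2, 5, 7]]  # hypothetical values
w1 = first_weights(loudness)
w2 = second_weights(loudness, w1, Fraction(35, 100))
print(w2)
```

With the threshold 0.35, only sub-signal 1 of channel 1 (first weight 1/3) is dropped. Sub-signal 1 of channel 2 therefore gets a second weight of 1, and the other sub-signals keep their first weights.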
  • The server may set a signal identifier for each sub-signal in the first voice signal and store the signal identifier of each sub-signal together with its loudness.
  • When performing step 506, the server acquires the signal identifier of each sub-signal in the first voice signal and uses it to retrieve the loudness of that sub-signal from the stored loudness values.
  • The signal identifier can be represented by the channel number and the sub-signal index.
  • For example, the signal identifier of sub-signal 2 in channel 1 can be represented as 12, and that of sub-signal 3 in channel 2 as 23. The signal identifier may also be represented in other ways, which the embodiment of the present invention does not limit.
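The identifier scheme in the example (channel number followed by sub-signal index) could be sketched with a dictionary that maps identifiers to stored loudness values. The helper names and the loudness values below are hypothetical.

```python
def signal_id(channel, segment):
    """Identifier formed by writing the channel number followed by the sub-signal index."""
    return f"{channel}{segment}"


def store_loudness(store, channel, segment, loudness):
    """Store a sub-signal's loudness under its signal identifier."""
    store[signal_id(channel, segment)] = loudness


def lookup_loudness(store, channel, segment):
    """Retrieve a sub-signal's loudness by its signal identifier."""
    return store[signal_id(channel, segment)]


store = {}
store_loudness(store, 1, 2, 3.0)  # hypothetical loudness of sub-signal 2 in channel 1
store_loudness(store, 2, 3, 7.0)  # hypothetical loudness of sub-signal 3 in channel 2
print(sorted(store))  # ['12', '23']
```

Plain concatenation becomes ambiguous once either index reaches two digits: channel 1, sub-signal 12 and channel 11, sub-signal 2 both give "112". A tuple key such as `(channel, segment)` avoids this collision.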
  • For the first voice signal of each channel, the server acquires the third weight of each sub-signal in the first voice signal according to the second weight of each sub-signal.
  • The second weights of the sub-signals can be processed by the second-order low-pass filtering module in the server.
  • Step 507 may specifically include: for the m-th sub-signal of the first voice signal of each channel, the server smooths the weight of the m-th sub-signal according to its second weight and the third weight of the (m-1)-th sub-signal. This yields the third weight of the m-th sub-signal, which then serves as the initial value for the (m+1)-th sub-signal in the channel.
  • Iterating this process yields the third weight of each sub-signal in the first voice signal.
  • Smoothing blends a larger weight and a smaller weight into an intermediate value, which may be obtained by an algorithm such as interpolation.
  • The server may obtain the third weight of the first sub-signal from its second weight as follows: the server smooths the weight of the first sub-signal according to the second weight of the first sub-signal and a preset initial value, obtaining the third weight of the first sub-signal.
  • The third weight of the first sub-signal is used as the initial value for the second sub-signal.
  • The third weight of the second sub-signal is then obtained from this initial value and the second weight of the second sub-signal.
  • The preset initial value may be set by a technician during development or adjusted by the user during use, which the embodiment of the present invention does not limit.
  • For example, the first weight of sub-signal 1 in channel 2 is 2/3.
  • The second weight of sub-signal 1 in channel 2 is 1, and the second-order low-pass filtering module in the server is configured as follows.
  • Its configuration parameters may be 0.7 and 0.3, and the preset initial value is 0.6. The server smooths the weight of sub-signal 1 in channel 2 according to the preset initial value and the second weight of sub-signal 1 in channel 2. Specifically:
  • The server multiplies the preset initial value by 0.7 and the second weight by 0.3, then adds the two products. The result, 0.7 × 0.6 + 0.3 × 1 = 0.72, is the third weight of sub-signal 1 in channel 2.
  • The third weight 0.72 of sub-signal 1 in channel 2 is used as the initial value for sub-signal 2 in channel 2.
  • From the second weight 5/8 of sub-signal 2 in channel 2, the server calculates the third weight of sub-signal 2.
  • This third weight is 0.7 × 0.72 + 0.3 × 5/8 = 0.6915, and it is used as the initial value for sub-signal 3 in channel 2.
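The smoothing in this example can be written as a recurrence. The symbols are notation introduced here for illustration: $w^{(2)}_{c,m}$ and $w^{(3)}_{c,m}$ denote the second and third weights of sub-signal $m$ in channel $c$, with coefficients 0.7 and 0.3 and initial value 0.6 as in the text.

```latex
\begin{aligned}
w^{(3)}_{c,0} &= 0.6, \\
w^{(3)}_{c,m} &= 0.7\, w^{(3)}_{c,m-1} + 0.3\, w^{(2)}_{c,m}, \qquad m \ge 1, \\
w^{(3)}_{2,1} &= 0.7 \times 0.6 + 0.3 \times 1 = 0.72, \\
w^{(3)}_{2,2} &= 0.7 \times 0.72 + 0.3 \times \tfrac{5}{8} = 0.504 + 0.1875 = 0.6915.
\end{aligned}
```

Because the two coefficients sum to 1, each third weight is a weighted average of the previous third weight and the current second weight. It therefore always lies between them.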
  • The server may also smooth the weights of the sub-signals in the first voice signal in ways other than the above.
  • The embodiment of the present invention does not limit the smoothing method.
  • For the original voice signal of each channel, the server adjusts the corresponding sub-signal of the original voice signal according to the third weight of each sub-signal in the first voice signal.
  • Each sub-signal of the original voice signal is a digital voice signal.
  • The third weight of the sub-signal is multiplied by the amplitude of the sub-signal in the original voice signal to obtain the adjusted sub-signal.
  • the amplitude may be used to indicate the frequency or signal strength of the original speech signal, which varies according to the sampling parameters used in the analog-to-digital conversion.
  • The amplitude may also be represented by other parameters, which the embodiment of the present invention does not limit.
  • For example, the first voice signal of each channel includes 3 sub-signals: sub-signal 1, sub-signal 2, and sub-signal 3.
  • Sub-signal 2 in channel 1 contains 100 samples, the 51st of which is 10. If the third weight of sub-signal 2 in channel 1 is 0.2, the server multiplies the 51st sample, 10, by 0.2.
  • The 51st sample of the adjusted sub-signal 2 is therefore 2.
  • The server superimposes the adjusted sub-signals of the multiple channels to obtain the processed voice signal.
  • That is, the server superimposes the original voice signals, adjusted by the third weights, that the multiple channels received within the same time period.
  • The sub-signals of the multiple channels in the same time period are superimposed according to their receiving time to obtain the processed voice signal.
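The weighting and superposition steps above can be sketched as follows. The sketch assumes the channels are already time-aligned, every sub-signal spans `seg_len` samples, and `weights[c][m]` holds the (third) weight of sub-signal m in channel c. The function name `adjust_and_mix` and this data layout are hypothetical.

```python
def adjust_and_mix(original_channels, weights, seg_len):
    """Weight each sub-signal of every channel's original signal, then sum the channels.

    original_channels[c] is the sample list of channel c (time-aligned).
    weights[c][m] is the weight of sub-signal m of channel c, which covers
    samples m * seg_len through (m + 1) * seg_len - 1.
    """
    length = max(len(samples) for samples in original_channels)
    mixed = [0.0] * length
    for samples, channel_weights in zip(original_channels, weights):
        for i, sample in enumerate(samples):
            # Adjusted sample = weight of its sub-signal x original amplitude
            mixed[i] += channel_weights[i // seg_len] * sample
    return mixed


# One channel, one 100-sample sub-signal whose 51st sample (index 50) is 10, weight 0.2
single = adjust_and_mix([[0] * 50 + [10] + [0] * 49], [[0.2]], 100)
print(single[50])
```

The printed value is 2.0, which matches the text's example of the 51st sample. With several channels, the adjusted samples at the same index are added, which is the superposition by receiving time described above.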
  • Optionally, step 510 may also be performed:
  • The server performs nonlinear mapping on the processed voice signal to obtain an output voice signal.
  • The server determines whether the amplitude of the processed voice signal exceeds a preset threshold. If it does, the server maps the portions of the signal whose amplitude exceeds the threshold into a specified range. This keeps the maximum amplitude of the output voice signal within the range that the digital domain can represent.
  • With a 16-bit representation, the digital domain ranges from -32768 to 32767.
  • For example, the preset threshold is 27000.
  • The amplitude of the processed voice signal ranges from -40000 to 40000.
  • The server therefore needs to nonlinearly map the voice signal in the ranges -40000 to -27000 and 27000 to 40000.
  • According to a preset rule, the server maps these portions into the specified range -32768 to 32767.
  • That is, the voice signal from -40000 to -27000 is nonlinearly mapped to -32768 to -27000, and the voice signal from 27000 to 40000 is nonlinearly mapped to 27000 to 32767.
  • The preset rule may be a particular function or another method, which the embodiment of the present invention does not limit.
  • The preset threshold does not exceed the range that the digital domain can represent.
  • The preset threshold may be set by a technician during development or adjusted by the user during use, which the embodiment of the present invention does not limit.
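The text leaves the "preset rule" open. One possible choice, shown purely as a hypothetical example, passes samples within ±27000 unchanged. It compresses the excess above that level with a hyperbolic tangent, so the result approaches but never exceeds the 16-bit limits (32767 on the positive side, -32768 on the negative side). The function name `soft_limit` and the tanh curve are assumptions, not the rule used in the patent.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


def soft_limit(sample, threshold=27000):
    """Hypothetical nonlinear mapping of one sample into the int16 range.

    Samples with |sample| <= threshold are unchanged. Above the threshold, the
    excess is compressed with tanh into the remaining headroom, so the mapping
    is continuous and monotonic.
    """
    if -threshold <= sample <= threshold:
        return int(round(sample))
    limit = INT16_MAX if sample > 0 else -INT16_MIN  # 32767 or 32768
    headroom = limit - threshold
    excess = abs(sample) - threshold
    mapped = threshold + headroom * math.tanh(excess / headroom)
    value = int(round(math.copysign(mapped, sample)))
    # Final clamp as a safeguard against rounding at the very edge
    return max(INT16_MIN, min(INT16_MAX, value))


processed = [-40000, -27000, 0, 20000, 40000]
print([soft_limit(s) for s in processed])
```

In this sketch, 40000 maps to roughly 32640 and -40000 to roughly -32640, while samples inside ±27000 are untouched. The tanh curve has slope 1 at the threshold, so the transition produces no sudden jump.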
  • This embodiment of the present invention uses a server as the execution entity by way of example; the process can also be performed on a terminal device.
  • The method provided by this embodiment filters the digital voice signals of multiple channels to remove components that do not contain normal human speech, obtaining the first voice signal of each channel. It then processes the first voice signals of the multiple channels according to the loudness of each sub-signal
  • in the first voice signal to obtain the processed voice signal. This effectively removes low-loudness unwanted signals from the voice signal, reduces the noise in the processed voice, and improves the recognizability of the voice signal,
  • which makes it easier for the user to pick out the useful signal from the processed voice signal.
  • In addition, the original voice signals are superimposed according to the third weights, which greatly reduces
  • the noise contained in the processed voice signal and greatly improves its recognizability.
  • Finally, nonlinear mapping of the processed voice signal prevents clipping distortion in the output voice signal.
  • FIG. 6 is a schematic structural diagram of a voice signal processing apparatus according to an embodiment of the present invention.
  • the apparatus includes: an original voice signal acquiring module 601, a filtering module 602, a loudness obtaining module 603, a weight acquiring module 604, and a voice signal processing module 605.
  • The original voice signal acquiring module 601 is configured to acquire original voice signals of multiple channels, the original voice signals being digital voice signals. The original voice signal acquiring module 601 is connected to the filtering module 602.
  • The filtering module 602 is configured to filter the original voice signal of each channel to obtain a first voice signal of each channel, whose frequency falls within a preset frequency range. The filtering module 602 is connected to the loudness obtaining module 603.
  • The loudness obtaining module 603 is configured to acquire, for the first voice signal of each channel, the loudness of each sub-signal in the first voice signal. The loudness obtaining module 603 is connected to the weight acquiring module 604.
  • The weight acquiring module 604 is configured to acquire the first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal of the multiple channels.
  • The weight acquiring module 604 is connected to the voice signal processing module 605.
  • The voice signal processing module 605 is configured to obtain the processed voice signal according to the first weight of each sub-signal in the first voice signals of the multiple channels and the first voice signals of the multiple channels.
  • The voice signal processing module 605 includes:
  • a specified threshold determining unit configured to determine a specified threshold according to the maximum value of the first weights of the multiple channels;
  • a weight obtaining unit configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0. It also acquires the second weight of each sub-signal whose first weight is not less than the specified threshold, according to the loudness of that sub-signal and the predetermined loudness sum. The predetermined loudness sum is the loudness sum of the same sub-signal across the first voice signals of the multiple channels, excluding the sub-signals whose second weight has been set to 0.
  • The weight obtaining unit is further configured to acquire, for the first voice signal of each channel, the third weight of each sub-signal in the first voice signal according to its second weight.
  • The voice signal processing module further includes an adjusting unit configured to, for the original voice signal of each channel, adjust the corresponding sub-signal of the original voice signal according to the third weight of each sub-signal in the first voice signal;
  • and a voice signal processing unit configured to superimpose the adjusted sub-signals of the multiple channels to obtain the processed voice signal.
  • The adjusting unit is further configured to multiply, for each sub-signal, the third weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain the adjusted sub-signal.
  • The weight obtaining unit is further configured to smooth, for the first voice signal of each channel, the weight of each sub-signal in the first voice signal according to its second weight, obtaining the third weight of each sub-signal in the first voice signal.
  • The apparatus further includes:
  • a voice signal output module configured to perform nonlinear mapping on the processed voice signal when its amplitude is greater than a preset threshold, to obtain an output voice signal.
  • The apparatus provided by this embodiment filters the digital voice signals of multiple channels to remove components that do not contain normal human speech, obtaining the first voice signal of each channel. It then processes the first voice signals of the multiple channels according to
  • the loudness of each sub-signal in the first voice signal to obtain the processed voice signal. This effectively removes low-loudness unwanted signals from the voice signal and reduces the noise in the processed voice.
  • The recognizability of the voice signal is improved, which makes it easier for the user to pick out the useful signal from the processed voice signal.
  • In addition, the original voice signals are superimposed according to the third weights, which greatly reduces
  • the noise contained in the processed voice signal and greatly improves its recognizability.
  • Finally, nonlinear mapping of the processed voice signal prevents clipping distortion in the output voice signal.
  • When the voice signal processing apparatus of the foregoing embodiment processes voice signals, the division into the functional modules above is only an example. In practice, the functions may be assigned to different functional modules as needed. That is, the internal structure of the server may be divided into different functional modules to complete all or some of the functions described above.
  • The voice signal processing apparatus of the foregoing embodiment and the voice signal processing method embodiment belong to the same concept. For the specific implementation, see the method embodiment; details are not repeated here.
  • In another implementation, the voice signal processing module 605 includes:
  • a first adjusting unit configured to, for each sub-signal in the first voice signals of the multiple channels, multiply the first weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain the adjusted sub-signal;
  • a first processing unit configured to superimpose the adjusted sub-signals of the multiple channels to obtain the processed voice signal.
  • In another implementation, the voice signal processing module 605 includes:
  • a specified threshold determining unit configured to determine a specified threshold according to the maximum value of the multiple first weights;
  • a second weighting unit configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0. It also acquires the second weight of each sub-signal whose first weight is not less than the specified threshold, according to the loudness of that sub-signal and the predetermined loudness sum. The predetermined loudness sum is the loudness sum of the same sub-signal across the first voice signals of the multiple channels, excluding the sub-signals whose second weight has been set to 0;
  • a second adjusting unit configured to, for each sub-signal in the first voice signals of the multiple channels, multiply the second weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain the adjusted sub-signal;
  • a second processing unit configured to superimpose the adjusted sub-signals of the multiple channels to obtain the processed voice signal.
  • In another implementation, the voice signal processing module 605 includes:
  • a fourth weight unit configured to, for the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to its first weight, obtaining the fourth weight of each sub-signal in the first voice signal;
  • a fourth adjusting unit configured to, for the original voice signal of each channel, multiply the fourth weight of each sub-signal in the first voice signal by the amplitude of the sub-signal in the original voice signal to obtain the adjusted sub-signal;
  • a fourth processing unit configured to superimpose the adjusted sub-signals of the multiple channels to obtain the processed voice signal.
  • FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • The server includes a processor 701 and a memory 702, and the processor 701 is coupled to the memory 702.
  • The processor 701 is configured to acquire original voice signals of multiple channels, the original voice signals being digital voice signals.
  • The processor 701 is further configured to filter the original voice signal of each channel to obtain a first voice signal of each channel, whose frequency falls within a preset frequency range.
  • The processor 701 is further configured to acquire, for the first voice signal of each channel, the loudness of each sub-signal in the first voice signal.
  • The processor 701 is further configured to acquire the first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal of the multiple channels.
  • The processor 701 is further configured to obtain the processed voice signal according to the first weight of each sub-signal in the first voice signals of the multiple channels and the first voice signals of the multiple channels.
  • The processor 701 is further configured to determine a specified threshold according to the maximum value of the multiple first weights.
  • The processor 701 is further configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0. It also acquires the second weight of each sub-signal whose first weight is not less than the specified threshold, according to the loudness of that sub-signal and the predetermined loudness sum. The predetermined loudness sum is the loudness sum of the same sub-signal across the first voice signals of the multiple channels, excluding the sub-signals whose second weight has been set to 0.
  • The processor 701 is further configured to acquire, for the first voice signal of each channel, the third weight of each sub-signal in the first voice signal according to its second weight.
  • The processor 701 is further configured to, for the original voice signal of each channel, adjust the corresponding sub-signal of the original voice signal according to the third weight of each sub-signal in the first voice signal.
  • The processor 701 is further configured to superimpose the adjusted sub-signals of the multiple channels to obtain the processed voice signal.
  • The processor 701 is further configured to multiply, for each sub-signal, the third weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain the adjusted sub-signal.
  • The processor 701 is further configured to smooth, for the first voice signal of each channel, the weight of each sub-signal in the first voice signal according to its second weight, obtaining the third weight of each sub-signal in the first voice signal.
  • The processor is further configured to multiply, for each sub-signal in the first voice signals of the multiple channels, the first weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain the adjusted sub-signal.
  • The processor is further configured to superimpose the adjusted sub-signals of the multiple channels to obtain the processed voice signal.
  • The processor is further configured to determine a specified threshold according to the maximum value of the multiple first weights.
  • The processor is further configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0. It also acquires the second weight of each sub-signal whose first weight is not less than the specified threshold, according to the loudness of that sub-signal and the predetermined loudness sum. The predetermined loudness sum is the loudness sum of the same sub-signal across the first voice signals of the multiple channels, excluding the sub-signals whose second weight has been set to 0.
  • The processor is further configured to multiply, for each sub-signal in the first voice signals of the multiple channels, the second weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain the adjusted sub-signal.
  • The processor is further configured to superimpose the adjusted sub-signals of the multiple channels to obtain the processed voice signal.
  • The processor is further configured to smooth, for the first voice signal of each channel, the weight of each sub-signal in the first voice signal according to its first weight, obtaining the fourth weight of each sub-signal in the first voice signal.
  • The processor is further configured to multiply, for the original voice signal of each channel, the fourth weight of each sub-signal in the first voice signal by the amplitude of the sub-signal in the original voice signal to obtain the adjusted sub-signal.
  • The processor is further configured to superimpose the adjusted sub-signals of the multiple channels to obtain the processed voice signal.
  • The processor 701 is further configured to perform nonlinear mapping on the processed voice signal when its amplitude is greater than a preset threshold, to obtain an output voice signal.
  • All or some of the steps of the foregoing embodiments may be completed by hardware, or by a program instructing related hardware.
  • The program may be stored in a computer-readable storage medium.
  • The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

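A voice signal processing method, apparatus, and server are provided, belonging to the field of communication technology. The method includes the following steps:
(101) acquiring original voice signals of multiple channels, the original voice signals being digital voice signals;
(102) filtering the original voice signal of each channel to obtain a first voice signal of each channel, the frequency of the first voice signal belonging to a preset frequency range;
(103) for the first voice signal of each channel, acquiring the loudness of each sub-signal in the first voice signal;
(104) acquiring the first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal of the multiple channels;
(105) obtaining a processed voice signal according to the first weight of each sub-signal in the first voice signals of the multiple channels and the first voice signals of the multiple channels.
The voice signal processing method processes signals according to their weights, which improves the recognizability of speech.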

Description

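Voice Signal Processing Method, Device and Server
This application claims priority to Chinese Patent Application No. 201310681217.9, filed with the Chinese Patent Office on December 13, 2013 and entitled "Voice Signal Processing Method, Device and Server", the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present invention relate to the field of communication technology, and in particular to a voice signal processing method, device and server.
Background
With the continuing development of communication technology, instant messaging applications frequently involve several users in a voice call at the same time. In multi-party voice communication, the voice signals coming from multiple channels must be mixed.
When the voice signals of multiple channels are mixed, the usual approach is simply to add them together.
When the voice signals of multiple channels are added directly, the useless components contained in each voice signal are added as well. The superimposed speech is therefore noisy and has low intelligibility, and users find it hard to make out what is said.
Summary
To solve the problem that, when the voice signals of multiple channels are added directly, the useless components they contain are added as well, making the superimposed speech noisy and poorly intelligible, embodiments of the present invention provide a voice signal processing method, device and server. The technical solutions are as follows: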
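According to a first aspect, a voice signal processing method is provided, the method including:
acquiring original voice signals of a plurality of channels, the original voice signals being digital voice signals;
filtering the original voice signal of each channel to obtain a first voice signal of each channel, the frequency of the first voice signal falling within a preset frequency range;
for the first voice signal of each channel, acquiring the loudness of each sub-signal in the first voice signal;
acquiring a first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal across the plurality of channels;
obtaining a processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels.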
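Optionally, obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels includes:
determining a specified threshold according to the maximum of the plurality of first weights;
for the first voice signal of each channel, setting the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquiring, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, where the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0;
for the first voice signal of each channel, acquiring a third weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal;
for the original voice signal of each channel, adjusting the corresponding sub-signal in the original voice signal according to the third weight of each sub-signal in the first voice signal;
superimposing the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, for the original voice signal of each channel, adjusting the corresponding sub-signal in the original voice signal according to the third weight of each sub-signal in the first voice signal includes:
for each sub-signal, multiplying the third weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal.
Optionally, acquiring the third weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal includes:
for the first voice signal of each channel, smoothing the weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal, to obtain the third weight of each sub-signal in the first voice signal.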
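Optionally, obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels includes:
for each sub-signal in the first voice signals of the plurality of channels, multiplying the first weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
superimposing the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels includes:
determining a specified threshold according to the maximum of the plurality of first weights;
for the first voice signal of each channel, setting the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquiring, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, where the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0;
for each sub-signal in the first voice signals of the plurality of channels, multiplying the second weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
superimposing the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels includes:
for the first voice signal of each channel, smoothing the weight of each sub-signal in the first voice signal according to the first weight of each sub-signal in the first voice signal, to obtain a fourth weight of each sub-signal in the first voice signal;
for the original voice signal of each channel, multiplying the fourth weight of each sub-signal in the first voice signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
superimposing the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, after superimposing the adjusted sub-signals of the plurality of channels to obtain the processed voice signal, the method further includes:
when the amplitude of the processed voice signal is greater than a preset threshold, performing nonlinear mapping on the processed voice signal to obtain an output voice signal.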
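According to a second aspect, a voice signal processing device is provided, the device including:
an original voice signal acquisition module, configured to acquire original voice signals of a plurality of channels, the original voice signals being digital voice signals;
a filtering module, configured to filter the original voice signal of each channel to obtain a first voice signal of each channel, the frequency of the first voice signal falling within a preset frequency range;
a loudness acquisition module, configured to acquire, for the first voice signal of each channel, the loudness of each sub-signal in the first voice signal;
a weight acquisition module, configured to acquire a first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal across the plurality of channels;
a voice signal processing module, configured to obtain a processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels.
Optionally, the voice signal processing module includes:
a specified threshold determining unit, configured to determine a specified threshold according to the maximum of the first weights of the plurality of channels;
a weight acquisition unit, configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquire the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold according to the loudness of each sub-signal in the first voice signal and the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has been set to 0;
the weight acquisition unit is further configured to, for the first voice signal of each channel, acquire a third weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal;
the voice signal processing module further includes: an adjustment unit, configured to, for the original voice signal of each channel, adjust the corresponding sub-signal in the original voice signal according to the third weight of each sub-signal in the first voice signal;
a voice signal processing unit, configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, the adjustment unit is further configured to, for each sub-signal, multiply the third weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal.
Optionally, the weight acquisition unit is further configured to, for the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal, to obtain the third weight of each sub-signal in the first voice signal.
Optionally, the voice signal processing module includes:
a first adjustment unit, configured to, for each sub-signal in the first voice signals of the plurality of channels, multiply the first weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
a first processing unit, configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, the voice signal processing module includes:
a specified threshold determining unit, configured to determine a specified threshold according to the maximum of the plurality of first weights;
a second weight unit, configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquire, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, where the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0;
a second adjustment unit, configured to, for each sub-signal in the first voice signals of the plurality of channels, multiply the second weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
a second processing unit, configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, the voice signal processing module includes:
a fourth weight unit, configured to, for the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to the first weight of each sub-signal in the first voice signal, to obtain a fourth weight of each sub-signal in the first voice signal;
a fourth adjustment unit, configured to, for the original voice signal of each channel, multiply the fourth weight of each sub-signal in the first voice signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
a fourth processing unit, configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, the device further includes:
a voice signal output module, configured to perform nonlinear mapping on the processed voice signal when the amplitude of the processed voice signal is greater than a preset threshold, to obtain an output voice signal.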
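According to a third aspect, a server is provided, the server including a processor and a memory, the processor being connected to the memory;
the processor is configured to acquire original voice signals of a plurality of channels, the original voice signals being digital voice signals;
the processor is further configured to filter the original voice signal of each channel to obtain a first voice signal of each channel, the frequency of the first voice signal falling within a preset frequency range;
the processor is further configured to, for the first voice signal of each channel, acquire the loudness of each sub-signal in the first voice signal;
the processor is further configured to acquire a first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal across the plurality of channels;
the processor is further configured to obtain a processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels.
Optionally, the processor is further configured to determine a specified threshold according to the maximum of the plurality of first weights;
the processor is further configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquire, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, where the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0;
the processor is further configured to, for the first voice signal of each channel, acquire a third weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal;
the processor is further configured to, for the original voice signal of each channel, adjust the corresponding sub-signal in the original voice signal according to the third weight of each sub-signal in the first voice signal;
the processor is further configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, the processor is further configured to, for each sub-signal, multiply the third weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal.
Optionally, the processor is further configured to, for the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal, to obtain the third weight of each sub-signal in the first voice signal.
Optionally, the processor is further configured to, for each sub-signal in the first voice signals of the plurality of channels, multiply the first weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
the processor is further configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, the processor is further configured to determine a specified threshold according to the maximum of the plurality of first weights;
the processor is further configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquire, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, where the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0;
the processor is further configured to, for each sub-signal in the first voice signals of the plurality of channels, multiply the second weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
the processor is further configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels includes:
the processor is further configured to, for the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to the first weight of each sub-signal in the first voice signal, to obtain a fourth weight of each sub-signal in the first voice signal;
the processor is further configured to, for the original voice signal of each channel, multiply the fourth weight of each sub-signal in the first voice signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
the processor is further configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, the processor is further configured to perform nonlinear mapping on the processed voice signal when the amplitude of the processed voice signal is greater than a preset threshold, to obtain an output voice signal.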
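The technical solutions provided by the embodiments of the present invention have the following beneficial effects:
The digital voice signals of the plurality of channels are filtered to remove the components that do not contain normal human vocalization, yielding the first voice signal of each channel. The first voice signals of the plurality of channels are then processed according to the loudness of each sub-signal in the first voice signal to obtain the processed voice signal. This effectively removes low-loudness useless components from the voice signal, reduces the noise in the processed speech and improves its intelligibility, making it easier for users to pick out the useful signal from the processed voice signal.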
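Brief Description of Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a voice signal processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another voice signal processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart of another voice signal processing method according to an embodiment of the present invention;
FIG. 4 is a flowchart of another voice signal processing method according to an embodiment of the present invention;
FIG. 5 is a flowchart of another voice signal processing method according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a voice signal processing device according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description of Embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.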
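FIG. 1 is a flowchart of a voice signal processing method according to an embodiment of the present invention. Referring to FIG. 1, this embodiment is described with a server as the executing entity. The method includes:
101. Acquire original voice signals of a plurality of channels, the original voice signals being digital voice signals.
102. Filter the original voice signal of each channel to obtain a first voice signal of each channel, the frequency of the first voice signal falling within a preset frequency range.
103. For the first voice signal of each channel, acquire the loudness of each sub-signal in the first voice signal.
104. Acquire a first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal across the plurality of channels.
105. Obtain a processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels.
In the method provided by this embodiment of the present invention, the digital voice signals of the plurality of channels are filtered to remove the components that do not contain normal human vocalization, yielding the first voice signal of each channel, and the first voice signals of the plurality of channels are then processed according to the loudness of each sub-signal in the first voice signal to obtain the processed voice signal. This effectively removes low-loudness useless components from the voice signal, reduces the noise in the processed speech and improves its intelligibility, making it easier for users to pick out the useful signal from the processed voice signal.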
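FIG. 2 is a flowchart of another voice signal processing method according to an embodiment of the present invention. Referring to FIG. 2, this embodiment is described with a server as the executing entity. The method includes:
201. Acquire original voice signals of a plurality of channels, the original voice signals being digital voice signals.
Take the server of an instant messaging application as an example. When a user holds a voice conversation with several contacts through the instant messaging application, or talks in a group of the application, the server may receive voice signals from several users within the same period, and it treats each user's voice signal as the original voice signal of one channel.
The server receives the original voice signals sent over the plurality of channels. The original voice signals are transmitted frame by frame, that is, each original voice signal consists of a number of temporally consecutive frames. For ease of description, the subsequent steps of this embodiment refer to each frame as a sub-signal. The original voice signals are digital voice signals.
202. Filter the original voice signal of each channel to obtain a first voice signal of each channel, the frequency of the first voice signal falling within a preset frequency range.
The digital voice signals of the plurality of channels contain not only the speech the users want but also a large amount of useless signal such as noise. The server therefore needs to extract the useful signal from the original voice signals of the plurality of channels; the useful signal may be the portion lying within the frequency range of normal human vocalization.
The preset frequency range may be set by technicians during development or adjusted by the user during use; this is not limited in the embodiments of the present invention. The preset frequency range may be, for example, 100 Hz to 4 kHz, or another frequency range.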
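The filter design for step 202 is left open. As a minimal sketch, assuming each channel is a list of float samples at a sample rate comfortably above 8 kHz (16 kHz below), the pass band can be approximated by a high-pass section at 100 Hz followed by a low-pass section at 4 kHz, both using the Butterworth-style biquad formulas of the RBJ "Audio EQ Cookbook". The function names `biquad`, `rbj_coefficients` and `band_limit` are hypothetical, and the choice of two second-order sections is an assumption, not the patented filter.

```python
import math


def biquad(samples, b, a):
    """Run one second-order IIR section (direct form I) over a list of samples."""
    b0, b1, b2 = (c / a[0] for c in b)
    a1, a2 = a[1] / a[0], a[2] / a[0]
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x  # shift the input history
        y2, y1 = y1, y  # shift the output history
        out.append(y)
    return out


def rbj_coefficients(kind, cutoff_hz, fs, q=1 / math.sqrt(2)):
    """High-pass or low-pass biquad coefficients (RBJ Audio EQ Cookbook)."""
    w0 = 2 * math.pi * cutoff_hz / fs
    cos_w0, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    a = (1 + alpha, -2 * cos_w0, 1 - alpha)
    if kind == "lowpass":
        b = ((1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2)
    elif kind == "highpass":
        b = ((1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2)
    else:
        raise ValueError(f"unsupported filter kind: {kind}")
    return b, a


def band_limit(samples, fs, low_hz=100.0, high_hz=4000.0):
    """Hypothetical step-202 filter: high-pass at low_hz, then low-pass at high_hz."""
    high_passed = biquad(samples, *rbj_coefficients("highpass", low_hz, fs))
    return biquad(high_passed, *rbj_coefficients("lowpass", high_hz, fs))
```

Calling `band_limit(channel_samples, fs=16000)` once per channel yields that channel's first voice signal. Steeper skirts would need more cascaded sections; two are used here only to keep the sketch short.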
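203. For the first voice signal of each channel, acquire the loudness of each sub-signal in the first voice signal.
Further, the server may use loudness to distinguish the useful signal from the useless signal in the first voice signal, since a user's voice is generally louder than the background sound.
The server may calculate the loudness of each sub-signal in the first voice signal of each channel with a preset loudness algorithm. The preset loudness algorithm may be set by technicians during development or adjusted by the user during use; this is not limited in the embodiments of the present invention. The preset loudness algorithm may be, for example, the Zwicker loudness model, or another loudness algorithm.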
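A full Zwicker loudness model is beyond a short example. The hypothetical sketch below shows only the surrounding bookkeeping: cutting one channel into consecutive frames (the "sub-signals" of step 201) and computing a per-frame RMS level as a stand-in loudness value. The frame length, the zero-padding of a trailing partial frame, and the use of RMS instead of Zwicker loudness are all assumptions.

```python
import math


def split_into_sub_signals(samples, frame_len):
    """Cut one channel into consecutive, non-overlapping frames (sub-signals).

    A trailing partial frame is zero-padded so that every frame has the same
    length and frame m of every channel covers the same time slice.
    """
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        frame += [0.0] * (frame_len - len(frame))
        frames.append(frame)
    return frames


def rms_loudness(frame):
    """Stand-in loudness: root-mean-square level of the frame.

    This is NOT the Zwicker model; it merely gives louder frames larger values.
    """
    return math.sqrt(sum(x * x for x in frame) / len(frame))


frames = split_into_sub_signals([1, 2, 3, 4, 5], frame_len=2)
loudness = [rms_loudness(f) for f in frames]
```

Whatever measure is used, the later steps only need one non-negative loudness value per (channel, sub-signal) pair.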
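204. Acquire a first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal across the plurality of channels.
The share that a sub-signal's loudness takes in the loudness sum of its segment directly affects how intelligible that sub-signal is in the superimposed voice signal. The server therefore determines the first weight of each sub-signal in all channels in step 204. "The same sub-signal" refers to the sub-signals of the first voice signals of the plurality of channels that belong to the same time slice.
Specifically, the server adds up the loudness of the same sub-signal in the first voice signals of the plurality of channels to obtain the loudness sum of that sub-signal across the plurality of channels.
Optionally, the server divides the loudness of each sub-signal in the first voice signal by the loudness sum of the same sub-signal across the plurality of channels to obtain the first weight of each sub-signal in the first voice signal.
For example, suppose voice signals are received over 2 channels, channel 1 and channel 2, and the first voice signal of each channel contains 3 sub-signals: sub-signal 1, sub-signal 2 and sub-signal 3. In channel 1 the loudness of sub-signals 1, 2 and 3 is 1, 3 and 4; in channel 2 it is 2, 5 and 7. The loudness sums of the first, second and third sub-signals across the two channels are then 1+2=3, 3+5=8 and 4+7=11.
Correspondingly, the first weights of sub-signals 1, 2 and 3 are 1/3, 3/8 and 4/11 in channel 1, and 2/3, 5/8 and 7/11 in channel 2, as shown in the following table:
                                 Sub-signal 1   Sub-signal 2   Sub-signal 3
Channel 1 (loudness)             1              3              4
Channel 2 (loudness)             2              5              7
Loudness sum of the segment      3              8              11
Channel 1 (first weight)         1/3            3/8            4/11
Channel 2 (first weight)         2/3            5/8            7/11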
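The first-weight computation of step 204 can be sketched as follows. Exact fractions are used here only so that the result can be compared with the table above; with floating-point loudness values ordinary division works the same way. The function name `first_weights` is hypothetical, and giving a silent segment (loudness sum 0) weight 0 in every channel is an assumption, since the text does not cover that case.

```python
from fractions import Fraction


def first_weights(loudness):
    """loudness[c][m] is the loudness of sub-signal m in channel c.

    Returns w1 with w1[c][m] = loudness[c][m] / (sum over channels of loudness[k][m]).
    """
    n_segments = len(loudness[0])
    sums = [sum(ch[m] for ch in loudness) for m in range(n_segments)]
    return [
        [Fraction(ch[m]) / sums[m] if sums[m] else Fraction(0) for m in range(n_segments)]
        for ch in loudness
    ]


weights = first_weights([[1, 3, 4], [2, 5, 7]])
print(weights[0])  # [Fraction(1, 3), Fraction(3, 8), Fraction(4, 11)]
```

By construction the first weights of one segment always sum to 1 across the channels.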
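205. For each sub-signal in the first voice signals of the plurality of channels, multiply the first weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal.
The amplitude may be used to represent the frequency or signal strength of the original voice signal, and it varies with the sampling parameters used in analog-to-digital conversion. The amplitude may of course also be represented by other parameters, which is not limited in the embodiments of the present invention.
For example, the amplitude of sub-signal 1 in channel 1 is multiplied by the first weight 1/3 to obtain adjusted sub-signal 11; likewise, the amplitude of sub-signal 1 in channel 2 is multiplied by the first weight 2/3 to obtain adjusted sub-signal 21.
206. Superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain a processed voice signal.
For the adjusted sub-signals, the server superimposes the sub-signals of the plurality of channels that belong to the same time slice, obtaining the processed voice signal.
For example, adjusted sub-signal 11 and adjusted sub-signal 21 are added to obtain sub-signal 1 of the processed voice signal.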
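Steps 205 and 206 amount to a per-sample weighted sum across channels. The minimal sketch below assumes that sub-signal m has the same number of samples in every channel; the name `weighted_mix` is hypothetical. The same routine serves the later variants, because only the weight matrix passed in changes (first, second, third or fourth weights).

```python
def weighted_mix(channels, weights):
    """Adjust each sub-signal by its weight, then superimpose across channels.

    channels[c][m]: samples of sub-signal m in channel c (equal lengths assumed)
    weights[c][m]:  weight applied to that sub-signal
    """
    mixed = []
    for m in range(len(channels[0])):
        frame = [0.0] * len(channels[0][m])
        for channel, channel_weights in zip(channels, weights):
            for i, sample in enumerate(channel[m]):
                frame[i] += channel_weights[m] * sample  # multiply, then add
        mixed.append(frame)
    return mixed


# Two channels, one two-sample sub-signal each, first weights 1/3 and 2/3.
out = weighted_mix([[[3.0, 6.0]], [[3.0, -3.0]]], [[1 / 3], [2 / 3]])
# out is approximately [[3.0, 0.0]]
```

Because the first weights of one segment sum to 1, this variant behaves like a loudness-weighted average of the channels rather than a plain sum.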
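In the method provided by this embodiment of the present invention, the digital voice signals of the plurality of channels are filtered to remove the components that do not contain normal human vocalization, yielding the first voice signal of each channel, and the first voice signals of the plurality of channels are then processed according to the loudness of each sub-signal in the first voice signal to obtain the processed voice signal. This effectively removes low-loudness useless components from the voice signal, reduces the noise in the processed speech and improves its intelligibility, making it easier for users to pick out the useful signal from the processed voice signal.
Further, in the method provided by this embodiment, the first weights are derived from loudness, and the original voice signals are adjusted by the first weights before being superimposed to obtain the processed voice signal. This effectively reduces the low-loudness useless components and lowers the noise in the processed voice signal.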
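As an optional implementation, on the basis of the embodiment shown in FIG. 2, sub-signals whose loudness is too low, that is, whose first weight is below a specified threshold, may also be removed entirely. In other words, as an alternative to steps 205 and 206, the voice signal processing method may include the following, as shown in FIG. 3:
205a. Determine a specified threshold according to the maximum of the plurality of first weights.
Sub-signals with a small first weight are usually noise. To filter them out, the server determines a specified threshold from the plurality of first weights.
For example, the specified threshold may be 0.1 times the maximum of the plurality of first weights; the specified threshold may also be expressed in other ways, which is not limited in the embodiments of the present invention.
206a. For the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquire, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold.
The predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0.
Specifically, the server sets the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and calculates, as the predetermined loudness sum, the loudness sum of the remaining sub-signals of the same segment across the plurality of channels.
The server may divide the loudness of each sub-signal in the first voice signal by the predetermined loudness sum to obtain the second weight of each sub-signal whose first weight is not less than the specified threshold.
For example, based on the example of step 204, the maximum first weight of the two channels is 2/3. If the specified threshold is 0.35, the first weight 1/3 of sub-signal 1 in channel 1 is less than the threshold, so the server sets the second weight of sub-signal 1 in channel 1 to 0.
As another example, the first weight 2/3 of sub-signal 1 in channel 2 is greater than the specified threshold. The server first removes the loudness of sub-signal 1 in channel 1, so the loudness sum of the first segment equals the loudness of sub-signal 1 in channel 2, namely 2; the second weight of sub-signal 1 in channel 2 is therefore 2/2 = 1.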
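Steps 205a and 206a can be sketched as follows, reproducing the worked example with the threshold 0.35. The names `second_weights` and `threshold_from_max` are hypothetical; `threshold_from_max` implements only the "0.1 times the maximum" option mentioned in the text, and skipping a segment whose surviving loudness sum is 0 is an assumption added to avoid division by zero.

```python
from fractions import Fraction


def second_weights(loudness, first, threshold):
    """Steps 205a/206a: sub-signals whose first weight is below `threshold`
    get second weight 0; the rest of each segment are renormalised by their
    own loudness sum (the "predetermined loudness sum")."""
    n_channels, n_segments = len(loudness), len(loudness[0])
    second = [[Fraction(0)] * n_segments for _ in range(n_channels)]
    for m in range(n_segments):
        kept = [c for c in range(n_channels) if first[c][m] >= threshold]
        kept_sum = sum(loudness[c][m] for c in kept)
        if not kept_sum:
            continue  # nothing audible survives in this segment
        for c in kept:
            second[c][m] = Fraction(loudness[c][m]) / kept_sum
    return second


def threshold_from_max(first, ratio=Fraction(1, 10)):
    """One option from the text: a fixed fraction of the largest first weight."""
    return ratio * max(max(row) for row in first)


loudness = [[1, 3, 4], [2, 5, 7]]
first = [[Fraction(1, 3), Fraction(3, 8), Fraction(4, 11)],
         [Fraction(2, 3), Fraction(5, 8), Fraction(7, 11)]]
second = second_weights(loudness, first, threshold=Fraction(35, 100))
print(second[1])  # [Fraction(1, 1), Fraction(5, 8), Fraction(7, 11)]
```

Only segment 1 changes in this example: 3/8 and 4/11 both clear the 0.35 threshold, so segments 2 and 3 keep their first weights.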
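207a. For each sub-signal in the first voice signals of the plurality of channels, multiply the second weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal.
For example, the amplitude of sub-signal 1 in channel 1 is multiplied by the second weight 0 to obtain adjusted sub-signal 11; likewise, the amplitude of sub-signal 1 in channel 2 is multiplied by the second weight 1 to obtain adjusted sub-signal 21.
208a. Superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain a processed voice signal.
For the adjusted sub-signals, the server superimposes the sub-signals of the plurality of channels that belong to the same time slice, obtaining the processed voice signal.
For example, adjusted sub-signal 11 and adjusted sub-signal 21 are added to obtain sub-signal 1 of the processed voice signal.
In summary, the voice signal processing method provided in this embodiment calculates a second weight for each sub-signal and completely removes the sub-signals whose first weight is below the specified threshold. This further reduces the low-loudness useless components and lowers the noise in the processed voice signal.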
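As an optional implementation, on the basis of the embodiment shown in FIG. 2, the first weights may also be smoothed to prevent the volume of the processed voice signal from fluctuating abruptly. In other words, as an alternative to steps 205 and 206, the voice signal processing method may include the following, as shown in FIG. 4:
205b. For the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to the first weight of each sub-signal in the first voice signal, to obtain a fourth weight of each sub-signal in the first voice signal.
To even out the sound of the sub-signals, once the first weight of a sub-signal has been obtained, it may be smoothed by a second-order low-pass filter module in the server.
Step 205b may include: for the m-th sub-signal in the first voice signal of each channel, the server smooths the weight of the m-th sub-signal according to the first weight of the m-th sub-signal and the fourth weight of the (m-1)-th sub-signal, obtaining the fourth weight of the m-th sub-signal. The server then uses the fourth weight of the m-th sub-signal as the initial fourth-weight value for the (m+1)-th sub-signal of that channel, and smooths the weight of the (m+1)-th sub-signal according to its first weight to obtain the fourth weight of the (m+1)-th sub-signal. Iterating this process yields the fourth weight of each sub-signal in the first voice signal.
Smoothing may consist of blending a larger weight with a smaller one to obtain an intermediate value, which may be computed by interpolation or a similar algorithm.
It should be noted that, for the 1st sub-signal of each channel, the server may obtain its fourth weight from its first weight as follows: the server smooths the weight of the 1st sub-signal according to its first weight and a preset initial value, obtaining the fourth weight of the 1st sub-signal. Correspondingly, the fourth weight of the 1st sub-signal serves as the initial fourth-weight value for the 2nd sub-signal, from which, together with the first weight of the 2nd sub-signal, the fourth weight of the 2nd sub-signal is obtained. The preset initial value may be set by technicians during development or adjusted by the user during use; this is not limited in the embodiments of the present invention.
Based on the example of step 204, the first weight of sub-signal 1 in channel 2 is 2/3. Suppose the configuration parameters of the second-order low-pass filter module in the server are 0.7 and 0.3 and the preset initial value is 0.6. The server then smooths the weight of sub-signal 1 in channel 2 according to its first weight and the preset initial value as follows. First, it multiplies the preset initial value by 0.7 and the first weight by 0.3 and adds the two results; the sum, 0.6×0.7 + 2/3×0.3 = 0.62, is the fourth weight of sub-signal 1 in channel 2. Next, the server uses this fourth weight 0.62 as the initial fourth-weight value for sub-signal 2 in channel 2 and, from the first weight 5/8 of sub-signal 2, calculates its fourth weight as 0.62×0.7 + 5/8×0.3 = 0.6215. Finally, the server uses 0.6215 as the initial fourth-weight value for sub-signal 3 in channel 2 and obtains the fourth weight of sub-signal 3 in the same way.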
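The arithmetic in this example is a recursive blend of the previous smoothed weight and the current raw weight. Although the text calls the module a second-order low-pass filter, the recurrence it actually spells out is a one-pole (exponential moving average) smoother, s[m] = 0.7·s[m-1] + 0.3·w[m] with s[0] seeded by the preset initial value 0.6, and the minimal sketch below implements exactly that recurrence. The function and parameter names are hypothetical.

```python
def smooth_weights(raw, initial=0.6, keep=0.7):
    """s[m] = keep * s[m-1] + (1 - keep) * raw[m], with s[-1] = initial."""
    smoothed, prev = [], initial
    for w in raw:
        prev = keep * prev + (1 - keep) * w
        smoothed.append(prev)
    return smoothed


# First weights of channel 2 from the step-204 example.
fourth = smooth_weights([2 / 3, 5 / 8, 7 / 11])
print([round(w, 4) for w in fourth])  # [0.62, 0.6215, 0.626]
```

A larger `keep` gives smoother but slower-reacting gains; each channel is smoothed independently along its own sequence of sub-signals.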
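206b. For the original voice signal of each channel, multiply the fourth weight of each sub-signal in the first voice signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal.
For example, the amplitude of sub-signal 1 in channel 2 is multiplied by the fourth weight 0.62 to obtain adjusted sub-signal 21; likewise, the amplitude of sub-signal 2 in channel 2 is multiplied by the fourth weight 0.6215 to obtain adjusted sub-signal 22.
207b. Superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain a processed voice signal.
For the adjusted sub-signals, the server superimposes the sub-signals of the plurality of channels that belong to the same time slice, obtaining the processed voice signal.
In summary, the voice signal processing method provided in this embodiment smooths the first weights to obtain a fourth weight for each sub-signal, which prevents the volume of the processed voice signal from fluctuating abruptly.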
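The above embodiments may also be combined into the embodiment shown in FIG. 5.
FIG. 5 is a flowchart of a voice signal processing method according to an embodiment of the present invention. Referring to FIG. 5, this embodiment is described with a server as the executing entity. The method includes:
501. The server acquires original voice signals of a plurality of channels, the original voice signals being digital voice signals.
The server processes the original voice signals of a plurality of channels; it may be the server of an instant messaging application, a conference server, or the like.
Take the server of an instant messaging application as an example. When a user holds a voice conversation with several contacts through the instant messaging application, or talks in a group of the application, the server may receive voice signals from several users within the same period and treats each user's voice signal as the original voice signal of one channel. To obtain the final output voice signal, the server applies the superposition process shown in steps 501 to 510 to the original voice signals of the plurality of channels.
The server receives the original voice signals sent over the plurality of channels. The original voice signals are transmitted frame by frame, that is, each original voice signal consists of a number of temporally consecutive frames. For ease of description, the subsequent steps of this embodiment refer to each frame as a sub-signal. The original voice signals are digital voice signals.
502. The server filters the original voice signal of each channel to obtain a first voice signal of each channel, the frequency of the first voice signal falling within a preset frequency range.
The digital voice signals of the plurality of channels contain not only the speech the users want but also a large amount of useless signal such as noise. To simplify subsequent speech processing, the server needs to extract the useful signal from the original voice signals of the plurality of channels; the useful signal may be the portion lying within the frequency range of normal human vocalization.
Step 502 may specifically include: the server filters the digital signal of each channel according to the preset frequency range, removing the components whose frequency lies outside the preset frequency range, and takes the remaining digital voice signal within the preset frequency range as the first voice signal.
The preset frequency range may be set by technicians during development or adjusted by the user during use; this is not limited in the embodiments of the present invention. The preset frequency range may be, for example, 100 Hz to 4 kHz, or another frequency range. Moreover, this embodiment determines the preset frequency range from the frequencies of normal human vocalization by way of example; the range may also be determined from the frequencies of other sounds, and how the preset frequency range is determined is not limited in the embodiments of the present invention.
503. For the first voice signal of each channel, the server acquires the loudness of each sub-signal in the first voice signal.
In voice communication, useful and useless signals can also be distinguished by loudness, since a user's voice is generally louder than the background sound. The server can therefore use loudness to determine which parts of the first voice signal should be removed.
Step 503 may specifically include: the server calculates the loudness of each sub-signal in the first voice signal of each channel with a preset loudness algorithm. The preset loudness algorithm may be set by technicians during development or adjusted by the user during use; this is not limited in the embodiments of the present invention. The preset loudness algorithm may be, for example, the Zwicker loudness model, or another loudness algorithm; this embodiment uses the Zwicker loudness model, which is suitable for human voice, as an example.
504. The server acquires a first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal across the plurality of channels.
The share that a sub-signal's loudness takes in the loudness sum of its segment directly affects how intelligible that sub-signal is in the superimposed voice signal. The server therefore determines the first weight of each sub-signal in all channels in step 504. "The same sub-signal" refers to the sub-signals of the first voice signals of the plurality of channels that belong to the same time period.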
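Specifically, the server adds up the loudness of the same sub-signal in the first voice signals of the plurality of channels to obtain the loudness sum of that sub-signal across the plurality of channels.
Optionally, the server divides the loudness of each sub-signal in the first voice signal by the loudness sum of the same sub-signal across the plurality of channels to obtain the first weight of each sub-signal in the first voice signal.
For example, suppose voice signals are received over 2 channels, channel 1 and channel 2, and the first voice signal of each channel contains 3 sub-signals: sub-signal 1, sub-signal 2 and sub-signal 3. In channel 1 the loudness of sub-signals 1, 2 and 3 is 1, 3 and 4; in channel 2 it is 2, 5 and 7. The loudness sums of the first, second and third sub-signals across the two channels are then 1+2=3, 3+5=8 and 4+7=11. Correspondingly, the first weights of sub-signals 1, 2 and 3 are 1/3, 3/8 and 4/11 in channel 1, and 2/3, 5/8 and 7/11 in channel 2.
505. The server determines a specified threshold according to the maximum of the plurality of first weights.
Sub-signals with a small first weight are usually noise. To filter them out, the server determines a specified threshold from the plurality of first weights.
Step 505 is specifically: the server compares the first weights of the sub-signals in the first voice signals to find the maximum of the plurality of first weights, and determines the specified threshold according to the voice-signal weight that the human ear can clearly distinguish, the maximum of the plurality of first weights, and the channel environment.
It should be noted that the specified threshold may be, for example, 0.1 times the maximum of the plurality of first weights; the specified threshold may also be expressed in other ways, which is not limited in the embodiments of the present invention.
506. For the first voice signal of each channel, the server sets the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquires, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold.
The predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0.
Specifically, the server sets the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and calculates, as the predetermined loudness sum, the loudness sum of the remaining sub-signals of the same segment across the plurality of channels.
The server divides the loudness of each sub-signal in the first voice signal by the predetermined loudness sum to obtain the second weight of each sub-signal whose first weight is not less than the specified threshold.
It should be noted that, as an alternative implementation of step 506, the server may set the loudness of each sub-signal whose first weight is less than the specified threshold to 0, and then acquire the second weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal across the plurality of channels. Because the loudness of a sub-signal whose first weight is below the specified threshold is then 0, its second weight in the final result is also 0.
For example, based on the example of step 504, the maximum first weight of the two channels is 2/3. If the specified threshold is 0.35, the first weight 1/3 of sub-signal 1 in channel 1 is less than the threshold, so the server sets the second weight of sub-signal 1 in channel 1 to 0.
As another example, the first weight 2/3 of sub-signal 1 in channel 2 is greater than the specified threshold. The server first removes the loudness of sub-signal 1 in channel 1, so the loudness sum of the first segment equals the loudness of sub-signal 1 in channel 2, namely 2; the second weight of sub-signal 1 in channel 2 is therefore 2/2 = 1.
To simplify the loudness calculation, after obtaining the loudness of each sub-signal in the first voice signal, the server may assign a signal identifier to each sub-signal and store each identifier together with the loudness of that sub-signal. When performing step 506, the server obtains the signal identifier of each sub-signal in the first voice signal and looks up the loudness of that sub-signal among the stored values by its identifier. The signal identifier may be formed from the channel number and the sub-signal number; based on the example of step 504, the identifier of sub-signal 2 in channel 1 may be 12, and that of sub-signal 3 in channel 2 may be 23. The signal identifier may of course be represented in other ways, which is not limited in the embodiments of the present invention.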
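507. For the first voice signal of each channel, the server acquires a third weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal.
To even out the sound of the sub-signals, once the second weight of a sub-signal has been obtained, it may be processed by the second-order low-pass filter module in the server.
Step 507 may specifically include: for the m-th sub-signal in the first voice signal of each channel, the server smooths the weight of the m-th sub-signal according to the second weight of the m-th sub-signal and the third weight of the (m-1)-th sub-signal, obtaining the third weight of the m-th sub-signal. The third weight of the m-th sub-signal serves as the initial third-weight value for the (m+1)-th sub-signal of that channel, and the weight of the (m+1)-th sub-signal is smoothed according to its second weight to obtain the third weight of the (m+1)-th sub-signal. Iterating this process yields the third weight of each sub-signal in the first voice signal.
Smoothing may consist of blending a larger weight with a smaller one to obtain an intermediate value, which may be computed by interpolation or a similar algorithm.
It should be noted that, for the 1st sub-signal of each channel, the server may obtain its third weight from its second weight as follows: the server smooths the weight of the 1st sub-signal according to its second weight and a preset initial value, obtaining the third weight of the 1st sub-signal. Correspondingly, the third weight of the 1st sub-signal serves as the initial third-weight value for the 2nd sub-signal, from which, together with the second weight of the 2nd sub-signal, the third weight of the 2nd sub-signal is obtained. The preset initial value may be set by technicians during development or adjusted by the user during use; this is not limited in the embodiments of the present invention.
Based on the example of step 504, the first weight of sub-signal 1 in channel 2 is 2/3, and after step 506 its second weight is 1. Suppose the configuration parameters of the second-order low-pass filter module in the server are 0.7 and 0.3 and the preset initial value is 0.6. The server smooths the weight of sub-signal 1 in channel 2 according to the preset initial value and the second weight: it multiplies the preset initial value by 0.7 and the second weight by 0.3 and adds the results, giving the third weight of sub-signal 1 in channel 2 as 0.6×0.7 + 1×0.3 = 0.72. This third weight 0.72 serves as the initial third-weight value for sub-signal 2 in channel 2; from the second weight 5/8 of sub-signal 2, the server calculates its third weight as 0.72×0.7 + 5/8×0.3 = 0.6915, and then uses 0.6915 as the initial third-weight value for sub-signal 3 in channel 2, obtaining the third weight of sub-signal 3 in the same way.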
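Steps 504 to 507 chain together into one weight pipeline. The hypothetical, self-contained sketch below runs it on the example loudness values with the specified threshold 0.35 and the smoothing parameters 0.7/0.3 and initial value 0.6 from the text; it reproduces the third weights 0.72 and 0.6915 of channel 2. As in the fourth-weight example, the smoothing is the one-pole recurrence the worked numbers describe, and the zero-sum guards are assumptions.

```python
def third_weights(loudness, threshold, initial=0.6, keep=0.7):
    """Steps 504-507: first weights, thresholded second weights,
    then per-channel recursive smoothing into third weights."""
    n_ch, n_seg = len(loudness), len(loudness[0])
    # Step 504: each channel's share of its segment's loudness sum.
    totals = [sum(loudness[c][m] for c in range(n_ch)) for m in range(n_seg)]
    first = [[loudness[c][m] / totals[m] if totals[m] else 0.0 for m in range(n_seg)]
             for c in range(n_ch)]
    # Steps 505/506: drop quiet sub-signals and renormalise the survivors.
    second = [[0.0] * n_seg for _ in range(n_ch)]
    for m in range(n_seg):
        kept = [c for c in range(n_ch) if first[c][m] >= threshold]
        kept_sum = sum(loudness[c][m] for c in kept)
        for c in kept:
            if kept_sum:
                second[c][m] = loudness[c][m] / kept_sum
    # Step 507: smooth each channel along its sequence of sub-signals.
    third = []
    for c in range(n_ch):
        prev, row = initial, []
        for w in second[c]:
            prev = keep * prev + (1 - keep) * w
            row.append(prev)
        third.append(row)
    return third


third = third_weights([[1, 3, 4], [2, 5, 7]], threshold=0.35)
# third[1][:2] is approximately [0.72, 0.6915]
```

The third weights are then multiplied into the original samples of each channel and summed, as in steps 508 and 509.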
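Of course, the server may also smooth the weight of each sub-signal in the first voice signal in ways other than the one above; the embodiments of the present invention do not limit which smoothing method the server uses.
508. For the original voice signal of each channel, the server adjusts the corresponding sub-signal in the original voice signal according to the third weight of each sub-signal in the first voice signal.
Since each sub-signal of the original voice signal is a digital voice signal, for each sub-signal the third weight of the sub-signal is multiplied by the amplitude of the sub-signal in the original voice signal to obtain the adjusted sub-signal.
The amplitude may be used to represent the frequency or signal strength of the original voice signal, and it varies with the sampling parameters used in analog-to-digital conversion. The amplitude may of course also be represented by other parameters, which is not limited in the embodiments of the present invention.
For example, suppose original voice signals are received over 2 channels, channel 1 and channel 2, and the first voice signal of each channel contains 3 sub-signals: sub-signal 1, sub-signal 2 and sub-signal 3. Sub-signal 2 of channel 1 contains 100 samples, the 51st of which is 10. If the third weight of sub-signal 2 in channel 1 is 0.2, the 51st sample of that sub-signal is multiplied by 0.2, so the 51st sample of the adjusted sub-signal 2 is 10×0.2 = 2.
509. The server superimposes the adjusted sub-signals of the plurality of channels correspondingly to obtain a processed voice signal.
Specifically, the server superimposes the original voice signals of the plurality of channels that were received in the same time period and have been adjusted by the third weights.
That is, the sub-signals of the plurality of channels in the same time period are superimposed according to their reception time to obtain the processed voice signal.
When the amplitude of the processed voice signal exceeds the amplitude that the digital domain can represent, the server needs to process the signal further to prevent clipping distortion in the processed speech. The server may therefore perform the following step 510: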
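510. When the amplitude of the processed voice signal is greater than a preset threshold, the server performs nonlinear mapping on the processed voice signal to obtain an output voice signal.
Specifically, the server determines from the amplitude of the processed voice signal whether it is greater than the preset threshold. If it is, the server maps the portions of the processed voice signal whose amplitude exceeds the preset threshold into a specified range, so that the maximum amplitude of the output voice signal does not exceed the range that the digital domain can represent.
For example, 16 bits in the digital domain can represent the range -32768 to 32767. If the preset threshold is 27000 and the amplitude of the processed voice signal ranges from -40000 to 40000, the server needs to apply nonlinear mapping to the parts of the signal in the ranges -40000 to -27000 and 27000 to 40000, mapping them according to a preset rule into the specified range -32768 to 32767.
For instance, the signal in -40000 to -27000 is nonlinearly mapped into -32768 to -27000, and the signal in 27000 to 40000 is nonlinearly mapped into 27000 to 32767.
The preset rule may be a function or another method; this is not limited in the embodiments of the present invention.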
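Since the preset rule is left open, the following is only one possible choice, written as a hypothetical sketch: samples with magnitude up to the preset threshold (27000 in the example) pass unchanged, and the excess is compressed with a tanh knee that approaches, but never crosses, the 16-bit limits. The tanh curve and the name `soft_limit` are assumptions; the mapping is continuous and monotonic, with slope 1 at the threshold, which avoids an audible kink where compression begins.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


def soft_limit(sample, threshold=27000):
    """Hypothetical step-510 rule: leave |x| <= threshold untouched and
    squeeze the excess with tanh so the result stays inside int16."""
    if -threshold <= sample <= threshold:
        return round(sample)
    limit = INT16_MAX if sample > 0 else -INT16_MIN  # 32767 or 32768
    headroom = limit - threshold
    excess = abs(sample) - threshold
    mapped = threshold + headroom * math.tanh(excess / headroom)
    return round(math.copysign(mapped, sample))


mixed = [-40000, 1000, 40000]
output = [soft_limit(x) for x in mixed]
```

Because tanh saturates at 1, even very large mixed sums land inside -32768 to 32767, so no hard clipping is needed afterwards.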
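The preset threshold lies within the range that the digital domain can represent; it may be set by technicians during development or adjusted by the user during use, which is not limited in the embodiments of the present invention.
This embodiment is described with a server as the executing entity; the process may of course also be performed on a terminal device.
In the method provided by this embodiment of the present invention, the digital voice signals of the plurality of channels are filtered to remove the components that do not contain normal human vocalization, yielding the first voice signal of each channel, and the first voice signals of the plurality of channels are then processed according to the loudness of each sub-signal in the first voice signal to obtain the processed voice signal. This effectively removes low-loudness useless components from the voice signal, reduces the noise in the processed speech and improves its intelligibility, making it easier for users to pick out the useful signal from the processed voice signal.
Further, the second weight of each sub-signal is calculated, the third weight of each sub-signal is derived from the second weight, and the original voice signals are superimposed according to the third weights. This greatly reduces the noise contained in the processed voice signal and greatly improves its intelligibility.
Further, performing nonlinear mapping on the processed voice signal prevents clipping distortion in the output voice signal.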
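FIG. 6 is a schematic structural diagram of a voice signal processing device according to an embodiment of the present invention. Referring to FIG. 6, the device includes: an original voice signal acquisition module 601, a filtering module 602, a loudness acquisition module 603, a weight acquisition module 604 and a voice signal processing module 605.
The original voice signal acquisition module 601 is configured to acquire original voice signals of a plurality of channels, the original voice signals being digital voice signals. The original voice signal acquisition module 601 is connected to the filtering module 602, which is configured to filter the original voice signal of each channel to obtain a first voice signal of each channel, the frequency of the first voice signal falling within a preset frequency range. The filtering module 602 is connected to the loudness acquisition module 603, which is configured to acquire, for the first voice signal of each channel, the loudness of each sub-signal in the first voice signal. The loudness acquisition module 603 is connected to the weight acquisition module 604, which is configured to acquire a first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal across the plurality of channels. The weight acquisition module 604 is connected to the voice signal processing module 605, which is configured to obtain a processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels.
Optionally, the voice signal processing module 605 includes:
a specified threshold determining unit, configured to determine a specified threshold according to the maximum of the first weights of the plurality of channels;
a weight acquisition unit, configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquire, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, where the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0.
The weight acquisition unit is further configured to, for the first voice signal of each channel, acquire a third weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal.
The voice signal processing module further includes: an adjustment unit, configured to, for the original voice signal of each channel, adjust the corresponding sub-signal in the original voice signal according to the third weight of each sub-signal in the first voice signal;
a voice signal processing unit, configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, the adjustment unit is further configured to, for each sub-signal, multiply the third weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal.
Optionally, the weight acquisition unit is further configured to, for the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal, to obtain the third weight of each sub-signal in the first voice signal.
Optionally, the device further includes:
a voice signal output module, configured to perform nonlinear mapping on the processed voice signal when the amplitude of the processed voice signal is greater than a preset threshold, to obtain an output voice signal.
In summary, the device provided by this embodiment of the present invention filters the digital voice signals of the plurality of channels to remove the components that do not contain normal human vocalization, yielding the first voice signal of each channel, and then processes the first voice signals of the plurality of channels according to the loudness of each sub-signal in the first voice signal to obtain the processed voice signal. This effectively removes low-loudness useless components from the voice signal, reduces the noise in the processed speech and improves its intelligibility, making it easier for users to pick out the useful signal from the processed voice signal.
Further, the second weight of each sub-signal is calculated, the third weight of each sub-signal is derived from the second weight, and the original voice signals are superimposed according to the third weights. This greatly reduces the noise contained in the processed voice signal and greatly improves its intelligibility.
Further, performing nonlinear mapping on the processed voice signal prevents clipping distortion in the output voice signal.
It should be noted that, when the voice signal processing device provided in the above embodiment processes voice signals, the division into the functional modules above is merely an example. In practice, the functions may be allocated to different functional modules as required, that is, the internal structure of the server may be divided into different functional modules to complete all or part of the functions described above. In addition, the voice signal processing device provided in the above embodiment and the voice signal processing method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.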
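As another possible implementation, the voice signal processing module 605 includes:
a first adjustment unit, configured to, for each sub-signal in the first voice signals of the plurality of channels, multiply the first weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
a first processing unit, configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
As another possible implementation, the voice signal processing module 605 includes:
a specified threshold determining unit, configured to determine a specified threshold according to the maximum of the plurality of first weights;
a second weight unit, configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquire, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, where the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0;
a second adjustment unit, configured to, for each sub-signal in the first voice signals of the plurality of channels, multiply the second weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
a second processing unit, configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
As another possible implementation, the voice signal processing module 605 includes:
a fourth weight unit, configured to, for the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to the first weight of each sub-signal in the first voice signal, to obtain a fourth weight of each sub-signal in the first voice signal;
a fourth adjustment unit, configured to, for the original voice signal of each channel, multiply the fourth weight of each sub-signal in the first voice signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
a fourth processing unit, configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.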
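FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention. Referring to FIG. 7, the server includes a processor 701 and a memory 702, the processor 701 being connected to the memory 702.
The processor 701 is configured to acquire original voice signals of a plurality of channels, the original voice signals being digital voice signals;
the processor 701 is further configured to filter the original voice signal of each channel to obtain a first voice signal of each channel, the frequency of the first voice signal falling within a preset frequency range;
the processor 701 is further configured to, for the first voice signal of each channel, acquire the loudness of each sub-signal in the first voice signal;
the processor 701 is further configured to acquire a first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal across the plurality of channels;
the processor 701 is further configured to obtain a processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels.
In a first possible implementation based on the embodiment shown in FIG. 7, the processor 701 is further configured to determine a specified threshold according to the maximum of the plurality of first weights;
the processor 701 is further configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquire, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, where the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0.
The processor 701 is further configured to, for the first voice signal of each channel, acquire a third weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal.
The processor 701 is further configured to, for the original voice signal of each channel, adjust the corresponding sub-signal in the original voice signal according to the third weight of each sub-signal in the first voice signal.
The processor 701 is further configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
Optionally, the processor 701 is further configured to, for each sub-signal, multiply the third weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal.
Optionally, the processor 701 is further configured to, for the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal, to obtain the third weight of each sub-signal in the first voice signal.
In a second possible implementation based on the embodiment shown in FIG. 7, the processor is further configured to, for each sub-signal in the first voice signals of the plurality of channels, multiply the first weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
the processor is further configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
In a third possible implementation based on the embodiment shown in FIG. 7, the processor is further configured to determine a specified threshold according to the maximum of the plurality of first weights;
the processor is further configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquire, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, where the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0;
the processor is further configured to, for each sub-signal in the first voice signals of the plurality of channels, multiply the second weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
the processor is further configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
In a fourth possible implementation based on the embodiment shown in FIG. 7, the processor is further configured to, for the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to the first weight of each sub-signal in the first voice signal, to obtain a fourth weight of each sub-signal in the first voice signal;
the processor is further configured to, for the original voice signal of each channel, multiply the fourth weight of each sub-signal in the first voice signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
the processor is further configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
In a fifth possible implementation, in combination with the first, second, third or fourth possible implementation based on the embodiment shown in FIG. 7, the processor 701 is further configured to perform nonlinear mapping on the processed voice signal when the amplitude of the processed voice signal is greater than a preset threshold, to obtain an output voice signal.
A person of ordinary skill in the art may understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.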

Claims (24)

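  1. A voice signal processing method, characterized in that the method comprises:
    acquiring original voice signals of a plurality of channels, the original voice signals being digital voice signals;
    filtering the original voice signal of each channel to obtain a first voice signal of each channel, the frequency of the first voice signal falling within a preset frequency range;
    for the first voice signal of each channel, acquiring the loudness of each sub-signal in the first voice signal;
    acquiring a first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal across the plurality of channels;
    obtaining a processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels.
  2. The method according to claim 1, characterized in that obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels comprises:
    determining a specified threshold according to the maximum of the plurality of first weights;
    for the first voice signal of each channel, setting the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquiring, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, wherein the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0;
    for the first voice signal of each channel, acquiring a third weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal;
    for the original voice signal of each channel, adjusting the corresponding sub-signal in the original voice signal according to the third weight of each sub-signal in the first voice signal;
    superimposing the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
  3. The method according to claim 2, characterized in that, for the original voice signal of each channel, adjusting the corresponding sub-signal in the original voice signal according to the third weight of each sub-signal in the first voice signal comprises:
    for each sub-signal, multiplying the third weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal.
  4. The method according to claim 2, characterized in that acquiring the third weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal comprises:
    for the first voice signal of each channel, smoothing the weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal, to obtain the third weight of each sub-signal in the first voice signal.
  5. The method according to claim 1, characterized in that obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels comprises:
    for each sub-signal in the first voice signals of the plurality of channels, multiplying the first weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
    superimposing the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
  6. The method according to claim 1, characterized in that obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels comprises:
    determining a specified threshold according to the maximum of the plurality of first weights;
    for the first voice signal of each channel, setting the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquiring, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, wherein the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0;
    for each sub-signal in the first voice signals of the plurality of channels, multiplying the second weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
    superimposing the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
  7. The method according to claim 1, characterized in that obtaining the processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels comprises:
    for the first voice signal of each channel, smoothing the weight of each sub-signal in the first voice signal according to the first weight of each sub-signal in the first voice signal, to obtain a fourth weight of each sub-signal in the first voice signal;
    for the original voice signal of each channel, multiplying the fourth weight of each sub-signal in the first voice signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
    superimposing the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
  8. The method according to any one of claims 2 to 7, characterized in that, after superimposing the adjusted sub-signals of the plurality of channels to obtain the processed voice signal, the method further comprises:
    when the amplitude of the processed voice signal is greater than a preset threshold, performing nonlinear mapping on the processed voice signal to obtain an output voice signal.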
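  9. A voice signal processing device, characterized in that the device comprises:
    an original voice signal acquisition module, configured to acquire original voice signals of a plurality of channels, the original voice signals being digital voice signals;
    a filtering module, configured to filter the original voice signal of each channel to obtain a first voice signal of each channel, the frequency of the first voice signal falling within a preset frequency range;
    a loudness acquisition module, configured to acquire, for the first voice signal of each channel, the loudness of each sub-signal in the first voice signal;
    a weight acquisition module, configured to acquire a first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal across the plurality of channels;
    a voice signal processing module, configured to obtain a processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels.
  10. The device according to claim 9, characterized in that the voice signal processing module comprises:
    a specified threshold determining unit, configured to determine a specified threshold according to the maximum of the first weights of the plurality of channels;
    a weight acquisition unit, configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquire the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold according to the loudness of each sub-signal in the first voice signal and the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has been set to 0;
    the weight acquisition unit is further configured to, for the first voice signal of each channel, acquire a third weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal;
    the voice signal processing module further comprises: an adjustment unit, configured to, for the original voice signal of each channel, adjust the corresponding sub-signal in the original voice signal according to the third weight of each sub-signal in the first voice signal;
    a voice signal processing unit, configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
  11. The device according to claim 10, characterized in that the adjustment unit is further configured to, for each sub-signal, multiply the third weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal.
  12. The device according to claim 10, characterized in that the weight acquisition unit is further configured to, for the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal, to obtain the third weight of each sub-signal in the first voice signal.
  13. The device according to claim 9, characterized in that the voice signal processing module comprises:
    a first adjustment unit, configured to, for each sub-signal in the first voice signals of the plurality of channels, multiply the first weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
    a first processing unit, configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
  14. The device according to claim 9, characterized in that the voice signal processing module comprises:
    a specified threshold determining unit, configured to determine a specified threshold according to the maximum of the plurality of first weights;
    a second weight unit, configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquire, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, wherein the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0;
    a second adjustment unit, configured to, for each sub-signal in the first voice signals of the plurality of channels, multiply the second weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
    a second processing unit, configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
  15. The device according to claim 9, characterized in that the voice signal processing module comprises:
    a fourth weight unit, configured to, for the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to the first weight of each sub-signal in the first voice signal, to obtain a fourth weight of each sub-signal in the first voice signal;
    a fourth adjustment unit, configured to, for the original voice signal of each channel, multiply the fourth weight of each sub-signal in the first voice signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
    a fourth processing unit, configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
  16. The device according to any one of claims 10 to 15, characterized in that the device further comprises:
    a voice signal output module, configured to perform nonlinear mapping on the processed voice signal when the amplitude of the processed voice signal is greater than a preset threshold, to obtain an output voice signal.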
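  17. A server, characterized in that the server comprises a processor and a memory, the processor being connected to the memory;
    the processor is configured to acquire original voice signals of a plurality of channels, the original voice signals being digital voice signals;
    the processor is further configured to filter the original voice signal of each channel to obtain a first voice signal of each channel, the frequency of the first voice signal falling within a preset frequency range;
    the processor is further configured to, for the first voice signal of each channel, acquire the loudness of each sub-signal in the first voice signal;
    the processor is further configured to acquire a first weight of each sub-signal in the first voice signal according to the loudness of each sub-signal in the first voice signal and the loudness sum of the same sub-signal across the plurality of channels;
    the processor is further configured to obtain a processed voice signal according to the first weight of each sub-signal in the first voice signals of the plurality of channels and the first voice signals of the plurality of channels.
  18. The server according to claim 17, characterized in that:
    the processor is further configured to determine a specified threshold according to the maximum of the plurality of first weights;
    the processor is further configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquire, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, wherein the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0;
    the processor is further configured to, for the first voice signal of each channel, acquire a third weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal;
    the processor is further configured to, for the original voice signal of each channel, adjust the corresponding sub-signal in the original voice signal according to the third weight of each sub-signal in the first voice signal;
    the processor is further configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
  19. The server according to claim 18, characterized in that:
    the processor is further configured to, for each sub-signal, multiply the third weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal.
  20. The server according to claim 18, characterized in that:
    the processor is further configured to, for the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to the second weight of each sub-signal in the first voice signal, to obtain the third weight of each sub-signal in the first voice signal.
  21. The server according to claim 17, characterized in that:
    the processor is further configured to, for each sub-signal in the first voice signals of the plurality of channels, multiply the first weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
    the processor is further configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
  22. The server according to claim 17, characterized in that:
    the processor is further configured to determine a specified threshold according to the maximum of the plurality of first weights;
    the processor is further configured to, for the first voice signal of each channel, set the second weight of each sub-signal whose first weight is less than the specified threshold to 0, and acquire, according to the loudness of each sub-signal in the first voice signal and a predetermined loudness sum, the second weight of each sub-signal in the first voice signal whose first weight is not less than the specified threshold, wherein the predetermined loudness sum is the loudness sum of the sub-signals of the same segment in the first voice signals of the plurality of channels, excluding the sub-signals whose second weight has already been set to 0;
    the processor is further configured to, for each sub-signal in the first voice signals of the plurality of channels, multiply the second weight of the sub-signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
    the processor is further configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
  23. The server according to claim 17, characterized in that:
    the processor is further configured to, for the first voice signal of each channel, smooth the weight of each sub-signal in the first voice signal according to the first weight of each sub-signal in the first voice signal, to obtain a fourth weight of each sub-signal in the first voice signal;
    the processor is further configured to, for the original voice signal of each channel, multiply the fourth weight of each sub-signal in the first voice signal by the amplitude of the sub-signal in the original voice signal to obtain an adjusted sub-signal;
    the processor is further configured to superimpose the adjusted sub-signals of the plurality of channels correspondingly to obtain the processed voice signal.
  24. The server according to any one of claims 17 to 23, characterized in that:
    the processor is further configured to perform nonlinear mapping on the processed voice signal when the amplitude of the processed voice signal is greater than a preset threshold, to obtain an output voice signal.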
PCT/CN2014/093656 2013-12-13 2014-12-12 Voice signal processing method, device and server WO2015085946A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310681217.9 2013-12-13
CN201310681217.9A CN103680513B (zh) 2013-12-13 2013-12-13 Voice signal processing method, device and server

Publications (1)

Publication Number Publication Date
WO2015085946A1 true WO2015085946A1 (zh) 2015-06-18

Family

ID=50317866

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/093656 WO2015085946A1 (zh) 2013-12-13 2014-12-12 Voice signal processing method, device and server

Country Status (2)

Country Link
CN (1) CN103680513B (zh)
WO (1) WO2015085946A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113596771A (zh) * 2021-08-23 2021-11-02 国能包神铁路集团有限责任公司 Locomotive wireless communication equipment, and control method and device therefor

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103680513B (zh) * 2013-12-13 2016-11-02 广州华多网络科技有限公司 Voice signal processing method, device and server
CN105469806B (zh) * 2014-09-12 2020-02-21 联想(北京)有限公司 Sound processing method, device and system
CN104409079A (zh) * 2014-11-03 2015-03-11 北京有恒斯康通信技术有限公司 Audio superposition method and device
CN108417208B (zh) * 2018-03-26 2020-09-11 宇龙计算机通信科技(深圳)有限公司 Voice input method and device
CN111045633A (zh) * 2018-10-12 2020-04-21 北京微播视界科技有限公司 Method and device for detecting the loudness of an audio signal

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1111775A (zh) * 1993-12-18 1995-11-15 国际商业机器公司 Audio conference system
JPH1013556A (ja) * 1996-06-21 1998-01-16 Oki Electric Ind Co Ltd Video conference system
CN1684143A (zh) * 2004-04-14 2005-10-19 华为技术有限公司 Speech enhancement method
CN1953488A (zh) * 2006-11-01 2007-04-25 华为技术有限公司 Method and device for mixing multi-channel voice signals
US20080304673A1 (en) * 2007-06-11 2008-12-11 Fujitsu Limited Multipoint communication apparatus
CN103680513A (zh) * 2013-12-13 2014-03-26 广州华多网络科技有限公司 Voice signal processing method, device and server

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6404892B1 (en) * 1995-09-06 2002-06-11 Apple Computer, Inc. Reduced complexity audio mixing apparatus
US7379961B2 (en) * 1997-04-30 2008-05-27 Computer Associates Think, Inc. Spatialized audio in a three-dimensional computer-based scene
US7974713B2 (en) * 2005-10-12 2011-07-05 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Temporal and spatial shaping of multi-channel audio signals
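CN100579018C (zh) * 2006-10-30 2010-01-06 北京中星微电子有限公司 Method and system for processing audio signals
CN101674450A (zh) * 2008-09-10 2010-03-17 深圳市邦彦信息技术有限公司 Audio mixing method in a video command and dispatch system
CN103188595B (zh) * 2011-12-31 2015-05-27 展讯通信(上海)有限公司 Method and system for processing multi-channel audio signals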

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
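CN113596771A (zh) * 2021-08-23 2021-11-02 国能包神铁路集团有限责任公司 Locomotive wireless communication equipment, and control method and device therefor
CN113596771B (zh) * 2021-08-23 2023-11-17 国能包神铁路集团有限责任公司 Locomotive wireless communication equipment, and control method and device therefor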

Also Published As

Publication number Publication date
CN103680513A (zh) 2014-03-26
CN103680513B (zh) 2016-11-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14870539

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26/10/16)

122 Ep: pct application non-entry in european phase

Ref document number: 14870539

Country of ref document: EP

Kind code of ref document: A1