CN111081264B - Voice signal processing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111081264B
CN111081264B (application CN201911248791.9A)
Authority
CN
China
Prior art keywords
signal
dpcm
adpcm
modulation
voice quality
Prior art date
Legal status
Active
Application number
CN201911248791.9A
Other languages
Chinese (zh)
Other versions
CN111081264A (en)
Inventor
谭志鹏
谭北平
Current Assignee
Tsinghua University
Beijing Mininglamp Software System Co ltd
Original Assignee
Tsinghua University
Beijing Mininglamp Software System Co ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University, Beijing Mininglamp Software System Co ltd filed Critical Tsinghua University
Priority to CN201911248791.9A
Publication of CN111081264A
Application granted
Publication of CN111081264B


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 — Speech or audio signals analysis-synthesis techniques for redundancy reduction using predictive techniques
    • G10L19/16 — Vocoder architecture
    • G10L19/18 — Vocoders using multiple modes
    • G10L19/20 — Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Telephone Function (AREA)

Abstract

The application provides a voice signal processing method, a device, equipment and a storage medium, and relates to the technical field of voice recognition. The method comprises the following steps: detecting the voice quality of an input analog audio signal; determining the weights of the differential pulse code modulation DPCM and the adaptive differential pulse code modulation ADPCM according to the voice quality; DPCM processing and ADPCM processing are respectively carried out on the analog audio signal to obtain a first modulation signal and a second modulation signal; and weighting preset type parameters of the first modulation signal and the second modulation signal according to the weight of the DPCM and the ADPCM to obtain a target modulation signal. The method and the device can effectively solve the problem of low voice recognition efficiency and improve the voice recognition rate.

Description

Voice signal processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing a voice signal.
Background
Speech recognition technology enables machines to convert speech signals into corresponding text or commands through recognition and understanding. It is widely applied in daily life: a user need only dictate for speech to be converted into text, making everyday tasks more convenient.
In the prior art, the speech coding techniques applied in the speech recognition field mainly include Differential Pulse Code Modulation (DPCM) and Adaptive Differential Pulse Code Modulation (ADPCM).
However, ADPCM and DPCM each have advantages and disadvantages, and most smart devices on the market adopt only one of these speech coding techniques during speech processing. As a result, problems such as poor speech input quality, a low recognition rate, or even failure to recognize speech at all may occur, which greatly affects the usability of the device and the user experience.
Disclosure of Invention
An object of the present application is to provide a method, an apparatus, a device and a storage medium for processing a voice signal, so as to solve the above-mentioned drawbacks of the prior art.
In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:
In a first aspect, an embodiment of the present application provides a speech signal processing method, including:
detecting the voice quality of an input analog audio signal;
determining weights of Differential Pulse Code Modulation (DPCM) and Adaptive Differential Pulse Code Modulation (ADPCM) according to the voice quality;
respectively carrying out DPCM processing and ADPCM processing on the analog audio signal to obtain a first modulation signal and a second modulation signal;
and weighting preset type parameters of the first modulation signal and the second modulation signal according to the weight of the DPCM and the ADPCM to obtain a target modulation signal.
Optionally, the determining weights of Differential Pulse Code Modulation (DPCM) and Adaptive Differential Pulse Code Modulation (ADPCM) according to the voice quality comprises:
judging whether the voice quality meets a preset quality requirement or not;
and determining the weight of the DPCM and the ADPCM according to the judgment result of the voice quality.
Optionally, the determining the weight of the DPCM and the ADPCM according to the determination result of the voice quality includes:
if the voice quality does not meet the quality requirement, determining a first weighted modulation mode, wherein the first weighted modulation mode is that the weight of the DPCM is greater than the weight of the ADPCM.
Optionally, the determining the weight of the DPCM and the ADPCM according to the determination result of the voice quality includes:
and if the voice quality meets the quality requirement, determining a second weighted modulation mode, wherein the weight of the ADPCM in the second weighted modulation mode is greater than the weight of the DPCM.
Optionally, the voice quality comprises: signal correlation of adjacent sampling points in the analog audio signal; the judging whether the voice quality meets the preset quality requirement includes:
judging whether the signal correlation is greater than or equal to a preset correlation threshold value;
determining that the speech quality does not meet the quality requirement if the signal correlation is less than the correlation threshold;
determining that the speech quality satisfies the quality requirement if the signal correlation is greater than or equal to the correlation threshold.
Optionally, the method further comprises:
performing analog-to-digital conversion on the target modulation signal to generate an audio file;
generating a Hamming window image corresponding to the audio file;
and generating a target spectrogram according to the Hamming window image, and performing audio matching in a preset audio recognition library by adopting the target spectrogram.
Optionally, the preset type parameter is any one of the following types: peak, formant, frequency, stop, fricative.
In a second aspect, an embodiment of the present application further provides a speech signal processing apparatus, including: detection module, confirm module, processing module and weighting module, wherein:
the detection module is used for detecting the voice quality of the input analog audio signal;
the determining module is used for determining the weights of the Differential Pulse Code Modulation (DPCM) and the Adaptive Differential Pulse Code Modulation (ADPCM) according to the voice quality;
the processing module is used for respectively carrying out DPCM processing and ADPCM processing on the analog audio signal to obtain a first modulation signal and a second modulation signal;
the weighting module is configured to weight preset type parameters of the first modulation signal and the second modulation signal according to the weights of the DPCM and the ADPCM, so as to obtain a target modulation signal.
Optionally, the apparatus further comprises: the judging module is used for judging whether the voice quality meets the preset quality requirement or not;
the determining module is further configured to determine the weights of the DPCM and the ADPCM according to the determination result of the voice quality.
Optionally, the determining module is further configured to determine a first weighted modulation mode if the voice quality does not meet the quality requirement, where in the first weighted modulation mode the weight of the DPCM is greater than the weight of the ADPCM.
Optionally, the determining module is further configured to determine a second weighted modulation scheme if the voice quality meets the quality requirement, where a weight of the ADPCM in the second weighted modulation scheme is greater than a weight of the DPCM.
Optionally, the determining module is further configured to determine whether the signal correlation is greater than or equal to a preset correlation threshold;
the determining module is further configured to determine that the speech quality does not meet the quality requirement if the signal correlation is less than the correlation threshold;
the determining module is further configured to determine that the speech quality meets the quality requirement if the signal correlation is greater than or equal to the correlation threshold.
Optionally, the apparatus further comprises a generating module and a matching module, wherein:
the generating module is used for performing analog-to-digital conversion on the target modulation signal to generate an audio file, and for generating a Hamming window image corresponding to the audio file;
and the matching module is used for generating a target spectrogram according to the Hamming window image and performing audio matching in a preset audio recognition library by adopting the target spectrogram.
In a third aspect, an embodiment of the present application further provides a speech signal processing apparatus, including: a memory storing a computer program executable by the processor, and a processor implementing any of the methods provided by the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application further provides a storage medium, where a computer program is stored on the storage medium, and when the computer program is read and executed, the computer program implements any one of the methods provided in the first aspect.
The beneficial effect of this application is: by adopting the voice signal processing method provided by the application, the analog audio signal can be respectively processed by adopting Differential Pulse Code Modulation (DPCM) and Adaptive Differential Pulse Code Modulation (ADPCM) to obtain a corresponding first modulation signal and a second modulation signal, the weights of the DPCM and the ADPCM are determined according to the voice quality of the input analog audio signal, and the first modulation signal and the second modulation signal are weighted according to the determined weights to obtain a target modulation signal. The processing mode can determine the weight of DPCM and ADPCM according to different voice quality, and carry out weighting processing on the first modulation signal and the second modulation signal according to the weight to obtain a target modulation signal, so that the problem of low voice recognition efficiency can be effectively solved by the obtained target modulation signal, and the voice recognition rate is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic flowchart of a speech signal processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a speech signal processing method according to another embodiment of the present application;
fig. 3 is a schematic flowchart of a speech signal processing method according to another embodiment of the present application;
fig. 4 is a schematic structural diagram of a speech signal processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a speech signal processing apparatus according to another embodiment of the present application;
fig. 6 is a schematic structural diagram of a speech signal processing apparatus according to another embodiment of the present application;
fig. 7 is a schematic structural diagram of a speech signal processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments.
Fig. 1 is a schematic flow chart of a speech signal processing method according to an embodiment of the present application. The method may be executed by any electronic device with speech recognition and speech processing functions, such as a mobile phone, a tablet computer, or a wearable device, or by an application server corresponding to a speech application program on the electronic device. The following description takes execution by the electronic device as an example; execution by the server is similar and is not repeated here. As shown in fig. 1, the method may include:
S101: The speech quality of the input analog audio signal is detected.
Optionally, the input analog audio signal may be a user audio signal acquired by the electronic device in real time, or an audio signal selected by the electronic device from a preset audio signal set; the specific way the audio is supplied may be determined according to user requirements and is not limited here.
Alternatively, the voice quality of the analog audio signal may be determined by taking the waveform of the voice signal as sampled data and analyzing it. The analysis may use any one of the following methods: judging whether the amplitudes in the sampled data are uniformly distributed; judging by the correlation between adjacent samples in the sampled data; judging by the correlation between cycles of the sampled data; or judging by the correlation between pitch periods of the sampled data.
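The adjacent-sample analysis above can be illustrated with a small sketch (assuming NumPy; the function name is ours, not the patent's). Correlation between neighbouring samples is high for a smooth, speech-like waveform and near zero for noise-dominated input:

```python
import numpy as np

def adjacent_sample_correlation(samples: np.ndarray) -> float:
    """Pearson correlation between each sample and its successor.

    High values suggest a smooth, high-quality waveform; values near
    zero suggest noise-dominated input.
    """
    x, y = samples[:-1], samples[1:]
    if np.std(x) == 0 or np.std(y) == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1])

# A clean tone is strongly correlated sample-to-sample; white noise is not.
t = np.linspace(0, 1, 48000, endpoint=False)  # 1 s at 48 kHz
sine = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(48000)
```

Running this, `adjacent_sample_correlation(sine)` is close to 1 while the noise value hovers near 0, which is the kind of contrast a quality threshold can exploit.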
S102: the weights of the differential pulse code modulation DPCM and the adaptive differential pulse code modulation ADPCM are determined according to the speech quality.
DPCM encodes the difference between adjacent samples: the current sample is predicted from the previous n samples according to a fixed rule, the error between the predicted value and the actual value is quantized and transmitted, and the receiver recovers the original signal from the error signal using the same predictor as the transmitter. A DPCM-modulated signal requires correspondingly less transmission bandwidth and can improve the signal-to-noise ratio.
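The predict-quantize-reconstruct loop described above can be sketched as a minimal first-order DPCM round trip (a toy sketch, not the patent's implementation; the predictor is simply the previous reconstructed sample):

```python
import numpy as np

def dpcm_encode(samples, step=0.01):
    """First-order DPCM: quantize the error between each sample and the
    prediction to an integer multiple of `step`."""
    codes, pred = [], 0.0
    for s in samples:
        code = int(round((s - pred) / step))
        codes.append(code)
        pred += code * step  # track the decoder's reconstruction exactly
    return codes

def dpcm_decode(codes, step=0.01):
    """Rebuild the signal by accumulating the quantized differences."""
    out, pred = [], 0.0
    for code in codes:
        pred += code * step
        out.append(pred)
    return out
```

Because the encoder predicts from the decoder-side reconstruction, the per-sample error never exceeds half the quantization step.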
ADPCM is a waveform coding technique with better performance and a good way to obtain high-quality sound at low storage cost; compared with other speech coding technologies, it offers very short encoding and decoding delay, low algorithmic complexity, and a small compression ratio.
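The adaptive idea behind ADPCM can be illustrated with a toy delta modulator whose quantizer step grows after large prediction errors and shrinks after small ones (illustrative only; real ADPCM codecs such as IMA ADPCM use fixed step-size tables and multi-bit codes):

```python
import numpy as np

def adpcm_round_trip(samples, step=0.02, grow=1.5, shrink=2 / 3):
    """Toy adaptive delta modulator: each sample is approximated by a
    move of +/-step or +/-2*step (one sign bit, one magnitude bit),
    and the step adapts to the recent error. Returns the reconstruction."""
    pred, out = 0.0, []
    for s in samples:
        err = s - pred
        mag = 2 if abs(err) > step else 1        # 1-bit magnitude
        sign = 1.0 if err >= 0 else -1.0         # 1-bit sign
        pred += sign * mag * step
        # grow the step after a saturated move, shrink it otherwise
        step = min(max(step * (grow if mag == 2 else shrink), 1e-4), 0.5)
        out.append(pred)
    return np.array(out)

tone = 0.5 * np.sin(2 * np.pi * np.arange(400) / 200)  # slow test tone
rec = adpcm_round_trip(tone)
```

The adaptation lets two bits per sample track both quiet and loud passages, which is the property that makes ADPCM attractive for low-cost, good-quality storage.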
DPCM performs better when the voice quality is poor, while ADPCM performs better when the voice quality is good. By determining the weights of the two algorithms according to the voice quality, the weighted combination of the two can improve the speech recognition rate regardless of whether the voice quality of the analog audio signal is high or low.
S103: DPCM processing and ADPCM processing are respectively carried out on the analog audio signal to obtain a first modulation signal and a second modulation signal.
Wherein, the first modulation signal is obtained by processing the audio signal according to DPCM, and the second modulation signal is obtained by processing the audio signal according to ADPCM.
S104: and weighting preset type parameters of the first modulation signal and the second modulation signal according to the weight of the DPCM and the ADPCM to obtain a target modulation signal.
Alternatively, the preset type parameter may include any one of the following types of parameters: peak, formant, frequency, stop, fricative, etc.
And weighting each parameter of the preset type according to the preset weight to obtain a target modulation signal.
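The weighting in S104 can be sketched as a per-parameter convex combination of the two modulated signals (an assumption about how the weights are applied; the function and parameter names are illustrative, not from the patent):

```python
def weight_modulation_params(first_params, second_params, w_dpcm, w_adpcm):
    """Combine one preset-type parameter sequence (e.g. per-frame peak
    values) of the DPCM-modulated signal with the corresponding sequence
    of the ADPCM-modulated signal, element by element."""
    return [w_dpcm * a + w_adpcm * b
            for a, b in zip(first_params, second_params)]
```

With the 60/40 split mentioned later in the text, a DPCM peak of 1.0 and an ADPCM peak of 3.0 combine to 1.8.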
By adopting the voice signal processing method provided by the application, the analog audio signal can be respectively processed by adopting Differential Pulse Code Modulation (DPCM) and Adaptive Differential Pulse Code Modulation (ADPCM) to obtain a corresponding first modulation signal and a second modulation signal, the weights of the DPCM and the ADPCM are determined according to the voice quality of the input analog audio signal, and the first modulation signal and the second modulation signal are weighted according to the determined weights to obtain a target modulation signal. The processing mode can determine the weight of DPCM and ADPCM according to different voice quality, and carry out weighting processing on the first modulation signal and the second modulation signal according to the weight to obtain a target modulation signal, so that the problem of low voice recognition efficiency can be effectively solved by the obtained target modulation signal, and the voice recognition rate is improved.
Fig. 2 is a schematic flowchart of a speech signal processing method according to an embodiment of the present application, and as shown in fig. 2, S102 includes:
S105: Judging whether the voice quality meets the preset quality requirement.
Then, the weight of DPCM and ADPCM is determined according to the judgment result of the voice quality.
In one embodiment of the present application, the speech quality includes: the signal correlation of adjacent sample points in the analog audio signal. Then, determining whether the voice quality meets a preset quality requirement may be: judging whether the signal correlation is greater than or equal to a preset correlation threshold value; if the signal correlation is smaller than the correlation threshold value, determining that the voice quality does not meet the quality requirement; and if the signal correlation is greater than or equal to the correlation threshold, determining that the voice quality meets the quality requirement.
For example, in an embodiment of the present application, the preset correlation threshold is set to 0.04; that is, the voice quality is judged by the correlation between adjacent signals in the sampled data. The specific method is: collect the correlation between adjacent signals in the sampled data at a sampling frequency of 48 kHz; if the correlation between adjacent signals is greater than or equal to 0.04, the voice quality of the audio signal is considered high-quality voice; if it is less than 0.04, the voice quality is considered low-quality voice. The specific correlation threshold may be designed according to user needs and is not limited to the value given in this embodiment.
The correlation between adjacent signals is collected at a preset sampling frequency, and the sampled data is divided into frames; each frame corresponds to a spectrum (computed by a short-time FFT) that describes the relationship between frequency and energy.
Optionally, if the voice quality does not meet the preset criterion, the current voice quality is considered not to meet the quality requirement, and then S106a is executed: a first weighted modulation scheme is determined.
Wherein, the weight of DPCM in the first weighted modulation is larger than that of ADPCM, that is, the DPCM algorithm is taken as the main algorithm to perform the weighted modulation of the audio. The modulation mode can improve the voice recognition rate, and avoids the problem of low recognition rate caused by only adopting an ADPCM algorithm for modulation when the collected voice quality is poor.
If the voice quality meets the preset standard, the current voice quality is considered to meet the quality requirement, and then S106b is executed: a second weighted modulation scheme is determined.
The weight of the ADPCM in the second weighted modulation mode is greater than that of the DPCM; that is, the ADPCM algorithm dominates the weighted modulation of the audio. After the subsequent A/D conversion of the speech, this modulation mode allows the resulting audio file to be compressed without damaging the sound quality, reducing the file size and improving the speech recognition rate, and it avoids the difficulty in subsequent compression that arises when only the DPCM algorithm is used during audio modulation.
Optionally, in an embodiment of the present application, in the first weighted modulation mode the weight of the DPCM is 60% and the weight of the ADPCM is 40%; in the second weighted modulation mode the weight of the ADPCM is 60% and the weight of the DPCM is 40%. The specific weights may be adjusted according to user needs and are not limited to this embodiment.
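The quality check and weight selection described above can be sketched as a small helper; the 0.04 threshold and the 60/40 split are the example values from this embodiment, and the function name is illustrative:

```python
def choose_weights(signal_correlation, threshold=0.04):
    """Return (w_dpcm, w_adpcm) based on the adjacent-signal correlation.

    Correlation at or above the threshold means the quality requirement
    is met, so the ADPCM-dominant second mode is chosen; otherwise the
    DPCM-dominant first mode is chosen.
    """
    if signal_correlation >= threshold:  # quality requirement met
        return 0.4, 0.6                  # second mode: ADPCM-dominant
    return 0.6, 0.4                      # first mode: DPCM-dominant
```

Note the boundary case: a correlation exactly equal to the threshold counts as meeting the quality requirement, matching the "greater than or equal to" wording of S105.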
Fig. 3 is a schematic flowchart of a speech signal processing method according to another embodiment of the present application, and as shown in fig. 3, the method further includes:
s107: and performing analog-to-digital conversion on the target modulation signal to generate an audio file.
Because the audio file is obtained from the weighted target modulation signal, it yields a higher speech recognition rate than an audio file obtained, as in the prior art, by processing with only one algorithm.
S108: Generating a Hamming window image corresponding to the audio file.
After the audio file is obtained, format conversion, resampling, pre-emphasis, and framing are performed on the audio file, and a hamming window image corresponding to the audio is constructed.
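The pre-emphasis, framing, and Hamming-windowing steps above can be sketched as follows (a minimal sketch assuming NumPy; the frame length, hop, and pre-emphasis coefficient are assumed common defaults, not values from the patent):

```python
import numpy as np

def frame_and_window(samples, frame_len=400, hop=160, pre=0.97):
    """Pre-emphasize the signal, split it into overlapping frames, and
    apply a Hamming window to each frame. With 400/160 samples, the
    frame and hop correspond to 25 ms / 10 ms at 16 kHz."""
    # pre-emphasis: boost high frequencies by differencing
    x = np.append(samples[0], samples[1:] - pre * samples[:-1])
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

audio = np.sin(2 * np.pi * 440 * np.arange(4800) / 16000)  # 0.3 s tone
frames = frame_and_window(audio)
```

Each row of `frames` is one windowed segment; the Hamming taper suppresses spectral leakage before the Fourier transform in the next step.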
S109: Generating a target spectrogram according to the Hamming window image, and performing audio matching in a preset audio recognition library by adopting the target spectrogram.
After a Fourier transform is performed on the Hamming window image, a target spectrogram corresponding to the input analog audio signal is generated; audio matching is then performed in the preset audio recognition library according to this spectrogram, and the text information corresponding to the target spectrogram is obtained.
Applying the Fourier transform to the Hamming window image converts a nonlinear problem into a linear one, which makes the matching process more intuitive.
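The spectrogram step can be sketched as a magnitude FFT per windowed frame (a minimal sketch; the recognition-library matching itself is out of scope and the tone parameters are illustrative):

```python
import numpy as np

def spectrogram(frames):
    """Magnitude spectrum of each windowed frame; the rows, stacked over
    time, form the target spectrogram used for matching."""
    return np.abs(np.fft.rfft(frames, axis=-1))

# toy input: Hamming-windowed frames of a 1 kHz tone sampled at 16 kHz
sr, frame_len = 16000, 400
tone = np.sin(2 * np.pi * 1000 * np.arange(4000) / sr)
frames = tone.reshape(10, frame_len) * np.hamming(frame_len)
spec = spectrogram(frames)
```

With a 400-sample frame at 16 kHz the bin spacing is 40 Hz, so the 1 kHz tone shows up as a ridge at bin 25 in every frame.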
In the method provided by this embodiment, the target modulation signal is obtained by weighting the input analog audio signal with two algorithms; the processed target modulation signal is then subjected to analog-to-digital conversion to generate a corresponding Hamming window, from which a target spectrogram is generated. Compared with the conventional technique of processing the analog audio signal with only one algorithm, this processing method gives the processed target modulation signal a lower transmission bit rate, reduces the required system transmission bandwidth, improves the signal-to-noise ratio at the same bit rate, increases the number of quantization levels, and reduces quantization noise.
Fig. 4 is a speech signal processing apparatus according to an embodiment of the present application, and as shown in fig. 4, the apparatus includes: a detection module 201, a determination module 202, a processing module 203, and a weighting module 204, wherein:
the detecting module 201 is configured to detect a voice quality of an input analog audio signal.
A determining module 202, configured to determine weights of the differential pulse code modulation DPCM and the adaptive differential pulse code modulation ADPCM according to the voice quality.
The processing module 203 is configured to perform DPCM processing and ADPCM processing on the analog audio signal to obtain a first modulation signal and a second modulation signal.
And the weighting module 204 is configured to weight preset type parameters of the first modulation signal and the second modulation signal according to the weights of the DPCM and the ADPCM, so as to obtain a target modulation signal.
Fig. 5 is a speech signal processing apparatus according to another embodiment of the present application, and as shown in fig. 5, the apparatus further includes: the determining module 205 is configured to determine whether the voice quality meets a preset quality requirement.
The determining module 202 is further configured to determine the weights of the DPCM and the ADPCM according to the determination result of the voice quality.
Optionally, the determining module 202 is further configured to determine a first weighted modulation mode if the voice quality does not meet the quality requirement, where in the first weighted modulation mode the weight of the DPCM is greater than the weight of the ADPCM.
Optionally, the determining module 202 is further configured to determine a second weighted modulation scheme if the voice quality meets the quality requirement, where a weight of the ADPCM in the second weighted modulation scheme is greater than a weight of the DPCM.
Optionally, the determining module 205 is further configured to determine whether the signal correlation is greater than or equal to a preset correlation threshold.
The determining module 202 is further configured to determine that the voice quality does not meet the quality requirement if the signal correlation is smaller than the correlation threshold.
The determining module 202 is further configured to determine that the voice quality meets the quality requirement if the signal correlation is greater than or equal to the correlation threshold.
Fig. 6 is a speech signal processing apparatus according to another embodiment of the present application, and as shown in fig. 6, the apparatus further includes a generating module 206 and a matching module 207, where:
a generating module 206, configured to perform analog-to-digital conversion on the target modulation signal to generate an audio file; and generating a Haiming window image corresponding to the audio file.
And the matching module 207 is used for generating a target language spectrogram according to the hamming window image, and performing audio matching in a preset audio recognition library by adopting the target language spectrogram.
The above apparatus, devices, and storage media are used to execute the methods provided in the foregoing embodiments; their implementation principles and technical effects are similar to those described above and are not repeated here.
These modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more digital signal processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). Alternatively, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. As another alternative, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Fig. 7 is a schematic structural diagram of a speech signal processing device provided in an embodiment of the present application; the device may be integrated in a terminal device or in a chip of the terminal device.
The speech signal processing device comprises: a processor 501, a storage medium 502, and a bus 503.
The storage medium 502 is used for storing a program, and the processor 501 calls the program stored in the storage medium 502 to execute the method embodiments corresponding to fig. 1-3. The specific implementation and technical effects are similar and are not described here again.
Optionally, the present application also provides a program product, such as a storage medium, on which a computer program is stored, including a program, which, when executed by a processor, performs embodiments corresponding to the above-described method.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor to perform some of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above description covers only specific embodiments of the present application, but the scope of the present application is not limited thereto; any changes or substitutions that a person skilled in the art can easily conceive within the technical scope disclosed in the present application shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A speech signal processing method, comprising:
detecting the voice quality of an input analog audio signal;
determining weights of Differential Pulse Code Modulation (DPCM) and Adaptive Differential Pulse Code Modulation (ADPCM) according to the voice quality;
respectively carrying out DPCM processing and ADPCM processing on the analog audio signal to obtain a first modulation signal and a second modulation signal;
weighting preset type parameters of the first modulation signal and the second modulation signal according to the weight of the DPCM and the ADPCM to obtain a target modulation signal;
the determining the weights of the Differential Pulse Code Modulation (DPCM) and the Adaptive Differential Pulse Code Modulation (ADPCM) according to the voice quality comprises the following steps:
judging whether the voice quality meets a preset quality requirement or not;
determining the weight of the DPCM and the ADPCM according to the judgment result of the voice quality;
the voice quality includes: signal correlation of adjacent sampling points in the analog audio signal; the judging whether the voice quality meets the preset quality requirement includes:
judging whether the signal correlation is greater than or equal to a preset correlation threshold value;
determining that the speech quality does not meet the quality requirement if the signal correlation is less than the correlation threshold;
determining that the speech quality satisfies the quality requirement if the signal correlation is greater than or equal to the correlation threshold.
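The flow of claim 1 together with claims 2 and 3 — measure the correlation of adjacent samples, pick DPCM/ADPCM weights against a threshold, run both modulations, and blend the two modulation signals — can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: the 0.8 threshold, the 0.7/0.3 weight values, all function names, and the toy adaptive-step ADPCM quantiser (not the ITU-T G.726 codec) are assumptions introduced here.

```python
import numpy as np

def adjacent_correlation(x):
    # Signal correlation of adjacent sampling points (Pearson correlation
    # between each sample and its successor).
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])

def choose_weights(corr, threshold=0.8):
    # Claims 2/3: low correlation -> DPCM weight larger than ADPCM weight;
    # high correlation -> ADPCM weight larger. Values are illustrative.
    if corr < threshold:
        return 0.7, 0.3   # (w_dpcm, w_adpcm): first weighted modulation mode
    return 0.3, 0.7       # second weighted modulation mode

def dpcm(x):
    # First-order DPCM: keep the first sample, then transmit the
    # differences between adjacent samples.
    d = np.empty_like(x)
    d[0] = x[0]
    d[1:] = np.diff(x)
    return d

def adpcm(x, step0=1.0):
    # Toy ADPCM: quantise each prediction error with a step size that
    # adapts to the magnitude of the previous quantised code.
    out = np.empty_like(x)
    pred, step = 0.0, step0
    for i, s in enumerate(x):
        q = int(np.clip(round((s - pred) / step), -8, 7))  # 4-bit code
        pred += q * step                                   # decoder state
        step = max(step * (1.5 if abs(q) > 4 else 0.9), 1e-3)
        out[i] = pred
    return out

def weighted_modulation(x, threshold=0.8):
    # Blend the two modulation signals sample-wise using the chosen weights.
    w_d, w_a = choose_weights(adjacent_correlation(x), threshold)
    return w_d * dpcm(x) + w_a * adpcm(x)
```

A smooth signal (high adjacent-sample correlation) selects the ADPCM-dominant mode, while a noise-like signal selects the DPCM-dominant mode.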
2. The method of claim 1, wherein determining the weight of the DPCM and the ADPCM according to the determination result of the voice quality comprises:
if the voice quality does not meet the quality requirement, determining a first weighted modulation mode, wherein the weight of the DPCM in the first weighted modulation mode is larger than the weight of the ADPCM.
3. The method of claim 1, wherein determining the weight of the DPCM and the ADPCM according to the determination result of the voice quality comprises:
and if the voice quality meets the quality requirement, determining a second weighted modulation mode, wherein the weight of the ADPCM in the second weighted modulation mode is greater than the weight of the DPCM.
4. The method according to any one of claims 1-3, further comprising:
performing analog-to-digital conversion on the target modulation signal to generate an audio file;
generating a Hamming window image corresponding to the audio file;
and generating a target spectrogram according to the Hamming window image, and performing audio matching in a preset audio recognition library by using the target spectrogram.
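The Hamming-window spectrogram step of claim 4 can be sketched as follows: frame the signal, apply a Hamming window to each frame, and take the magnitude of each frame's FFT. The function name and the frame/hop sizes are illustrative assumptions, not values from the patent.

```python
import numpy as np

def hamming_spectrogram(signal, frame_len=256, hop=128):
    # Slice the signal into overlapping frames, multiply each frame by a
    # Hamming window, then take the magnitude of the real FFT per frame.
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Result: one row per frequency bin (frame_len//2 + 1 bins),
    # one column per time frame.
    return np.abs(np.fft.rfft(frames, axis=1)).T
```

For audio matching, each spectrogram column (or the whole image) would then be compared against entries of the recognition library, e.g. by a distance measure or a trained classifier; that matching stage is not sketched here.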
5. A method according to any one of claims 1-3, characterized in that the preset type of parameter is any one of the following parameter types: peak, formant, frequency, plosive, fricative.
6. A speech signal processing apparatus, comprising: detection module, confirm module, processing module and weighting module, wherein:
the detection module is used for detecting the voice quality of the input analog audio signal;
the determining module is used for determining the weights of the Differential Pulse Code Modulation (DPCM) and the Adaptive Differential Pulse Code Modulation (ADPCM) according to the voice quality;
the processing module is used for respectively carrying out DPCM processing and ADPCM processing on the analog audio signal to obtain a first modulation signal and a second modulation signal;
the weighting module is used for weighting preset type parameters of the first modulation signal and the second modulation signal according to the weight of the DPCM and the ADPCM to obtain a target modulation signal;
the device further comprises: a judgment module;
the judging module is used for judging whether the voice quality meets the preset quality requirement;
the determining module is further configured to determine the weights of the DPCM and the ADPCM according to the determination result of the voice quality;
the voice quality includes: signal correlation of adjacent sampling points in the analog audio signal;
the judging module is further configured to judge whether the signal correlation is greater than or equal to a preset correlation threshold;
the determining module is further configured to determine that the speech quality does not meet the quality requirement if the signal correlation is less than the correlation threshold;
the determining module is further configured to determine that the speech quality meets the quality requirement if the signal correlation is greater than or equal to the correlation threshold.
7. A speech signal processing apparatus, characterized by comprising: a processor and a memory storing a computer program executable by the processor, wherein the processor implements the method of any one of claims 1-5 when executing the computer program.
8. A storage medium having stored thereon a computer program which, when read and executed by a processor, implements the method of any one of claims 1-5.
CN201911248791.9A 2019-12-06 2019-12-06 Voice signal processing method, device, equipment and storage medium Active CN111081264B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911248791.9A CN111081264B (en) 2019-12-06 2019-12-06 Voice signal processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911248791.9A CN111081264B (en) 2019-12-06 2019-12-06 Voice signal processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111081264A CN111081264A (en) 2020-04-28
CN111081264B true CN111081264B (en) 2022-03-29

Family

ID=70313373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911248791.9A Active CN111081264B (en) 2019-12-06 2019-12-06 Voice signal processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111081264B (en)

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA1249060A (en) * 1985-11-07 1989-01-17 Richard L. Zinser, Jr. Hybrid subband coder/decoder method and apparatus
KR900015473A (en) * 1989-03-02 1990-10-27 하라 레이노스께 Coding method of speech signal
WO2006030340A2 (en) * 2004-09-17 2006-03-23 Koninklijke Philips Electronics N.V. Combined audio coding minimizing perceptual distortion
CN100505714C (en) * 2005-03-25 2009-06-24 华为技术有限公司 Drop-frame processing device and method based on ADPCM
JP5648123B2 (en) * 2011-04-20 2015-01-07 パナソニック インテレクチュアル プロパティ コーポレーション オブアメリカPanasonic Intellectual Property Corporation of America Speech acoustic coding apparatus, speech acoustic decoding apparatus, and methods thereof
CN103198834B (en) * 2012-01-04 2016-12-14 中国移动通信集团公司 A kind of acoustic signal processing method, device and terminal
JP6170172B2 (en) * 2012-11-13 2017-07-26 サムスン エレクトロニクス カンパニー リミテッド Coding mode determination method and apparatus, audio coding method and apparatus, and audio decoding method and apparatus
EP2830053A1 (en) * 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Multi-channel audio decoder, multi-channel audio encoder, methods and computer program using a residual-signal-based adjustment of a contribution of a decorrelated signal
CN109102816B (en) * 2018-08-14 2020-12-29 Oppo广东移动通信有限公司 Encoding control method and device and electronic equipment
CN109859745A (en) * 2019-03-27 2019-06-07 北京爱数智慧科技有限公司 A kind of audio-frequency processing method, equipment and computer-readable medium
CN110473528B (en) * 2019-08-22 2022-01-28 北京明略软件系统有限公司 Speech recognition method and apparatus, storage medium, and electronic apparatus

Also Published As

Publication number Publication date
CN111081264A (en) 2020-04-28

Similar Documents

Publication Publication Date Title
US10049674B2 (en) Method and apparatus for evaluating voice quality
CN110600017A (en) Training method of voice processing model, voice recognition method, system and device
CN109147806B (en) Voice tone enhancement method, device and system based on deep learning
CN110176256B (en) Recording file format conversion method and device, computer equipment and storage medium
KR100930060B1 (en) Recording medium on which a signal detecting method, apparatus and program for executing the method are recorded
EP2005423B1 (en) Processing of excitation in audio coding and decoding
CN105118522B (en) Noise detection method and device
EP3739582B1 (en) Voice detection
CN104966517A (en) Voice frequency signal enhancement method and device
CN104036788B (en) The acoustic fidelity identification method of audio file and device
CN112151055B (en) Audio processing method and device
CN112185410B (en) Audio processing method and device
US20230097520A1 (en) Speech enhancement method and apparatus, device, and storage medium
US20230186943A1 (en) Voice activity detection method and apparatus, and storage medium
CN111107284B (en) Real-time generation system and generation method for video subtitles
KR100930061B1 (en) Signal detection method and apparatus
CN114333893A (en) Voice processing method and device, electronic equipment and readable medium
CN114333892A (en) Voice processing method and device, electronic equipment and readable medium
CN111081264B (en) Voice signal processing method, device, equipment and storage medium
CN116364107A (en) Voice signal detection method, device, equipment and storage medium
CN107919136B (en) Digital voice sampling frequency estimation method based on Gaussian mixture model
CN115273909A (en) Voice activity detection method, device, equipment and computer readable storage medium
Jose Amrconvnet: Amr-coded speech enhancement using convolutional neural networks
Safa et al. The real time implementation on dsp of speech enhancement based on kalman filter and wavelet thresholding
CN113450812A (en) Howling detection method, voice call method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant