CN112151052A - Voice enhancement method and device, computer equipment and storage medium - Google Patents
Voice enhancement method and device, computer equipment and storage medium
- Publication number
- CN112151052A (application CN202011153521.2A)
- Authority
- CN
- China
- Prior art keywords
- voice
- enhancement
- data
- voice data
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 36
- 238000012545 processing Methods 0.000 claims abstract description 109
- 230000006870 function Effects 0.000 claims description 16
- 230000002708 enhancing effect Effects 0.000 claims description 13
- 238000001914 filtration Methods 0.000 claims description 11
- 230000009467 reduction Effects 0.000 claims description 10
- 238000004422 calculation algorithm Methods 0.000 claims description 8
- 238000004590 computer program Methods 0.000 claims description 8
- 238000000605 extraction Methods 0.000 claims description 4
- 238000012216 screening Methods 0.000 claims description 2
- 238000013473 artificial intelligence Methods 0.000 abstract description 2
- 230000000694 effects Effects 0.000 description 11
- 238000010586 diagram Methods 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 4
- 239000000284 extract Substances 0.000 description 4
- 230000003044 adaptive effect Effects 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 230000006835 compression Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000008030 elimination Effects 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 238000009432 framing Methods 0.000 description 1
- 230000002068 genetic effect Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 238000012502 risk assessment Methods 0.000 description 1
- 230000003595 spectral effect Effects 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 238000012549 training Methods 0.000 description 1
Images
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Navigation (AREA)
Abstract
The invention discloses a voice enhancement method, a voice enhancement device, computer equipment and a storage medium, relates to the technical field of artificial intelligence, and mainly aims to automatically select voice enhancement parameters matched with the surrounding environment from a pre-constructed voice enhancement parameter set, so that voice recognition accuracy is maximized after the voice data to be recognized is enhanced with those parameters. The method comprises the following steps: acquiring voice data to be processed; extracting a first voice feature corresponding to the voice data, determining the target environment of the voice data according to the first voice feature, and selecting a target voice enhancement parameter corresponding to the target environment from a pre-constructed voice enhancement parameter set; and performing voice enhancement processing on the voice data according to the target voice enhancement parameter to obtain the enhanced voice data. The invention is mainly suitable for voice enhancement processing of voice data.
Description
Technical Field
The present invention relates to the field of artificial intelligence technology, and in particular, to a speech enhancement method, apparatus, computer device, and storage medium.
Background
In recent years, with the rapid development and rise of intelligent wearable devices, voice-controlled consumer electronics have become the latest trend. Voice intelligence requires a highly reliable, highly accurate automatic speech recognition system as support, and front-end speech enhancement is the most critical link in that chain.
At present, when noise is handled with front-end speech enhancement technology, the parameters of the speech enhancement module are generally tuned according to the surrounding environment and expert experience in order to achieve a better speech recognition effect. However, tuning speech enhancement parameters by expert experience can only adapt to the surrounding environment to a limited extent and improve recognition somewhat; it cannot guarantee that speech recognition accuracy reaches its maximum.
Disclosure of Invention
The invention provides a voice enhancement method, a voice enhancement device, computer equipment and a storage medium, which mainly aim to automatically select voice enhancement parameters matched with the surrounding environment from a pre-constructed voice enhancement parameter set, so that after the voice data to be recognized is enhanced with those parameters, voice recognition accuracy is maximized and the optimal recognition effect can be achieved in any environment.
According to a first aspect of the present invention, there is provided a speech enhancement method comprising:
acquiring voice data to be processed;
extracting a first voice feature corresponding to the voice data, determining a target environment where the voice data is located according to the first voice feature, and selecting a target voice enhancement parameter corresponding to the target environment from a pre-constructed voice enhancement parameter set, wherein the voice enhancement parameter set comprises voice enhancement parameters under different environments, and the voice enhancement parameters are used for enhancing the voice recognition accuracy under different environments;
and performing voice enhancement processing on the voice data according to the target voice enhancement parameter to obtain voice data after the voice enhancement processing.
According to a second aspect of the present invention, there is provided a speech enhancement apparatus comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring voice data to be processed;
the selecting unit is used for extracting a first voice feature corresponding to the voice data, determining a target environment where the voice data is located according to the first voice feature, and selecting a target voice enhancement parameter corresponding to the target environment from a pre-constructed voice enhancement parameter set, wherein the voice enhancement parameter set comprises voice enhancement parameters under different environments, and the voice enhancement parameters are used for enhancing the voice recognition accuracy under different environments;
and the processing unit is used for carrying out voice enhancement processing on the voice data according to the target voice enhancement parameter to obtain the voice data after the voice enhancement processing.
According to a third aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring voice data to be processed;
extracting a first voice feature corresponding to the voice data, determining a target environment where the voice data is located according to the first voice feature, and selecting a target voice enhancement parameter corresponding to the target environment from a pre-constructed voice enhancement parameter set, wherein the voice enhancement parameter set comprises voice enhancement parameters under different environments, and the voice enhancement parameters are used for enhancing the voice recognition accuracy under different environments;
and performing voice enhancement processing on the voice data according to the target voice enhancement parameter to obtain voice data after the voice enhancement processing.
According to a fourth aspect of the present invention, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the program:
acquiring voice data to be processed;
extracting a first voice feature corresponding to the voice data, determining a target environment where the voice data is located according to the first voice feature, and selecting a target voice enhancement parameter corresponding to the target environment from a pre-constructed voice enhancement parameter set, wherein the voice enhancement parameter set comprises voice enhancement parameters under different environments, and the voice enhancement parameters are used for enhancing the voice recognition accuracy under different environments;
and performing voice enhancement processing on the voice data according to the target voice enhancement parameter to obtain voice data after the voice enhancement processing.
Compared with the current practice of adjusting the parameters of the voice enhancement module according to expert experience, the voice enhancement method, device, computer equipment and storage medium provided here acquire the voice data to be processed; extract a first voice feature corresponding to the voice data, determine the target environment of the voice data according to the first voice feature, and select a target voice enhancement parameter corresponding to the target environment from a pre-constructed voice enhancement parameter set, wherein the set comprises voice enhancement parameters for different environments and the parameters are used to maximize voice recognition accuracy in those environments; and then perform voice enhancement processing on the voice data according to the target voice enhancement parameter to obtain the enhanced voice data. In this way, once the target environment of the voice data to be processed is determined, the target voice enhancement parameter corresponding to that environment can be selected automatically from the parameter set and used to enhance the voice data, which improves the enhancement effect in the target environment and at the same time ensures the highest voice recognition accuracy in that environment.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a speech enhancement method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another speech enhancement method provided by an embodiment of the present invention;
fig. 3 is a schematic structural diagram illustrating a speech enhancement apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another speech enhancement apparatus provided in an embodiment of the present invention;
fig. 5 shows a physical structure diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
At present, when noise is handled with front-end speech enhancement technology, the parameters of the speech enhancement module are generally tuned according to the surrounding environment and expert experience in order to achieve a better speech recognition effect. However, tuning speech enhancement parameters by expert experience can only adapt to the surrounding environment to a limited extent and improve recognition somewhat; it cannot guarantee that speech recognition accuracy reaches its maximum.
In order to solve the above problem, an embodiment of the present invention provides a speech enhancement method, as shown in fig. 1, the method including:
101. and acquiring voice data to be processed.
For the embodiment of the present invention, in order to overcome the prior-art defect of adjusting speech enhancement parameters by expert experience, the embodiment pre-constructs a speech enhancement parameter set and automatically selects matched speech enhancement parameters from that set according to the target environment of the voice data to be processed; this not only improves the speech enhancement effect in any environment but also maximizes the speech recognition accuracy. The embodiment of the invention is suitable for voice enhancement processing of voice data, and its execution subject is a device or equipment capable of performing voice enhancement on voice data, which may be deployed on the client side or the server side.
Specifically, a segment of voice data of a user in a certain scene is acquired, and before performing voice enhancement processing on the voice data, the voice data needs to be preprocessed, specifically including pre-emphasis processing, framing processing, and windowing function processing, so as to obtain the preprocessed voice data.
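The preprocessing steps just listed (pre-emphasis, framing, windowing) can be sketched as follows. This is a minimal illustration rather than the patent's implementation; the frame length, hop size, pre-emphasis coefficient, and choice of a Hamming window are assumed values typical for 16 kHz speech.

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasis, framing, and windowing of a raw waveform."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing: split the signal into overlapping frames
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Windowing: taper each frame with a Hamming window
    return frames * np.hamming(frame_len)

# One second of audio at 16 kHz -> 98 frames of 25 ms with a 10 ms hop
frames = preprocess(np.random.randn(16000))
```

With these assumed settings each 25 ms frame overlaps its neighbor by 15 ms, so transient speech events do not fall entirely on a frame boundary.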
102. Extracting a first voice feature corresponding to the voice data, determining a target environment where the voice data is located according to the first voice feature, and selecting a target voice enhancement parameter corresponding to the target environment from a pre-constructed voice enhancement parameter set.
The voice enhancement parameter set comprises voice enhancement parameters for different environments, and the parameters are used to maximize voice recognition accuracy in those environments. For the embodiment of the invention, sample voice data collected in different environments is stored in a preset sample library. In order to determine the environment of each sample, the sample voice data is clustered to obtain sample voice data per environment, and the samples of each environment are used to train a voice enhancement model; that is, the initial voice enhancement parameters in the model are optimized and adjusted until, when the enhancement-processed samples are input to a pre-constructed voice recognition model for recognition, the recognition accuracy reaches its maximum. Voice enhancement parameters for each environment are thereby obtained and assembled into the voice enhancement parameter set. When voice data is collected in a given environment, the parameters corresponding to that environment are used for its enhancement, and the enhanced voice data is input to the pre-constructed voice recognition model, so that its recognition accuracy can reach the maximum.
For the embodiment of the present invention, before performing voice enhancement, the target environment of the voice data to be processed must be determined. Specifically, the first voice feature corresponding to the voice data to be processed is extracted; at the same time, second voice features corresponding to the sample voice data of each cluster type (i.e., each environment) are extracted, and the feature center of each cluster is calculated from those second voice features. Because voice features of data collected in the same environment are relatively similar, the distances between the first voice feature and the different feature centers are calculated to decide which cluster the voice data to be processed belongs to, and thus its target environment.
Further, the target voice enhancement parameter corresponding to the target environment is selected from the pre-constructed voice enhancement parameter set and used to enhance the voice data, so that when the enhanced voice data is input to the pre-constructed voice recognition model, its recognition accuracy can be the highest. In this way the target environment of the voice data can be determined from its voice features, the matching enhancement parameters can be selected automatically from the parameter set, and the voice data can be enhanced with them, improving the enhancement effect while maximizing the recognition accuracy of the enhanced voice data.
103. And performing voice enhancement processing on the voice data according to the target voice enhancement parameter to obtain voice data after the voice enhancement processing.
For the embodiment of the invention, voice enhancement processing mainly refers to noise reduction of the noise in the voice data to be processed, and an LMS adaptive-filter denoising algorithm may be adopted. Specifically: first, a voice activity detection (VAD) algorithm removes silence from the voice signal, yielding a voice spectral feature sequence X = (x1, x2, …, xn); then a multi-channel wiener filtering operation, including a beamforming step, yields Y = (y1, y2, …, yn); power spectral density (PSD) estimation is used to reduce the residual noise component and obtain the wiener filter input components Φ_Y(ω, τ) and Φ_V(ω, τ); the wiener filtering calculation then produces the post-filter parameter vector G_Wiener(ω, τ), and post-filter processing yields the filtered output signal Z(ω, τ) = G_Wiener(ω, τ) · Y. After signal compression or expansion, the voice data after voice enhancement processing is obtained, adapted to the input form of the voice recognition model.
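As an illustration of the post-filter step, the sketch below computes a simplified single-channel Wiener gain and applies it per frequency bin. The function name, the spectral-floor value, and the three-bin toy spectrum are assumptions, and the multi-channel beamforming and PSD-estimation stages described in the patent are omitted.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=0.01):
    """Simplified single-channel Wiener gain: G = max(1 - Phi_V / Phi_Y, floor).
    Phi_Y is the noisy-speech PSD and Phi_V the estimated noise PSD."""
    gain = 1.0 - noise_psd / np.maximum(noisy_psd, 1e-12)
    # Floor the gain to avoid musical-noise artifacts from zeroed bins
    return np.maximum(gain, floor)

# Toy spectrum: a strong speech bin, a speech-free bin, a noise-dominated bin
noisy_psd = np.array([4.0, 1.0, 0.25])
noise_psd = np.array([1.0, 1.0, 1.0])
G = wiener_gain(noisy_psd, noise_psd)  # per-bin gain in [floor, 1)
Z = G * np.sqrt(noisy_psd)             # apply gain to spectral magnitudes
```

Bins dominated by noise are attenuated to the floor, while the speech-dominated bin keeps most of its energy, which is the qualitative behavior the post-filter relies on.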
Compared with the current practice of adjusting the parameters of the voice enhancement module according to expert experience, the voice enhancement method provided by the embodiment of the invention acquires the voice data to be processed; extracts the first voice feature corresponding to the voice data, determines the target environment of the voice data according to the first voice feature, and selects the target voice enhancement parameter corresponding to the target environment from the pre-constructed voice enhancement parameter set, wherein the set comprises voice enhancement parameters for different environments and the parameters are used to maximize voice recognition accuracy in those environments; and then performs voice enhancement on the voice data according to the target voice enhancement parameter to obtain the enhanced voice data. Once the target environment of the voice data to be processed is determined, the parameter corresponding to that environment can be selected automatically from the parameter set and used to enhance the voice data, improving the enhancement effect in the target environment while ensuring the highest voice recognition accuracy there.
Further, in order to better explain the above process of performing speech enhancement processing on speech data, as a refinement and extension to the above embodiment, another speech enhancement method is provided in an embodiment of the present invention, as shown in fig. 2, and the method includes:
201. and acquiring voice data to be processed.
For the embodiment of the present invention, in order to automatically select the speech enhancement parameters matched with the environment according to the environment where the speech data to be processed is located, and to make the speech recognition accuracy of the speech data reach the highest, the speech enhancement parameters under different environments need to be constructed in advance, based on which the method includes: carrying out voice enhancement processing on the sample voice data under different environments by using the initial voice enhancement parameters to obtain sample voice data subjected to voice enhancement processing under different environments; constructing voice recognition accuracy functions under different environments according to the sample voice data; and optimizing and adjusting the initial voice enhancement parameters according to the accuracy function to obtain voice enhancement parameters under different environments, and constructing the voice enhancement parameter set based on the voice enhancement parameters under the different environments. Further, the constructing the speech recognition accuracy functions under different environments according to the sample speech data includes: carrying out voice recognition on the sample voice data subjected to the voice enhancement processing by utilizing a pre-constructed voice recognition model to obtain voice recognition results under different environments; and constructing a voice recognition accuracy function under different environments according to the voice recognition results under different environments. The pre-constructed speech recognition model may be a neural network speech recognition model.
For example, an initial speech enhancement parameter is given; this parameter is used to enhance sample voice data collected in a factory environment, and the enhanced samples are input to a pre-constructed speech recognition model, yielding the recognition results for the factory samples. A speech recognition accuracy function for the factory environment is then constructed from those results, and the function is solved for the parameter giving the highest recognition accuracy. When searching for the optimal solution, a genetic algorithm can be used to search the speech enhancement parameters for each environment. The specific formula is as follows:
θ_i = argmax_θ T(θ)
where T(θ) is the speech recognition accuracy in the factory environment and θ_i is the voice enhancement parameter for that environment. By continuously optimizing and adjusting the initial voice enhancement parameters, the parameter θ_i that maximizes recognition accuracy in the factory environment is obtained. Repeating this procedure for each environment yields the voice enhancement parameters for all environments, from which the voice enhancement parameter set {θ_i} is constructed, so that voice recognition accuracy reaches its maximum in each environment.
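The parameter search θ_i = argmax_θ T(θ) can be sketched as below, with a seeded random search standing in for the genetic algorithm the patent mentions. `recognition_accuracy` is a toy stand-in for T(θ): in the patent it would be the accuracy of the pre-built recognizer on enhancement-processed factory samples, which is not reproducible here.

```python
import random

def recognition_accuracy(theta):
    """Toy stand-in for T(theta), peaking at theta = 0.4.
    The real objective scores a recognizer on enhanced sample data."""
    return 1.0 - (theta - 0.4) ** 2

def search_theta(objective, n_iter=2000, seed=0):
    """Random search for argmax T(theta) over a scalar parameter in [0, 1];
    the patent proposes a genetic algorithm for this optimization."""
    rng = random.Random(seed)
    best_theta, best_acc = None, float("-inf")
    for _ in range(n_iter):
        theta = rng.uniform(0.0, 1.0)
        acc = objective(theta)
        if acc > best_acc:
            best_theta, best_acc = theta, acc
    return best_theta

# One such search per environment populates the parameter set {theta_i}
theta_factory = search_theta(recognition_accuracy)
```

Any black-box optimizer works here because T(θ) is evaluated only by running the recognizer; a genetic algorithm is one reasonable choice when θ is a vector of filter parameters.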
For the embodiment of the invention, after the voice enhancement parameter set is constructed, the voice data to be processed can be obtained, and the corresponding voice enhancement parameters are selected from the voice enhancement parameter set for voice enhancement processing by determining the target environment of the voice data to be processed.
202. Extracting a first voice feature corresponding to the voice data, determining a target environment where the voice data is located according to the first voice feature, and selecting a target voice enhancement parameter corresponding to the target environment from a pre-constructed voice enhancement parameter set.
The voice enhancement parameter set comprises voice enhancement parameters for different environments, and the parameters are used to maximize voice recognition accuracy in those environments. For the embodiment of the present invention, in order to determine the target environment of the voice data to be processed, step 202 specifically includes: acquiring sample voice data of different environments and extracting the second voice features corresponding to the samples; calculating the feature center of the sample voice data of each environment from the second voice features; and determining the target environment of the voice data from the feature centers and the first voice feature. Further, determining the target environment from the feature centers and the first voice feature includes: calculating the Euclidean distance between the first voice feature and each feature center with a preset Euclidean distance algorithm; and screening out the minimum Euclidean distance and determining the environment of the sample voice data corresponding to that minimum distance as the target environment. When extracting the voice features of the voice data to be processed and of the sample voice data, a preset Mel cepstrum algorithm can be adopted to calculate Mel-frequency cepstral coefficients for each, and the calculated coefficients are taken as the respective voice features.
For example, suppose the feature center of the street samples is A, that of the factory samples is B, and that of the airport samples is C. Because voice features of data from the same environment are relatively similar, the Euclidean distances between the first voice feature of the voice data to be processed and the centers A, B and C are calculated, and the minimum is selected. If the distance between center B and the first voice feature is the smallest, the voice data to be processed is judged most similar to the factory samples and is therefore determined to be in the factory environment. The target environment of any voice data to be processed can be determined in this way.
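The nearest-feature-center screening in this example can be sketched as follows. The environment names and the two-dimensional features are toy stand-ins for the MFCC vectors described above.

```python
import numpy as np

def feature_centers(samples_by_env):
    """Mean feature vector per environment (the 'feature center')."""
    return {env: np.mean(feats, axis=0) for env, feats in samples_by_env.items()}

def classify_environment(first_feature, centers):
    """Pick the environment whose center is nearest in Euclidean distance."""
    return min(centers,
               key=lambda env: np.linalg.norm(first_feature - centers[env]))

centers = feature_centers({
    "street":  np.array([[0.0, 0.0], [0.2, 0.0]]),   # center A
    "factory": np.array([[5.0, 5.0], [5.2, 5.0]]),   # center B
    "airport": np.array([[0.0, 9.0], [0.0, 9.2]]),   # center C
})
env = classify_environment(np.array([4.8, 5.1]), centers)  # nearest to B
```

Once `env` is known, it indexes into the parameter set to retrieve the matching enhancement parameters.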
203. And performing voice enhancement processing on the voice data according to the target voice enhancement parameter to obtain voice data after the voice enhancement processing.
For the embodiment of the present invention, in order to perform speech enhancement processing on speech data, step 203 specifically includes: and according to the target filtering noise reduction parameter, carrying out filtering noise reduction processing on the voice data to obtain noise-reduced voice data. Specifically, the method of performing noise reduction processing on the speech data by using the target filtering noise reduction parameter is completely the same as that in step 103, and is not described herein again.
204. And performing feature extraction on the voice data after the voice enhancement processing to obtain a third voice feature corresponding to the voice data, and determining a voice recognition result corresponding to the voice data according to the third voice feature.
For this embodiment, after the voice enhancement processing, voice recognition is further performed on the enhanced voice data. Specifically, recognition can be carried out with a pre-established voice recognition model, which may be a neural network model: the enhanced voice data is input to the model, a hidden layer of the model extracts the third voice feature corresponding to the voice data, and recognition proceeds from that feature to obtain the voice recognition result, whose accuracy can at this point reach the maximum.
Compared with the current approach of adjusting the parameters of the voice enhancement module according to expert experience, the voice enhancement method provided by this embodiment of the present invention can acquire the voice data to be processed; extract a first voice feature corresponding to the voice data, determine the target environment in which the voice data is located according to the first voice feature, and select a target voice enhancement parameter corresponding to the target environment from a pre-constructed voice enhancement parameter set, where the voice enhancement parameter set comprises voice enhancement parameters under different environments, and the voice enhancement parameters are used for enhancing voice recognition accuracy under the different environments; and then perform voice enhancement processing on the voice data according to the target voice enhancement parameter to obtain the voice data after the voice enhancement processing. Because the target environment in which the voice data to be processed is located is determined, the target voice enhancement parameter corresponding to that environment can be selected automatically from the voice enhancement parameter set, and the voice data can be enhanced using the target voice enhancement parameter, so the voice enhancement effect in the target environment is improved and, at the same time, the highest accuracy of voice recognition in the target environment can be ensured.
Further, as a specific implementation of fig. 1, an embodiment of the present invention provides a speech enhancement apparatus, as shown in fig. 3, the apparatus includes: an acquisition unit 31, a selection unit 32 and a processing unit 33.
The acquiring unit 31 may be configured to acquire voice data to be processed. The acquiring unit 31 is a main functional module in the present apparatus for acquiring voice data to be processed.
The selecting unit 32 may be configured to extract a first voice feature corresponding to the voice data, determine a target environment in which the voice data is located according to the first voice feature, and select a target voice enhancement parameter corresponding to the target environment from a pre-established voice enhancement parameter set, where the voice enhancement parameter set includes voice enhancement parameters under different environments, and the voice enhancement parameters are used to maximize voice recognition accuracy under the different environments. The selecting unit 32 is the main functional module, and also the core module, of the device that extracts the first voice feature corresponding to the voice data, determines the target environment in which the voice data is located according to the first voice feature, and selects the target voice enhancement parameter corresponding to the target environment from the pre-established voice enhancement parameter set.
The processing unit 33 may be configured to perform speech enhancement processing on the speech data according to the target speech enhancement parameter, so as to obtain speech data after the speech enhancement processing. The processing unit 33 is a main functional module of the device that performs speech enhancement processing on the speech data according to the target speech enhancement parameter to obtain speech data after the speech enhancement processing.
Further, in order to determine the target environment where the voice data is located, as shown in fig. 4, the selecting unit 32 includes an extracting module 321, a calculating module 322, and a determining module 323.
The extracting module 321 may be configured to obtain sample voice data in different environments, and extract a second voice feature corresponding to the sample voice data.
The calculating module 322 may be configured to calculate, according to the second speech feature, a feature center corresponding to the sample speech data in the different environments.
The determining module 323 may be configured to determine a target environment in which the voice data is located according to the feature center and the first voice feature.
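Assuming the feature center is the per-dimension mean of the second voice features extracted from one environment's sample data (the patent does not state the exact formula, so this is an assumption), the calculating module's step could be sketched as:

```python
def feature_center(features):
    """Feature center of one environment: per-dimension mean of the
    second voice features extracted from its sample voice data.
    The mean is an assumed definition for illustration."""
    dims = len(features[0])
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(dims)]

# Illustrative second voice features for street-environment samples.
street_features = [[1.0, 0.0], [0.8, 0.2]]
center_a = feature_center(street_features)  # per-dimension mean
```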
Further, in order to determine the target environment where the voice data is located, the determining module 323 includes: a calculation submodule and a determination submodule.
The calculating submodule can be configured to calculate the euclidean distances between the first speech feature and different feature centers by using a preset euclidean distance algorithm.
The determining submodule may be configured to screen a minimum euclidean distance from the calculated euclidean distances, and determine an environment where the sample voice data corresponding to the minimum euclidean distance is located as the target environment.
Further, to construct the speech enhancement parameter set, the apparatus further comprises: a construction unit 34.
The processing unit 33 may further be configured to perform speech enhancement processing on the sample speech data in different environments by using the initial speech enhancement parameter, so as to obtain sample speech data after the speech enhancement processing in different environments.
The constructing unit 34 may be configured to construct speech recognition accuracy functions under different environments according to the sample speech data.
The constructing unit 34 may be further configured to optimize and adjust the initial speech enhancement parameter according to the accuracy function to obtain speech enhancement parameters under different environments, and construct the speech enhancement parameter set based on the speech enhancement parameters under different environments.
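The optimization of the initial speech enhancement parameter against the accuracy function could, for example, be realized as a search over candidate parameters per environment. The `enhance` and `accuracy` stand-ins below are toy assumptions for illustration, not the patent's actual enhancement step or accuracy function:

```python
def build_parameter_set(sample_data, candidate_params, enhance, accuracy):
    """For each environment, pick the enhancement parameter that
    maximizes the recognition-accuracy function on its samples,
    yielding the speech enhancement parameter set."""
    param_set = {}
    for env, samples in sample_data.items():
        best = max(candidate_params,
                   key=lambda p: accuracy(enhance(samples, p)))
        param_set[env] = best
    return param_set

# Toy stand-ins: "enhancement" scales the samples, and "accuracy"
# peaks when the enhanced mean is closest to 1.0.
enhance = lambda samples, p: [s * p for s in samples]
accuracy = lambda enhanced: -abs(sum(enhanced) / len(enhanced) - 1.0)

data = {"street": [0.5, 0.5], "factory": [0.25, 0.25]}
params = build_parameter_set(data, [1, 2, 4], enhance, accuracy)
# street mean 0.5 is best scaled by 2; factory mean 0.25 by 4.
```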
Further, in order to construct the speech recognition accuracy function under different environments, the constructing unit 34 includes: an identification module 341 and a construction module 342.
The recognition module 341 may be configured to perform speech recognition on the sample speech data after the speech enhancement processing by using a pre-established speech recognition model, so as to obtain speech recognition results in different environments.
The constructing module 342 may be configured to construct a speech recognition accuracy function under different environments according to the speech recognition results under different environments.
Further, in order to perform voice recognition on voice data, the apparatus further includes: an extraction unit 35 and a determination unit 36.
The extracting unit 35 may be configured to perform feature extraction on the voice data after the voice enhancement processing, so as to obtain a third voice feature corresponding to the voice data.
The determining unit 36 may be configured to determine a speech recognition result corresponding to the speech data according to the third speech feature.
Further, in order to perform speech enhancement processing on the speech data, the processing unit 33 may be specifically configured to perform filtering and noise reduction processing on the speech data according to the target filtering and noise reduction parameter, so as to obtain noise-reduced speech data.
It should be noted that other corresponding descriptions of the functional modules related to the speech enhancement device provided in the embodiment of the present invention may refer to the corresponding description of the method shown in fig. 1, and are not described herein again.
Based on the method shown in fig. 1, correspondingly, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps: acquiring voice data to be processed; extracting a first voice feature corresponding to the voice data, determining a target environment where the voice data is located according to the first voice feature, and selecting a target voice enhancement parameter corresponding to the target environment from a pre-constructed voice enhancement parameter set, wherein the voice enhancement parameter set comprises voice enhancement parameters under different environments, and the voice enhancement parameters are used for enhancing the voice recognition accuracy under different environments; and performing voice enhancement processing on the voice data according to the target voice enhancement parameter to obtain voice data after the voice enhancement processing.
Based on the above embodiments of the method shown in fig. 1 and the apparatus shown in fig. 3, an embodiment of the present invention further provides an entity structure diagram of a computer device, as shown in fig. 5, where the computer device includes: a processor 41, a memory 42, and a computer program stored on the memory 42 and executable on the processor, wherein the memory 42 and the processor 41 are both arranged on a bus 43 such that when the processor 41 executes the program, the following steps are performed: acquiring voice data to be processed; extracting a first voice feature corresponding to the voice data, determining a target environment where the voice data is located according to the first voice feature, and selecting a target voice enhancement parameter corresponding to the target environment from a pre-constructed voice enhancement parameter set, wherein the voice enhancement parameter set comprises voice enhancement parameters under different environments, and the voice enhancement parameters are used for enhancing the voice recognition accuracy under different environments; and performing voice enhancement processing on the voice data according to the target voice enhancement parameter to obtain voice data after the voice enhancement processing.
Through the above technical solution, the voice data to be processed can be acquired; the first voice feature corresponding to the voice data is extracted, the target environment in which the voice data is located is determined according to the first voice feature, and a target voice enhancement parameter corresponding to the target environment is selected from a pre-constructed voice enhancement parameter set, where the voice enhancement parameter set comprises voice enhancement parameters under different environments, and the voice enhancement parameters are used for enhancing voice recognition accuracy under the different environments; voice enhancement processing is then performed on the voice data according to the target voice enhancement parameter to obtain the voice data after the voice enhancement processing. Because the target environment in which the voice data to be processed is located is determined, the target voice enhancement parameter corresponding to that environment can be selected automatically from the voice enhancement parameter set, and the voice data can be enhanced using the target voice enhancement parameter, so the voice enhancement effect in the target environment is improved and, at the same time, the highest accuracy of voice recognition in the target environment can be ensured.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. They may also be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A method of speech enhancement, comprising:
acquiring voice data to be processed;
extracting a first voice feature corresponding to the voice data, determining a target environment where the voice data is located according to the first voice feature, and selecting a target voice enhancement parameter corresponding to the target environment from a pre-constructed voice enhancement parameter set, wherein the voice enhancement parameter set comprises voice enhancement parameters under different environments, and the voice enhancement parameters are used for enhancing the voice recognition accuracy under different environments;
and performing voice enhancement processing on the voice data according to the target voice enhancement parameter to obtain voice data after the voice enhancement processing.
2. The method of claim 1, wherein determining the target environment in which the voice data is located based on the first voice feature comprises:
acquiring sample voice data under different environments, and extracting second voice features corresponding to the sample voice data;
calculating a feature center corresponding to the sample voice data in different environments according to the second voice feature;
and determining the target environment in which the voice data is positioned according to the feature center and the first voice feature.
3. The method of claim 2, wherein determining the target environment in which the speech data is located based on the feature center and the first speech feature comprises:
calculating Euclidean distances between the first voice feature and different feature centers by using a preset Euclidean distance algorithm;
and screening out the minimum Euclidean distance from the calculated Euclidean distances, and determining the environment where the sample voice data corresponding to the minimum Euclidean distance is located as the target environment.
4. The method of claim 1, wherein prior to said obtaining voice data to be processed, the method comprises:
carrying out voice enhancement processing on the sample voice data under different environments by using the initial voice enhancement parameters to obtain sample voice data subjected to voice enhancement processing under different environments;
constructing voice recognition accuracy functions under different environments according to the sample voice data;
and optimizing and adjusting the initial voice enhancement parameters according to the accuracy function to obtain voice enhancement parameters under different environments, and constructing the voice enhancement parameter set based on the voice enhancement parameters under the different environments.
5. The method of claim 4, wherein constructing the speech recognition accuracy function for different environments from the sample speech data comprises:
carrying out voice recognition on the sample voice data subjected to the voice enhancement processing by utilizing a pre-constructed voice recognition model to obtain voice recognition results under different environments;
and constructing a voice recognition accuracy function under different environments according to the voice recognition results under different environments.
6. The method according to claim 1, wherein after performing the speech enhancement processing on the speech data according to the target speech enhancement parameter to obtain speech-enhanced speech data, the method further comprises:
performing feature extraction on the voice data after the voice enhancement processing to obtain a third voice feature corresponding to the voice data;
and determining a voice recognition result corresponding to the voice data according to the third voice characteristic.
7. The method according to any one of claims 1 to 6, wherein the target speech enhancement parameter is a target filtering noise reduction parameter, and performing speech enhancement processing on the speech data according to the target speech enhancement parameter to obtain speech data after speech enhancement processing comprises:
and according to the target filtering noise reduction parameter, carrying out filtering noise reduction processing on the voice data to obtain noise-reduced voice data.
8. A speech enhancement apparatus, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring voice data to be processed;
the selecting unit is used for extracting a first voice feature corresponding to the voice data, determining a target environment where the voice data is located according to the first voice feature, and selecting a target voice enhancement parameter corresponding to the target environment from a pre-constructed voice enhancement parameter set, wherein the voice enhancement parameter set comprises voice enhancement parameters under different environments, and the voice enhancement parameters are used for enhancing the voice recognition accuracy under different environments;
and the processing unit is used for carrying out voice enhancement processing on the voice data according to the target voice enhancement parameter to obtain the voice data after the voice enhancement processing.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
10. A computer arrangement comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 7 when executed by the processor.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011153521.2A CN112151052B (en) | 2020-10-26 | 2020-10-26 | Speech enhancement method, device, computer equipment and storage medium |
PCT/CN2020/136364 WO2021189979A1 (en) | 2020-10-26 | 2020-12-15 | Speech enhancement method and apparatus, computer device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011153521.2A CN112151052B (en) | 2020-10-26 | 2020-10-26 | Speech enhancement method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112151052A true CN112151052A (en) | 2020-12-29 |
CN112151052B CN112151052B (en) | 2024-06-25 |
Family
ID=73955013
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011153521.2A Active CN112151052B (en) | 2020-10-26 | 2020-10-26 | Speech enhancement method, device, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112151052B (en) |
WO (1) | WO2021189979A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113539262A (en) * | 2021-07-09 | 2021-10-22 | 广东金鸿星智能科技有限公司 | Sound enhancement and recording method and system for voice control of electrically operated gate |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114512136B (en) * | 2022-03-18 | 2023-09-26 | 北京百度网讯科技有限公司 | Model training method, audio processing method, device, equipment, storage medium and program |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013037177A (en) * | 2011-08-08 | 2013-02-21 | Nippon Telegr & Teleph Corp <Ntt> | Speech enhancement device, and method and program thereof |
CN103456305A (en) * | 2013-09-16 | 2013-12-18 | 东莞宇龙通信科技有限公司 | Terminal and speech processing method based on multiple sound collecting units |
CN104575509A (en) * | 2014-12-29 | 2015-04-29 | 乐视致新电子科技(天津)有限公司 | Voice enhancement processing method and device |
KR20190037867A (en) * | 2017-09-29 | 2019-04-08 | 주식회사 케이티 | Device, method and computer program for removing noise from noisy speech data |
CN110503974A (en) * | 2019-08-29 | 2019-11-26 | 泰康保险集团股份有限公司 | Fight audio recognition method, device, equipment and computer readable storage medium |
CN111698629A (en) * | 2019-03-15 | 2020-09-22 | 北京小鸟听听科技有限公司 | Calibration method and apparatus for audio playback device, and computer storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8082148B2 (en) * | 2008-04-24 | 2011-12-20 | Nuance Communications, Inc. | Testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise |
CN101593522B (en) * | 2009-07-08 | 2011-09-14 | 清华大学 | Method and equipment for full frequency domain digital hearing aid |
CN101710490B (en) * | 2009-11-20 | 2012-01-04 | 安徽科大讯飞信息科技股份有限公司 | Method and device for compensating noise for voice assessment |
CN110473568B (en) * | 2019-08-08 | 2022-01-07 | Oppo广东移动通信有限公司 | Scene recognition method and device, storage medium and electronic equipment |
CN110648680B (en) * | 2019-09-23 | 2024-05-14 | 腾讯科技(深圳)有限公司 | Voice data processing method and device, electronic equipment and readable storage medium |
2020
- 2020-10-26 CN CN202011153521.2A patent/CN112151052B/en active Active
- 2020-12-15 WO PCT/CN2020/136364 patent/WO2021189979A1/en active Application Filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113539262A (en) * | 2021-07-09 | 2021-10-22 | 广东金鸿星智能科技有限公司 | Sound enhancement and recording method and system for voice control of electrically operated gate |
CN113539262B (en) * | 2021-07-09 | 2023-08-22 | 广东金鸿星智能科技有限公司 | Sound enhancement and recording method and system for voice control of electric door |
Also Published As
Publication number | Publication date |
---|---|
CN112151052B (en) | 2024-06-25 |
WO2021189979A1 (en) | 2021-09-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110600017B (en) | Training method of voice processing model, voice recognition method, system and device | |
CN108281146B (en) | Short voice speaker identification method and device | |
CN109599109B (en) | Confrontation audio generation method and system for white-box scene | |
Zhou et al. | A compact representation of visual speech data using latent variables | |
CN108922544B (en) | Universal vector training method, voice clustering method, device, equipment and medium | |
WO2018005858A1 (en) | Speech recognition | |
CN110556103A (en) | Audio signal processing method, apparatus, system, device and storage medium | |
CN108597505B (en) | Voice recognition method and device and terminal equipment | |
CN109427328B (en) | Multichannel voice recognition method based on filter network acoustic model | |
CN111785288B (en) | Voice enhancement method, device, equipment and storage medium | |
CN111242005B (en) | Heart sound classification method based on improved wolf's swarm optimization support vector machine | |
CN112151052B (en) | Speech enhancement method, device, computer equipment and storage medium | |
CN110211599A (en) | Using awakening method, device, storage medium and electronic equipment | |
CN109147798B (en) | Speech recognition method, device, electronic equipment and readable storage medium | |
CN113205803A (en) | Voice recognition method and device with adaptive noise reduction capability | |
CN113628612A (en) | Voice recognition method and device, electronic equipment and computer readable storage medium | |
CN111489763A (en) | Adaptive method for speaker recognition in complex environment based on GMM model | |
CN113077779A (en) | Noise reduction method and device, electronic equipment and storage medium | |
CN117496998A (en) | Audio classification method, device and storage medium | |
CN112489678B (en) | Scene recognition method and device based on channel characteristics | |
CN114220430A (en) | Multi-sound-zone voice interaction method, device, equipment and storage medium | |
CN114023336A (en) | Model training method, device, equipment and storage medium | |
CN114495903A (en) | Language category identification method and device, electronic equipment and storage medium | |
CN112201270B (en) | Voice noise processing method and device, computer equipment and storage medium | |
Kumar et al. | Improving the performance of speech recognition feature selection using northern goshawk optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant |