WO2021189981A1 - Voice noise processing method and apparatus, and computer device and storage medium

Publication number
WO2021189981A1
WO2021189981A1 PCT/CN2020/136367 CN2020136367W WO2021189981A1 WO 2021189981 A1 WO2021189981 A1 WO 2021189981A1 CN 2020136367 W CN2020136367 W CN 2020136367W WO 2021189981 A1 WO2021189981 A1 WO 2021189981A1
Authority
WO
WIPO (PCT)
Prior art keywords
noise
voice
speech
classification model
sequences
Application number
PCT/CN2020/136367
Other languages
French (fr)
Chinese (zh)
Inventor
罗剑
王健宗
程宁
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2021189981A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G10L25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00 Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, computer equipment, and storage medium for processing speech noise.
  • the voice noise is usually recognized first, and after the voice noise is recognized, a unified noise reduction processing method is used to process the voice noise.
  • this method cannot identify the types of speech noise.
  • the types of speech noise in different scenarios are different. If the same noise reduction processing method is used to process the speech noise in different scenarios, the noise reduction effect that can be achieved is limited; that is, the optimal noise reduction effect cannot be achieved in different scenarios.
  • This application provides a method, device, computer equipment, and storage medium for processing speech noise, whose main advantage is that the types of speech noise in different scenarios can be identified and an appropriate noise reduction processing method can be applied according to the recognized noise type, so as to achieve the optimal noise reduction effect.
  • a method for processing speech noise including:
  • perform noise recognition on the voice sequence, and if the voice sequence contains voice noise, use a preset noise classification model to determine the noise category corresponding to the voice noise, wherein the noise classification model is obtained by joint training with multiple noise generation models, and the types of speech noise generated by different noise generation models are different;
  • based on the noise category, an optimal noise reduction processing strategy corresponding to the speech noise is determined, and the optimal noise reduction processing strategy is used to perform noise reduction processing on the speech noise.
  • a speech noise processing device including:
  • the acquiring unit is used to acquire the voice sequence to be recognized
  • the determining unit is configured to perform noise recognition on the voice sequence, and if the voice sequence contains voice noise, use a preset noise classification model to determine the noise category corresponding to the voice noise, wherein the noise classification model is obtained by joint training with multiple noise generation models, and the types of speech noise generated by different noise generation models are different;
  • the noise reduction unit is configured to determine an optimal noise reduction processing strategy corresponding to the speech noise based on the noise category, and use the optimal noise reduction processing strategy to perform noise reduction processing on the speech noise.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the steps of a method for processing speech noise are realized:
  • perform noise recognition on the voice sequence, and if the voice sequence contains voice noise, use a preset noise classification model to determine the noise category corresponding to the voice noise, wherein the noise classification model is obtained by joint training with multiple noise generation models, and the types of speech noise generated by different noise generation models are different;
  • based on the noise category, an optimal noise reduction processing strategy corresponding to the speech noise is determined, and the optimal noise reduction processing strategy is used to perform noise reduction processing on the speech noise.
  • a computer device including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the steps of a method for processing speech noise when the program is executed:
  • perform noise recognition on the voice sequence, and if the voice sequence contains voice noise, use a preset noise classification model to determine the noise category corresponding to the voice noise, wherein the noise classification model is obtained by joint training with multiple noise generation models, and the types of speech noise generated by different noise generation models are different;
  • based on the noise category, an optimal noise reduction processing strategy corresponding to the speech noise is determined, and the optimal noise reduction processing strategy is used to perform noise reduction processing on the speech noise.
  • Compared with the current approach of applying the same noise reduction strategy to different types of speech noise, the speech noise processing method, device, computer equipment, and storage medium provided in this application can obtain the voice sequence to be recognized and perform noise recognition on it.
  • If the voice sequence contains voice noise, a preset noise classification model is used to determine the noise category corresponding to the voice noise, wherein the noise classification model is obtained by joint training with multiple noise generation models, and the types of speech noise generated by different noise generation models are different.
  • Based on the noise category, the optimal noise reduction processing strategy corresponding to the speech noise is determined and used to perform noise reduction processing on the speech noise. Because the noise classification model is jointly trained with multiple noise generation models, it can identify the types of speech noise in different scenarios, so that the optimal noise reduction processing strategy can be selected according to the determined noise category and the optimal noise reduction effect can be achieved.
  • FIG. 1 shows a flowchart of a method for processing speech noise provided by an embodiment of the present application;
  • FIG. 2 shows a flowchart of another method for processing speech noise provided by an embodiment of the present application;
  • FIG. 3 shows a schematic structural diagram of an apparatus for processing speech noise provided by an embodiment of the present application;
  • FIG. 4 shows a schematic structural diagram of another apparatus for processing speech noise provided by an embodiment of the present application;
  • FIG. 5 shows a schematic diagram of the physical structure of a computer device provided by an embodiment of the present application.
  • the voice noise is usually recognized first, and after the voice noise is recognized, a unified noise reduction processing method is used to process the voice noise.
  • this method cannot identify the types of speech noise.
  • the types of speech noise in different scenarios are different. If the same noise reduction processing method is used to process the speech noise in different scenarios, the noise reduction effect that can be achieved is limited; that is, the optimal noise reduction effect cannot be achieved in different scenarios.
  • an embodiment of the present application provides a method for processing speech noise. As shown in FIG. 1, the method includes:
  • the voice sequence to be recognized is a user voice sequence obtained from a certain scene.
  • the voice sequence to be recognized is a user voice sequence collected on the side of a street, or a user voice sequence collected from a factory.
  • the voice sequence may or may not contain voice noise.
  • an appropriate noise reduction strategy can be selected according to the type of speech noise to process the speech noise, in order to achieve the optimal noise reduction effect.
  • the embodiments of the present application are mainly applicable to the processing of speech noise.
  • the execution subject of the embodiments of the present application is an apparatus or device capable of processing speech noise, which can be set on the client or server side.
  • the preprocessed speech sequence is obtained and used as the speech sequence to be recognized, so as to determine whether it contains speech noise. If the speech sequence to be recognized does not contain speech noise, it is recognized directly; if it contains speech noise, the type of the noise must be further determined, so that an appropriate noise reduction processing strategy can be selected according to the determined type and the best noise reduction effect can be achieved.
  • the noise classification model is obtained through joint training with multiple noise generation models.
  • the types of speech noise generated by different noise generation models are different.
  • the types of speech noise in different scenarios are different; for example, the type of speech noise collected on the side of the street is different from the type of speech noise collected in the factory.
  • the speech sequence to be recognized is input into a preset noise recognition model for noise recognition
  • the preset noise recognition model may specifically be a first preset neural network model.
  • the hidden layer in the first preset neural network model extracts the voice features corresponding to the voice sequence to be recognized, and whether the voice sequence to be recognized contains voice noise is determined according to the extracted voice features. If the voice sequence to be recognized does not contain voice noise, voice recognition is performed directly on the extracted voice features; if it contains voice noise, the extracted voice features are input into a preset noise classification model for noise classification.
  • the noise classification model may be a second preset neural network model.
  • the hidden layer in the second preset neural network model extracts the noise features corresponding to the voice noise, and then determines the noise type corresponding to the voice noise contained in the speech sequence to be recognized according to the extracted noise features, so that a suitable noise reduction processing strategy can be selected according to the determined noise type to perform noise reduction processing on the speech sequence to be recognized and achieve the optimal noise reduction effect in the scene.
  • based on the noise category, an optimal noise reduction processing strategy corresponding to the speech noise is determined, and the optimal noise reduction processing strategy is used to perform noise reduction processing on the speech noise.
  • different types of speech noise are suitable for different optimal noise reduction processing strategies.
  • for speech noise from the side of the street, because the noise is relatively random and has a wide spectrum range, an adaptive filter can be used for noise reduction; for speech noise from the factory, since most of it is machine processing noise in the workshop, the randomness of the noise is small and the noise spectrum range is narrow, so an adaptive notch filter can be used for noise reduction processing.
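The narrowband case above can be illustrated with a small sketch. The source does not specify the filter design, so everything here is an assumption: a two-weight LMS adaptive notch filter with sine and cosine reference signals, a known interference frequency, and the hypothetical names `lms_notch`, `freq_hz`, `sample_rate_hz`, and `step`.

```python
import math

def lms_notch(samples, freq_hz, sample_rate_hz, step=0.05):
    """Cancel one narrowband tone with a two-weight LMS adaptive notch filter.

    Assumption: the interference frequency is known in advance.
    """
    w_cos, w_sin = 0.0, 0.0  # adaptive weights for the two reference signals
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    cleaned = []
    for n, d in enumerate(samples):
        ref_cos = math.cos(omega * n)
        ref_sin = math.sin(omega * n)
        estimate = w_cos * ref_cos + w_sin * ref_sin  # current estimate of the tone
        error = d - estimate                          # output sample with the tone removed
        w_cos += 2.0 * step * error * ref_cos         # LMS weight update
        w_sin += 2.0 * step * error * ref_sin
        cleaned.append(error)
    return cleaned

# Hypothetical input: a 50 Hz machine hum sampled at 1 kHz, with no speech present.
hum = [math.sin(2.0 * math.pi * 50.0 * n / 1000.0 + 0.3) for n in range(2000)]
residual = lms_notch(hum, freq_hz=50.0, sample_rate_hz=1000.0)
tail_power = sum(e * e for e in residual[-100:]) / 100.0  # should be close to zero
```

Because the weights only track the reference frequency, components away from that frequency pass through largely unchanged, which is why a notch suits noise with a narrow spectrum.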
  • the noise reduction processing strategy corresponding to the noise category is retrieved from the preset noise reduction strategy library and determined as the optimal noise reduction processing strategy, which is then used to reduce the speech noise in the speech sequence to be recognized. In this way, the optimal noise reduction effect can be achieved for speech noise in different scenarios, avoiding a uniform noise reduction processing strategy that would impair the noise reduction effect.
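A minimal sketch of the strategy library lookup, assuming the library is a simple mapping from noise category to a callable. The category names, `STRATEGY_LIBRARY`, `denoise`, and the two placeholder strategies are hypothetical; the placeholders are deliberately trivial stand-ins and are not the filters the source has in mind.

```python
from typing import Callable, Dict, List

Signal = List[float]
Strategy = Callable[[Signal], Signal]

def smooth(signal: Signal) -> Signal:
    """Placeholder for a broadband strategy: 3-point moving average."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out

def remove_offset(signal: Signal) -> Signal:
    """Placeholder for a narrowband strategy: subtract the mean."""
    if not signal:
        return []
    mean = sum(signal) / len(signal)
    return [s - mean for s in signal]

# Hypothetical preset strategy library keyed by noise category.
STRATEGY_LIBRARY: Dict[str, Strategy] = {
    "street": smooth,
    "factory": remove_offset,
}

def denoise(signal: Signal, noise_category: str) -> Signal:
    """Look up the strategy registered for the category and apply it."""
    strategy = STRATEGY_LIBRARY.get(noise_category)
    if strategy is None:
        raise ValueError(f"no strategy registered for category {noise_category!r}")
    return strategy(signal)
```

Keeping the library as data rather than branching logic means a new scenario only needs a new entry, which matches the idea of selecting a strategy per recognized category.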
  • Compared with the current approach of applying the same noise reduction strategy to different types of speech noise, the method for processing speech noise provided by the embodiment of the present application can obtain the voice sequence to be recognized and perform noise recognition on it. If the voice sequence contains speech noise, a preset noise classification model is used to determine the noise category corresponding to the speech noise, wherein the noise classification model is obtained by joint training with multiple noise generation models, and the types of speech noise generated by different noise generation models are different.
  • Based on the noise category, the optimal noise reduction processing strategy corresponding to the speech noise is determined and used to perform noise reduction processing on the speech noise. Because the noise classification model is jointly trained with multiple noise generation models, it can recognize the types of speech noise in different scenarios, so that the optimal noise reduction processing strategy can be selected according to the determined noise category and the optimal noise reduction effect can be achieved.
  • an embodiment of the present application provides another method for processing speech noise. As shown in FIG. 2, the method includes:
  • the multiple random speech sequences may follow a Gaussian distribution.
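As a small illustration of the random input, the sketch below samples sequences from a standard Gaussian. The function name `sample_random_sequences`, the sequence length, and the use of zero mean and unit variance are assumptions; the source only states that the sequences may follow a Gaussian distribution.

```python
import random

def sample_random_sequences(count, length, seed=0):
    """Return `count` sequences of `length` samples drawn from N(0, 1)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(count)]

# Hypothetical usage: one random sequence per noise generation model.
random_sequences = sample_random_sequences(count=4, length=8, seed=1)
```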
  • the real speech sequence is the real speech sequence of the user collected in different scenes.
  • the real speech sequence has undergone noise reduction processing and contains no noise, so speech recognition can be performed on it directly.
  • a noise recognition model and a noise classification model are constructed respectively to achieve the purpose of recognizing and classifying speech noise.
  • the real voice sequence of the user in the preset sample library is obtained.
  • the real voice sequence comes from different scenarios.
  • step 201 specifically includes: calculating the Euclidean distance between different real speech sequences according to the preset Euclidean distance algorithm; and, based on the Euclidean distance, clustering the real speech sequences to obtain real speech sequences in different clustering categories.
  • the voice sequences in the preset sample library are clustered to obtain the real voice sequences under different clustering categories, and the scenes corresponding to the real voice sequences under different clustering categories are determined, so that the real voice sequences in different scenarios can be determined.
  • the Euclidean distance between different real speech sequences is calculated according to the preset Euclidean distance algorithm, and the real speech sequences are clustered according to the calculated Euclidean distance to obtain real speech sequences in different clustering categories. The voice features corresponding to the real voice sequences under each clustering category are then extracted, and the scene corresponding to each category is determined. For example, real voice sequences 1-10 are determined to be voice sequences collected on the street, and real voice sequences 11-20 to be voice sequences collected in the factory, so that the real voice sequences in different scenarios can be determined.
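The clustering step can be sketched as follows. The source names only the Euclidean distance and not a specific clustering algorithm, so this sketch assumes k-means with farthest-point initialization, and assumes each real speech sequence has already been reduced to a fixed-length feature vector. The names `cluster` and `farthest_point_init` and the sample vectors are hypothetical.

```python
import math

def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def cluster(points, k, iterations=20):
    """Assign each point to the nearest centroid by Euclidean distance (k-means)."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical two-dimensional features for sequences from two scenes.
street = [[0.9, 0.1], [1.0, 0.2], [0.8, 0.0]]
factory = [[5.0, 4.9], [5.2, 5.1], [4.9, 5.0]]
labels = cluster(street + factory, k=2)
```

The cluster labels carry no scene names by themselves; as the text describes, the scene for each cluster is determined afterwards from the features of its members.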
  • step 202 specifically includes: constructing an initial noise classification model and multiple initial noise generation models respectively; and, according to the multiple random speech sequences and the real speech sequences in the different clustering categories, performing joint iterative training on the initial noise classification model and the multiple initial noise generation models to construct the noise classification model and the multiple noise generation models. Further, in order to be able to recognize speech noise, a noise recognition model also needs to be constructed; in that case, separately constructing an initial noise classification model and multiple initial noise generation models includes: separately constructing an initial noise recognition model, an initial noise classification model, and multiple initial noise generation models.
  • performing joint iterative training on the initial noise classification model and the multiple initial noise generation models according to the multiple random voice sequences and the real voice sequences in the different clustering categories, to construct the noise classification model and the multiple noise generation models, includes: respectively inputting the multiple random speech sequences into the multiple initial noise generation models to generate different types of speech noise; respectively inputting the generated speech noise and the real speech sequences into the initial noise recognition model for noise recognition to obtain the initial noise recognition result; extracting the speech features corresponding to the speech noise in the initial noise recognition result and inputting them into the initial noise classification model for noise classification to obtain the initial noise classification result; constructing a noise recognition accuracy loss function and a noise classification accuracy loss function based on the initial noise recognition result and the initial noise classification result, respectively; and performing joint iterative training on the initial noise recognition model, the initial noise classification model, and the multiple initial noise generation models according to the noise recognition accuracy loss function and the noise classification accuracy loss function, to construct the noise recognition model, the noise classification model, and the multiple noise generation models respectively.
  • the initial noise recognition results are obtained, and then the speech features corresponding to the speech noise in the initial recognition results are extracted.
  • the noise recognition accuracy loss function and the noise classification accuracy loss function are constructed respectively.
  • Ls is the noise recognition accuracy loss function
  • Lc is the noise classification accuracy loss function
  • zi is speech noise
  • xi is the real speech sequence
  • D stands for the preset noise recognition model
  • G stands for the preset noise generation model
  • c stands for the noise classification model.
  • in order to ensure that the speech noise generated by the noise generation model is closer to the real speech sequence and to increase the recognition difficulty for the noise recognition model, the optimization directions of the noise generation model and the noise recognition model are opposite: the noise generation model needs to minimize the accuracy of the preset noise recognition model, so its optimization direction is to minimize Lc-Ls, while the training purpose of the noise classification model is to maximize the accuracy of noise classification, so its optimization direction is to maximize Lc+Ls.
  • with the above two optimization objectives, the initial noise generation model, the initial noise recognition model, and the initial noise classification model can be trained iteratively to construct the noise generation model, the noise recognition model, and the noise classification model.
  • the voice sequence to be recognized is a user voice sequence obtained from a certain scene.
  • the voice sequence may or may not contain voice noise.
  • if the voice sequence to be recognized contains speech noise, noise reduction needs to be performed on the speech noise.
  • the type of speech noise can be further identified, so that an appropriate noise reduction processing strategy can be selected based on that type.
  • a preset noise classification model is used to determine the noise category corresponding to the voice noise.
  • step 204 specifically includes: performing speech feature extraction on the speech sequence to obtain the speech features corresponding to the speech sequence; judging, based on the speech features, whether the speech sequence contains speech noise; and, if it contains speech noise, using the noise classification model to determine the noise category corresponding to the speech noise based on the extracted speech features.
  • the voice sequence to be recognized is input to the noise recognition model for noise recognition.
  • the hidden layer in the preset noise recognition model extracts the voice features corresponding to the voice sequence to be recognized, and whether the speech sequence to be recognized contains speech noise is determined based on the extracted voice features. If it contains speech noise, the extracted voice features are input into the noise classification model for noise classification to determine the noise category corresponding to the speech noise.
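The two-stage flow described above (extract features once, gate on noise recognition, then classify) can be sketched with plain callables. In the source both stages are neural network models; here they are replaced by toy stand-ins, and the names `classify_noise`, `toy_features`, `toy_has_noise`, and `toy_classifier`, along with their thresholds, are hypothetical.

```python
from typing import Callable, List, Optional

Features = List[float]

def classify_noise(
    sequence: List[float],
    extract_features: Callable[[List[float]], Features],
    contains_noise: Callable[[Features], bool],
    noise_classifier: Callable[[Features], str],
) -> Optional[str]:
    """Return the noise category, or None when no noise is recognized."""
    features = extract_features(sequence)   # features are extracted once and reused
    if not contains_noise(features):
        return None                         # clean speech goes straight to recognition
    return noise_classifier(features)

def toy_features(seq: List[float]) -> Features:
    """Toy stand-in: mean energy and zero-crossing rate."""
    energy = sum(s * s for s in seq) / len(seq)
    crossings = sum(1 for a, b in zip(seq, seq[1:]) if a * b < 0)
    return [energy, crossings / (len(seq) - 1)]

def toy_has_noise(features: Features) -> bool:
    return features[0] > 0.01               # arbitrary energy threshold

def toy_classifier(features: Features) -> str:
    return "street" if features[1] > 0.5 else "factory"  # arbitrary rule

category = classify_noise([1.0, -1.0] * 4, toy_features, toy_has_noise, toy_classifier)
```

Passing the models in as arguments keeps the control flow separate from the models themselves, so trained networks could be substituted for the toy functions without changing `classify_noise`.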
  • the noise reduction processing strategy corresponding to the noise category is retrieved from the preset noise reduction strategy library and determined as the optimal noise reduction processing strategy, which is then used to perform noise reduction processing on the speech noise in the speech sequence to be recognized. In this way, the optimal noise reduction effect can be achieved for speech noise in different scenarios, avoiding a unified noise reduction processing strategy that would impair the noise reduction effect.
  • Compared with the current approach of applying the same noise reduction strategy to different types of speech noise, the other method for processing speech noise provided by the embodiment of the present application can obtain the voice sequence to be recognized and perform noise recognition on it. If the voice sequence contains voice noise, a preset noise classification model is used to determine the noise category corresponding to the voice noise, wherein the noise classification model is obtained by joint training with multiple noise generation models, and the types of speech noise generated by different noise generation models are different.
  • Based on the noise category, the optimal noise reduction processing strategy corresponding to the speech noise is determined and used to perform noise reduction processing on the speech noise. Because the noise classification model is jointly trained with multiple noise generation models, it can identify the types of speech noise in different scenarios, so that the optimal noise reduction processing strategy can be selected according to the determined noise category and the optimal noise reduction effect can be achieved.
  • an embodiment of the present application provides an apparatus for processing speech noise.
  • the apparatus includes: an acquisition unit 31, a determination unit 32, and a noise reduction unit 33.
  • the acquiring unit 31 may be used to acquire a voice sequence to be recognized.
  • the acquiring unit 31 is the main functional module of the device for acquiring the voice sequence to be recognized.
  • the determining unit 32 may be configured to perform noise recognition on the voice sequence, and if the voice sequence contains voice noise, use a preset noise classification model to determine the noise category corresponding to the voice noise, wherein the noise classification model is jointly trained with multiple noise generation models, and the types of speech noise generated by different noise generation models are different.
  • the determining unit 32 is the main functional module in the device that performs noise recognition on the voice sequence and, if the voice sequence contains voice noise, uses a preset noise classification model to determine the noise category corresponding to the voice noise; it is also the core module.
  • the noise reduction unit 33 may be configured to determine an optimal noise reduction processing strategy corresponding to the speech noise based on the noise category, and use the optimal noise reduction processing strategy to perform noise reduction processing on the speech noise.
  • the noise reduction unit 33 is the main functional module in the device that determines the optimal noise reduction processing strategy corresponding to the speech noise based on the noise category and uses the optimal noise reduction processing strategy to perform noise reduction processing on the speech noise.
  • the determination unit 32 includes an extraction module 321, a judgment module 322 and a determination module 323.
  • the extraction module 321 may be used to extract voice features of the voice sequence to obtain voice features corresponding to the voice sequence to be recognized.
  • the judgment module 322 may be used to judge whether the speech sequence contains speech noise based on the speech feature.
  • the determining module 323 may be configured to determine the noise category corresponding to the voice noise by using the noise classification model based on the extracted voice feature if the voice noise is included.
  • the device further includes a clustering unit 34 and a construction unit 35.
  • the acquiring unit 31 may also be used to acquire a real voice sequence and multiple random voice sequences in a preset voice sample library.
  • the clustering unit 34 may be used to perform clustering processing on the real voice sequence to obtain real voice sequences in different clustering categories.
  • the construction unit 35 may be configured to construct the noise classification model and the multiple noise generation models according to the multiple random voice sequences and the real voice sequences in the different clustering categories.
  • the clustering unit 34 includes: a calculation module 341 and a clustering module 342.
  • the calculation module 341 may be used to calculate the Euclidean distance between different real speech sequences according to a preset Euclidean distance algorithm.
  • the clustering module 342 may be used to perform clustering processing on the real speech sequence based on the Euclidean distance to obtain real speech sequences in different clustering categories.
  • the construction unit 35 includes: a first construction module 351 and a second construction module 352.
  • the first construction module 351 may be used to separately construct an initial noise classification model and multiple initial noise generation models.
  • the second construction module 352 may be used to perform joint iterative training on the initial noise classification model and the multiple initial noise generation models according to the multiple random voice sequences and the real voice sequences in the different clustering categories, to construct the noise classification model and the multiple noise generation models.
  • the second construction module 352 includes: a generation sub-module, an identification sub-module, a classification sub-module, and a construction sub-module.
  • the generating sub-module may be used to input the multiple random speech sequences into the multiple initial noise generation models to generate different types of speech noise.
  • the recognition sub-module may be used to input the generated speech noise and the real speech sequence into the initial noise recognition model to perform noise recognition, and obtain the initial noise recognition result.
  • the classification sub-module can be used to extract the speech features corresponding to the speech noise in the initial noise recognition result, and input it into the initial noise classification model for noise classification, to obtain the initial noise classification result.
  • the construction sub-module may be used to construct a noise recognition accuracy loss function and a noise classification accuracy loss function based on the initial noise recognition result and the initial noise classification result.
  • the construction sub-module may also be used to perform joint iterative training on the initial noise recognition model, the initial noise classification model, and the multiple initial noise generation models according to the noise recognition accuracy loss function and the noise classification accuracy loss function, to separately construct the noise recognition model, the noise classification model, and the multiple noise generation models.
  • an embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile, and a computer program is stored thereon.
  • when the program is executed by a processor, the following steps are realized: obtain the speech sequence to be recognized; perform noise recognition on the speech sequence, and if the speech sequence contains speech noise, use a preset noise classification model to determine the noise category corresponding to the speech noise, wherein the noise classification model is jointly trained with multiple noise generation models, and the types of speech noise generated by different noise generation models are different; based on the noise category, determine the optimal noise reduction processing strategy corresponding to the speech noise, and use the optimal noise reduction processing strategy to perform noise reduction processing on the speech noise.
  • the computer device includes: a processor 41, a memory 42, and a computer program that is stored on the memory 42 and can run on the processor 41, wherein the memory 42 and the processor 41 are both arranged on the bus 43, and the processor 41 implements the following steps when executing the program: obtain the voice sequence to be recognized; perform noise recognition on the voice sequence, and if the voice sequence contains voice noise, use a preset noise classification model to determine the noise category corresponding to the voice noise, wherein the noise classification model is jointly trained with multiple noise generation models, and the types of speech noise generated by different noise generation models are different; based on the noise category, determine the optimal noise reduction processing strategy corresponding to the speech noise, and use the optimal noise reduction processing strategy to perform noise reduction processing on the speech noise.
  • the present application can obtain the voice sequence to be recognized; perform noise recognition on the voice sequence, and if the voice sequence contains voice noise, use a preset noise classification model to determine the noise category corresponding to the voice noise, wherein the noise classification model is jointly trained with multiple noise generation models, and the types of speech noise generated by different noise generation models are different; and, based on the noise category, determine the optimal noise reduction processing strategy corresponding to the speech noise and use it to perform noise reduction processing on the speech noise. Because the noise classification model is jointly trained with multiple noise generation models, it can identify the types of speech noise in different scenarios, so that the optimal noise reduction processing strategy can be selected according to the determined noise category and the optimal noise reduction processing effect can be achieved.

Abstract

A voice noise processing method and apparatus, and a computer device and a storage medium, which relate to the field of artificial intelligence. The method comprises: acquiring a voice sequence to be subjected to recognition (101); performing noise recognition on the voice sequence, and if the voice sequence includes a voice noise, determining, by using a preset noise classification model, a noise category corresponding to the voice noise, wherein the noise classification model is obtained by jointly training same and a plurality of noise generation models, and the categories of voice noises generated by different noise generation models are different (102); and on the basis of the noise category, determining an optimal noise reduction processing strategy corresponding to the voice noise, and performing noise reduction processing on the voice noise by using the optimal noise reduction processing strategy (103). The categories of voice noises in different scenarios can be recognized, and the voice noises are processed by means of an appropriate noise reduction processing manner and according to the recognized noise categories, so as to achieve an optimal noise reduction processing effect.

Description

Speech noise processing method, apparatus, computer device and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 26, 2020, with application number 202011153509.1 and the invention title "Speech noise processing method, apparatus, computer device and storage medium", the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of artificial intelligence technology, and in particular to a speech noise processing method, apparatus, computer device and storage medium.
Background
In speech recognition technology, it is usually necessary to recognize the noise in a speech sequence and perform noise reduction on the recognized noise in order to improve the accuracy of subsequent speech recognition. Processing speech noise effectively is therefore very important.
At present, speech noise is usually recognized first and then processed with a single, uniform noise reduction method. However, the inventor realized that this approach cannot identify the type of speech noise. The types of speech noise differ between scenarios, and if the same noise reduction method is applied to speech noise in all scenarios, the achievable noise reduction effect is limited; that is, the optimal noise reduction effect cannot be achieved in different scenarios.
Technical problem
This application provides a speech noise processing method, apparatus, computer device and storage medium, whose main purpose is to identify the types of speech noise in different scenarios and to process the speech noise with a noise reduction method appropriate to the identified noise type, so as to achieve the optimal noise reduction effect.
Technical solutions
According to a first aspect of the present application, a speech noise processing method is provided, including:
obtaining a speech sequence to be recognized;
performing noise recognition on the speech sequence, and if the speech sequence contains speech noise, using a preset noise classification model to determine the noise category corresponding to the speech noise, wherein the noise classification model is obtained by joint training with multiple noise generation models, and different noise generation models generate different types of speech noise;
based on the noise category, determining the optimal noise reduction processing strategy corresponding to the speech noise, and using the optimal noise reduction processing strategy to perform noise reduction processing on the speech noise.
According to a second aspect of the present application, a speech noise processing apparatus is provided, including:
an acquiring unit, configured to obtain a speech sequence to be recognized;
a determining unit, configured to perform noise recognition on the speech sequence and, if the speech sequence contains speech noise, use a preset noise classification model to determine the noise category corresponding to the speech noise, wherein the noise classification model is obtained by joint training with multiple noise generation models, and different noise generation models generate different types of speech noise;
a noise reduction unit, configured to determine, based on the noise category, the optimal noise reduction processing strategy corresponding to the speech noise, and to use the optimal noise reduction processing strategy to perform noise reduction processing on the speech noise.
According to a third aspect of the present application, a computer-readable storage medium is provided, on which a computer program is stored, and when the program is executed by a processor, the following steps of a speech noise processing method are implemented:
obtaining a speech sequence to be recognized;
performing noise recognition on the speech sequence, and if the speech sequence contains speech noise, using a preset noise classification model to determine the noise category corresponding to the speech noise, wherein the noise classification model is obtained by joint training with multiple noise generation models, and different noise generation models generate different types of speech noise;
based on the noise category, determining the optimal noise reduction processing strategy corresponding to the speech noise, and using the optimal noise reduction processing strategy to perform noise reduction processing on the speech noise.
According to a fourth aspect of the present application, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the following steps of a speech noise processing method when executing the program:
obtaining a speech sequence to be recognized;
performing noise recognition on the speech sequence, and if the speech sequence contains speech noise, using a preset noise classification model to determine the noise category corresponding to the speech noise, wherein the noise classification model is obtained by joint training with multiple noise generation models, and different noise generation models generate different types of speech noise;
based on the noise category, determining the optimal noise reduction processing strategy corresponding to the speech noise, and using the optimal noise reduction processing strategy to perform noise reduction processing on the speech noise.
Beneficial effects
Compared with the current approach of applying the same noise reduction strategy to different types of speech noise, the speech noise processing method, apparatus, computer device and storage medium provided by this application can obtain a speech sequence to be recognized and perform noise recognition on it; if the speech sequence contains speech noise, a preset noise classification model is used to determine the noise category corresponding to the speech noise, wherein the noise classification model is obtained by joint training with multiple noise generation models, and different noise generation models generate different types of speech noise. Based on the noise category, the optimal noise reduction processing strategy corresponding to the speech noise is then determined and used to perform noise reduction processing on the speech noise. Because the noise classification model is trained jointly with multiple noise generation models, it can identify the types of speech noise in different scenarios, so that the optimal noise reduction processing strategy can be selected according to the determined noise category and the optimal noise reduction processing effect can be achieved.
Description of the drawings
The drawings described here are used to provide a further understanding of the application and constitute a part of the application. The exemplary embodiments of the application and their descriptions are used to explain the application and do not constitute an improper limitation of the application. In the drawings:
Fig. 1 shows a flowchart of a speech noise processing method provided by an embodiment of the present application;
Fig. 2 shows a flowchart of another speech noise processing method provided by an embodiment of the present application;
Fig. 3 shows a schematic structural diagram of a speech noise processing apparatus provided by an embodiment of the present application;
Fig. 4 shows a schematic structural diagram of another speech noise processing apparatus provided by an embodiment of the present application;
Fig. 5 shows a schematic diagram of the physical structure of a computer device provided by an embodiment of the present application.
Best mode for carrying out the invention
Hereinafter, the present application is described in detail with reference to the drawings and in conjunction with the embodiments. It should be noted that the embodiments in the application and the features in the embodiments can be combined with each other provided there is no conflict.
At present, speech noise is usually recognized first and then processed with a single, uniform noise reduction method. However, this approach cannot identify the type of speech noise. The types of speech noise differ between scenarios, and if the same noise reduction method is applied to speech noise in all scenarios, the achievable noise reduction effect is limited; that is, the optimal noise reduction effect cannot be achieved in different scenarios.
To solve the above problem, an embodiment of the present application provides a speech noise processing method. As shown in Fig. 1, the method includes:
101. Obtain a speech sequence to be recognized.
The speech sequence to be recognized is a user speech sequence obtained in a certain scenario, for example a segment of user speech collected beside a street or in a factory. The speech sequence to be recognized may or may not contain speech noise. In the embodiments of this application, to improve the accuracy of speech recognition, it is necessary to determine whether the collected user speech sequence contains speech noise; if it does, noise reduction must be performed on the sequence. When performing noise reduction, an appropriate noise reduction strategy can be selected according to the type of speech noise, so as to achieve the optimal noise reduction effect. The embodiments of this application mainly apply to the processing of speech noise. The execution subject is an apparatus or device capable of processing speech noise, which may be deployed on the client side or the server side.
Specifically, a segment of the user's speech in a certain scenario is obtained. Before determining whether it contains speech noise, the obtained speech sequence must be preprocessed, specifically by pre-emphasis, framing and windowing. The preprocessed speech sequence is taken as the speech sequence to be recognized, and it is then determined whether this sequence contains speech noise. If it does not, speech recognition is performed on it directly; if it does, the type of noise it contains must be further determined, so that an appropriate noise reduction strategy can be selected according to that type and the optimal noise reduction effect achieved.
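The three preprocessing operations named above can be sketched as follows. This is a minimal illustration only, not the application's implementation: the function names, the pre-emphasis coefficient 0.97, the frame length and hop of 400 and 160 samples, and the choice of a Hamming window are all assumptions, since the application does not specify them.

```python
import math


def pre_emphasize(signal, alpha=0.97):
    """Pre-emphasis: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def split_into_frames(signal, frame_len, hop_len):
    """Framing: cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop_len)]


def hamming_window(frame_len):
    """Window coefficients (a Hamming window is assumed here)."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]


def preprocess(signal, frame_len=400, hop_len=160, alpha=0.97):
    """Pre-emphasis, then framing, then windowing of every frame."""
    window = hamming_window(frame_len)
    frames = split_into_frames(pre_emphasize(signal, alpha), frame_len, hop_len)
    return [[s * w for s, w in zip(frame, window)] for frame in frames]
```

For a 16 kHz recording, the assumed defaults correspond to 25 ms frames with a 10 ms hop; other sampling rates would need different values.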
102. Perform noise recognition on the speech sequence, and if the speech sequence contains speech noise, use a preset noise classification model to determine the noise category corresponding to the speech noise.
The noise classification model is obtained through joint training with multiple noise generation models, and different noise generation models generate different types of speech noise. The types of speech noise also differ between scenarios; for example, speech noise collected beside a street differs in type from speech noise collected in a factory. In the embodiments of this application, to determine whether the speech sequence to be recognized contains speech noise, the sequence is input into a preset noise recognition model for noise recognition. The preset noise recognition model may specifically be a first preset neural network model. During noise recognition, the hidden layer of the first preset neural network model extracts the speech features corresponding to the speech sequence to be recognized, and whether the sequence contains speech noise is determined from the extracted features. If the sequence does not contain speech noise, speech recognition is performed directly on the extracted speech features. If it does, the extracted speech features are input into the preset noise classification model for noise classification. The noise classification model may specifically be a second preset neural network model. During noise classification, the hidden layer of the second preset neural network model extracts the noise features corresponding to the speech noise, and the noise type of the speech noise contained in the sequence is determined from these noise features, so that an appropriate noise reduction strategy can be selected for the sequence according to the determined noise type, achieving the optimal noise reduction effect in that scenario.
103. Based on the noise category, determine the optimal noise reduction processing strategy corresponding to the speech noise, and use the optimal noise reduction processing strategy to perform noise reduction processing on the speech noise.
Different types of speech noise call for different optimal noise reduction strategies. For example, noise beside a street is highly random and has a wide spectrum, so an adaptive filter can be used for noise reduction; speech noise in a factory is mostly machining noise from the workshop, which is less random and has a narrower spectrum, so an adaptive notch filter can be used. In this embodiment, according to the noise category determined for the speech noise, the noise reduction strategy corresponding to that category is selected from a preset noise reduction strategy library and determined as the optimal noise reduction strategy. This strategy is then used to perform noise reduction on the speech noise in the speech sequence to be recognized. In this way the optimal noise reduction effect can be achieved for speech noise in different scenarios, avoiding a uniform noise reduction strategy that would impair the noise reduction effect.
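The control flow of steps 101 to 103 can be sketched as a lookup in a strategy library keyed by noise category. Everything named here is hypothetical: `process_speech`, `STRATEGY_LIBRARY`, and the category labels "street" and "factory" are illustrative, the recognition and classification models are passed in as plain callables, and `smooth` and `remove_mean` are trivial placeholders standing in for the adaptive filter and the adaptive notch filter, not implementations of them.

```python
from typing import Callable, Dict, List, Optional, Tuple

Samples = List[float]
Denoiser = Callable[[Samples], Samples]


def smooth(sequence: Samples) -> Samples:
    """Placeholder for the adaptive filter (street noise); it only averages neighbours."""
    return [(sequence[max(i - 1, 0)] + sequence[i]) / 2 for i in range(len(sequence))]


def remove_mean(sequence: Samples) -> Samples:
    """Placeholder for the adaptive notch filter (factory noise); it only removes the mean."""
    if not sequence:
        return []
    mean = sum(sequence) / len(sequence)
    return [s - mean for s in sequence]


# Hypothetical preset noise reduction strategy library, keyed by noise category.
STRATEGY_LIBRARY: Dict[str, Denoiser] = {"street": smooth, "factory": remove_mean}


def process_speech(
    sequence: Samples,
    contains_noise: Callable[[Samples], bool],   # stands in for the noise recognition model
    classify_noise: Callable[[Samples], str],    # stands in for the noise classification model
    library: Dict[str, Denoiser] = STRATEGY_LIBRARY,
) -> Tuple[Samples, Optional[str]]:
    """Return the (possibly denoised) sequence and the noise category, or None if clean."""
    if not contains_noise(sequence):
        return sequence, None          # no noise: pass straight on to speech recognition
    category = classify_noise(sequence)
    if category not in library:
        raise ValueError(f"no noise reduction strategy for category {category!r}")
    return library[category](sequence), category
```

Keeping the strategies in a dictionary means that supporting a new scenario only requires adding a category and its strategy to the library, without changing the dispatch logic.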
Compared with the current approach of applying the same noise reduction strategy to different types of speech noise, the speech noise processing method provided by the embodiment of the present application can obtain a speech sequence to be recognized and perform noise recognition on it; if the speech sequence contains speech noise, a preset noise classification model is used to determine the noise category corresponding to the speech noise, wherein the noise classification model is obtained by joint training with multiple noise generation models, and different noise generation models generate different types of speech noise. Based on the noise category, the optimal noise reduction processing strategy corresponding to the speech noise is then determined and used to perform noise reduction processing on the speech noise. Because the noise classification model is trained jointly with multiple noise generation models, it can identify the types of speech noise in different scenarios, so that the optimal noise reduction processing strategy can be selected according to the determined noise category and the optimal noise reduction processing effect can be achieved.
Further, to better explain the above speech noise processing procedure, and as a refinement and extension of the above embodiment, an embodiment of the present application provides another speech noise processing method. As shown in Fig. 2, the method includes:
201. Obtain the real speech sequences and multiple random speech sequences in a preset speech sample library, and perform clustering on the real speech sequences to obtain real speech sequences under different cluster categories.
The multiple random speech sequences may follow a Gaussian distribution. The real speech sequences are users' real speech sequences collected in different scenarios; they have undergone noise reduction, contain no noise, and can be used for speech recognition directly. In the embodiments of this application, the multiple random speech sequences and the multiple noise generation models are used to simulate users' real speech sequences in different scenarios and thereby generate speech noise for different scenarios. A noise recognition model and a noise classification model are then constructed from the generated speech noise and the real speech sequences of the different scenarios, so that speech noise can be recognized and classified.
In the embodiments of this application, the users' real speech sequences in the preset sample library are obtained, and these sequences come from different scenarios. To construct the noise recognition model and the noise classification model from the real speech sequences and random speech sequences of different scenarios, the real speech sequences in the preset sample library must first be clustered. Accordingly, step 201 specifically includes: calculating the Euclidean distance between different real speech sequences according to a preset Euclidean distance algorithm; and clustering the real speech sequences based on the Euclidean distance to obtain real speech sequences under different cluster categories. Because the speech sequences in different scenarios are relatively similar, the speech sequences in the preset sample library are clustered to obtain the real speech sequences under different cluster categories, and the scenario corresponding to the real speech sequences of each cluster category is determined, so that the real speech sequences of the different scenarios can be determined.
Specifically, the Euclidean distances between different real speech sequences are calculated according to the preset Euclidean distance algorithm, and the real speech sequences are clustered according to the calculated distances to obtain real speech sequences under different cluster categories. The speech features corresponding to the real speech sequences of each cluster category are then extracted to determine the scenario to which each category corresponds. For example, real speech sequences 1-10 are determined to be speech sequences collected beside a street, and speech sequences 11-20 to be speech sequences collected in a factory, so that the real speech sequences of different scenarios can be determined.
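A minimal sketch of distance-based clustering follows. The application only states that clustering is based on Euclidean distance and does not name a clustering algorithm, so the simple "leader" scheme used here, the `threshold` parameter, and the function names are assumptions made for illustration. Each sequence is assumed to be already represented as a fixed-length feature vector.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_by_distance(sequences, threshold):
    """Group sequence indices; a sequence joins the first cluster whose
    first member (the leader) lies within `threshold`, else it starts a new cluster."""
    clusters = []
    for idx, seq in enumerate(sequences):
        for cluster in clusters:
            leader = sequences[cluster[0]]
            if euclidean_distance(seq, leader) <= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])  # no existing cluster is close enough
    return clusters
```

With this scheme the number of clusters is not fixed in advance; an algorithm such as k-means could be substituted if the number of scenarios is known.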
202、根据所述多个随机语音序列和所述不同聚类类别下的真实语音序列,构建所述噪声分类模型和所述多个噪声生成模型。202. Construct the noise classification model and the multiple noise generation models according to the multiple random voice sequences and the real voice sequences in the different clustering categories.
对于本申请实施例,为了构建噪声分类模型和多个噪声生成模型,步骤202具体包括:分别构建初始噪声分类模型和多个初始噪声生成模型;根据所述多个随机语音序列和所述不同聚类类别下的真实语音序列,对所述初始噪声分类模型和所述多个初始噪声生成模型进行联合迭代训练,构建所述噪声分类模型和所述多个噪声生成模型。进一步地,为了能够对语音噪声进行识别,还需要构建噪声识别模型,所述分别构建初始噪声分类模型和多个初始噪声生成模型,包括:分别构建初始噪声识别模型,初始噪声分类模型和多个初始噪声生成模型。For the embodiment of the present application, in order to construct a noise classification model and multiple noise generation models, step 202 specifically includes: constructing an initial noise classification model and multiple initial noise generation models respectively; For real speech sequences in a class category, joint iterative training is performed on the initial noise classification model and the multiple initial noise generation models to construct the noise classification model and the multiple noise generation models. Further, in order to be able to recognize speech noise, it is also necessary to construct a noise recognition model, which separately constructs an initial noise classification model and multiple initial noise generation models, including: separately constructing an initial noise recognition model, an initial noise classification model, and multiple initial noise generation models. The initial noise generation model.
On this basis, performing joint iterative training on the initial noise classification model and the multiple initial noise generation models according to the multiple random speech sequences and the real speech sequences under the different cluster categories, so as to construct the noise classification model and the multiple noise generation models, includes: inputting the multiple random speech sequences into the multiple initial noise generation models respectively to generate different types of speech noise; inputting the generated speech noise and the real speech sequences into the initial noise recognition model for noise recognition to obtain an initial noise recognition result; extracting the speech features corresponding to the speech noise in the initial noise recognition result and inputting them into the initial noise classification model for noise classification to obtain an initial noise classification result; constructing a noise recognition accuracy loss function and a noise classification accuracy loss function based on the initial noise recognition result and the initial noise classification result respectively; and performing joint iterative training on the initial noise recognition model, the initial noise classification model, and the multiple initial noise generation models according to the noise recognition accuracy loss function and the noise classification accuracy loss function, so as to construct the noise recognition model, the noise classification model, and the multiple noise generation models respectively. The preset noise generation model uses a convolutional neural network.
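As a rough illustration of the generation step, the sketch below draws Gaussian random sequences and passes each one through its own generator. Each generator here is a single 1-D convolution with a fixed kernel, standing in for the convolutional networks mentioned above; the names `conv1d`, `KERNELS`, and `generate_noises`, and the kernel values, are assumptions made for the example and are not taken from the application.

```python
import random

def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (correlation form, as in most CNN layers)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# Hypothetical stand-ins for trained generators: one fixed kernel per noise type.
KERNELS = {
    "low_pass_like": [0.25, 0.5, 0.25],
    "high_pass_like": [-0.5, 1.0, -0.5],
}

def generate_noises(length, seed=0):
    """Feed one Gaussian random sequence to each generator."""
    rng = random.Random(seed)
    out = {}
    for name, kernel in KERNELS.items():
        z = [rng.gauss(0.0, 1.0) for _ in range(length)]  # random speech sequence
        out[name] = conv1d(z, kernel)                      # generated speech noise
    return out

noises = generate_noises(16)
```

In a real system each kernel would be replaced by a trained network, but the shape of the step is the same: one random input per generator, one noise type per generator.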
Specifically, the different types of speech noise and the real speech sequences under the different cluster categories are each input into the initial noise recognition model for noise recognition, yielding an initial noise recognition result. The speech features corresponding to the speech noise in that result are then extracted and input into the preset initial noise classification model for noise classification, yielding a noise classification result. From the noise recognition result and the noise classification result, the noise recognition accuracy loss function and the noise classification accuracy loss function are constructed respectively, with the following formulas:
Figure PCTCN2020136367-appb-000001
Figure PCTCN2020136367-appb-000002
Here, Ls is the noise recognition accuracy loss function, Lc is the noise classification accuracy loss function, zi is the speech noise, xi is a real speech sequence, D denotes the preset noise recognition model, G denotes the preset noise generation model, and c denotes the noise classification model. To bring the speech noise produced by the noise generation model closer to real speech sequences, and thereby make recognition harder for the noise recognition model, the noise generation model and the noise recognition model are optimized in opposite directions: the noise generation model needs to minimize the accuracy of the preset noise recognition model, so its optimization direction is to minimize Lc-Ls. The noise classification model is trained to maximize the accuracy of noise classification, so its optimization direction is to maximize Lc+Ls. With these two optimization objectives, the initial noise generation model, the initial noise recognition model, and the initial noise classification model can be jointly trained over successive iterations to construct the noise generation model, the noise recognition model, and the noise classification model.
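The two formulas survive only as figure placeholders, so the following sketch does not reproduce them. It assumes a form commonly used for generative adversarial networks with an auxiliary classifier: Ls = mean log D(x) + mean log(1 - D(G(z))), and Lc = mean log-probability that the classifier assigns to the correct category, summed over real and generated samples. The function names and sample probabilities are hypothetical; only the combinations Lc-Ls and Lc+Ls come from the text.

```python
import math

def recognition_loss(d_real, d_fake):
    """Ls under the assumed form: mean log D(x) + mean log(1 - D(G(z)))."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def classification_loss(c_real, c_fake):
    """Lc under the assumed form: mean log-probability given to the
    correct category, for real and for generated samples."""
    real_term = sum(math.log(p) for p in c_real) / len(c_real)
    fake_term = sum(math.log(p) for p in c_fake) / len(c_fake)
    return real_term + fake_term

# D outputs: probability that a sample is a real speech sequence.
d_real, d_fake = [0.9, 0.8], [0.2, 0.1]
# Classifier outputs: probability assigned to each sample's true category.
c_real, c_fake = [0.7, 0.6], [0.5, 0.9]

ls = recognition_loss(d_real, d_fake)
lc = classification_loss(c_real, c_fake)
generator_objective = lc - ls    # combination the text associates with the generators
classifier_objective = lc + ls   # combination the text associates with the classifier
```

In joint training these two quantities would be recomputed on every batch and used to update the generators, the recognition model, and the classification model in turn.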
203. Acquire a speech sequence to be recognized.
The speech sequence to be recognized is a user speech sequence captured in some scenario; it may or may not contain speech noise. To protect the quality of subsequent speech recognition, any speech noise in the sequence must be reduced. To improve the effect of that noise reduction, the type of the speech noise can additionally be identified, so that a suitable noise reduction strategy can be selected for that type.
204. Perform noise recognition on the speech sequence and, if the speech sequence contains speech noise, determine the noise category corresponding to the speech noise by using a preset noise classification model.
The noise classification model is obtained by joint training with multiple noise generation models, and different noise generation models generate different types of speech noise. In this embodiment of the present application, in order to determine the noise category corresponding to the speech noise, step 204 specifically includes: performing speech feature extraction on the speech sequence to obtain the speech features corresponding to the speech sequence; judging, based on the speech features, whether the speech sequence contains speech noise; and, if it does, determining the noise category corresponding to the speech noise by using the noise classification model based on the extracted speech features.
Specifically, the speech sequence to be recognized is input into the noise recognition model for noise recognition. During recognition, the hidden layers of the preset noise recognition model extract the speech features of the sequence, and these features are used to decide whether the sequence contains speech noise. If it does, the extracted speech features are input into the noise classification model for noise classification, which determines the noise category corresponding to the speech noise.
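The control flow of step 204 can be sketched as follows: features are extracted once, used to decide whether noise is present, and reused for classification only if it is. The two models are replaced by simple threshold rules, and the feature choice (mean energy and zero-crossing rate), the thresholds, and the category names are all assumptions made for illustration.

```python
def extract_features(sequence):
    """Toy features: mean energy and zero-crossing rate."""
    n = len(sequence)
    energy = sum(s * s for s in sequence) / n
    crossings = sum(1 for a, b in zip(sequence, sequence[1:]) if a * b < 0)
    return {"energy": energy, "zcr": crossings / (n - 1)}

def contains_noise(features, zcr_threshold=0.5):
    """Hypothetical stand-in for the noise recognition model."""
    return features["zcr"] > zcr_threshold

def classify_noise(features):
    """Hypothetical stand-in for the noise classification model."""
    return "loud_broadband" if features["energy"] > 0.5 else "quiet_hiss"

def determine_noise_category(sequence):
    features = extract_features(sequence)   # extracted once
    if not contains_noise(features):
        return None                         # no noise: classification is skipped
    return classify_noise(features)         # the same features are reused
```

Returning `None` for a clean sequence lets the caller skip noise reduction entirely, which matches the conditional wording of the step.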
205. Determine, based on the noise category, the optimal noise reduction strategy corresponding to the speech noise, and perform noise reduction on the speech noise by using the optimal noise reduction strategy.
In this embodiment, the noise reduction strategy corresponding to the determined noise category is selected from a preset noise reduction strategy library and taken as the optimal noise reduction strategy. That strategy is then used to reduce the speech noise in the speech sequence to be recognized. In this way the optimal noise reduction effect can be achieved for speech noise in different scenarios, which avoids the loss of effect caused by applying one uniform noise reduction strategy to all speech noise.
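A preset strategy library of this kind can be pictured as a mapping from noise category to a denoising function. The sketch below is a minimal version under that assumption; the two strategies (a moving average and a noise gate), the category names, and the pass-through behaviour for unknown categories are illustrative choices, not details from the application.

```python
def moving_average(sequence, width=3):
    """Smooth by averaging each sample with its neighbours (edges use what exists)."""
    half = width // 2
    out = []
    for i in range(len(sequence)):
        window = sequence[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def noise_gate(sequence, threshold=0.1):
    """Zero out samples whose magnitude is below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in sequence]

# Hypothetical preset strategy library: noise category -> denoising function.
STRATEGY_LIBRARY = {
    "loud_broadband": moving_average,
    "quiet_hiss": noise_gate,
}

def denoise(sequence, noise_category):
    """Apply the strategy registered for the category; pass through if none."""
    strategy = STRATEGY_LIBRARY.get(noise_category)
    return list(sequence) if strategy is None else strategy(sequence)
```

Keeping the strategies in a table means a new noise category only needs a new entry, without changes to the selection logic.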
Compared with the current approach of applying the same noise reduction strategy to every type of speech noise, this further speech noise processing method provided by the embodiments of the present application acquires a speech sequence to be recognized and performs noise recognition on it; if the speech sequence contains speech noise, a preset noise classification model is used to determine the noise category corresponding to the speech noise, where the noise classification model is obtained by joint training with multiple noise generation models and different noise generation models generate different types of speech noise. The optimal noise reduction strategy corresponding to the speech noise is then determined based on the noise category and used to reduce the noise. Because the noise classification model is trained jointly with multiple noise generation models, it can identify the type of speech noise in different scenarios, so that the optimal noise reduction strategy can be selected for the determined noise category and the optimal noise reduction effect achieved.
Further, as a specific implementation of FIG. 1, an embodiment of the present application provides an apparatus for processing speech noise. As shown in FIG. 3, the apparatus includes an acquiring unit 31, a determining unit 32, and a noise reduction unit 33.
The acquiring unit 31 may be configured to acquire a speech sequence to be recognized. The acquiring unit 31 is the main functional module of the apparatus for acquiring the speech sequence to be recognized.
The determining unit 32 may be configured to perform noise recognition on the speech sequence and, if the speech sequence contains speech noise, to determine the noise category corresponding to the speech noise by using a preset noise classification model, where the noise classification model is obtained by joint training with multiple noise generation models and different noise generation models generate different types of speech noise. The determining unit 32 is the main functional module of the apparatus for recognizing noise in the speech sequence and determining its noise category, and is also the core module.
The noise reduction unit 33 may be configured to determine, based on the noise category, the optimal noise reduction strategy corresponding to the speech noise, and to perform noise reduction on the speech noise by using the optimal noise reduction strategy. The noise reduction unit 33 is the main functional module of the apparatus for selecting and applying the optimal noise reduction strategy.
Further, in order to determine the noise category corresponding to the speech noise, as shown in FIG. 4, the determining unit 32 includes an extraction module 321, a judgment module 322, and a determination module 323.
The extraction module 321 may be configured to perform speech feature extraction on the speech sequence to obtain the speech features corresponding to the speech sequence to be recognized.
The judgment module 322 may be configured to judge, based on the speech features, whether the speech sequence contains speech noise.
The determination module 323 may be configured to determine, if speech noise is contained, the noise category corresponding to the speech noise by using the noise classification model based on the extracted speech features.
Further, in order to construct the preset noise classification model and the multiple noise generation models, the apparatus further includes a clustering unit 34 and a construction unit 35.
The acquiring unit 31 may further be configured to acquire real speech sequences in a preset speech sample library and multiple random speech sequences.
The clustering unit 34 may be configured to perform clustering on the real speech sequences to obtain real speech sequences under different cluster categories.
The construction unit 35 may be configured to construct the noise classification model and the multiple noise generation models according to the multiple random speech sequences and the real speech sequences under the different cluster categories.
Further, in order to cluster the real speech sequences, the clustering unit 34 includes a calculation module 341 and a clustering module 342.
The calculation module 341 may be configured to calculate the Euclidean distance between different real speech sequences according to a preset Euclidean distance algorithm.
The clustering module 342 may be configured to perform clustering on the real speech sequences based on the Euclidean distance to obtain real speech sequences under different cluster categories.
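The text specifies only that clustering is driven by Euclidean distance, not which clustering algorithm is used. The sketch below assumes plain k-means over equal-length sequences as one possible choice; `euclidean`, `cluster`, the seeding rule, and the sample data are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster(sequences, k, iterations=10):
    """Plain k-means; the first k sequences seed the centroids."""
    centroids = [list(s) for s in sequences[:k]]
    labels = [0] * len(sequences)
    for _ in range(iterations):
        # Assign each sequence to its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(s, centroids[c]))
                  for s in sequences]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [s for s, l in zip(sequences, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

sequences = [[0.0, 0.1], [5.0, 5.1], [0.1, 0.0], [5.1, 5.0]]
labels = cluster(sequences, k=2)
```

Each resulting label plays the role of a cluster category, so the real speech sequences sharing a label form one group.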
Further, in order to construct the noise classification model and the multiple noise generation models, the construction unit 35 includes a first construction module 351 and a second construction module 352.
The first construction module 351 may be configured to separately construct an initial noise classification model and multiple initial noise generation models.
The second construction module 352 may be configured to perform joint iterative training on the initial noise classification model and the multiple initial noise generation models according to the multiple random speech sequences and the real speech sequences under the different cluster categories, so as to construct the noise classification model and the multiple noise generation models.
Further, the second construction module 352 includes a generation sub-module, a recognition sub-module, a classification sub-module, and a construction sub-module.
The generation sub-module may be configured to input the multiple random speech sequences into the multiple initial noise generation models respectively to generate different types of speech noise.
The recognition sub-module may be configured to input the generated speech noise and the real speech sequences into the initial noise recognition model for noise recognition to obtain an initial noise recognition result.
The classification sub-module may be configured to extract the speech features corresponding to the speech noise in the initial noise recognition result and input them into the initial noise classification model for noise classification to obtain an initial noise classification result.
The construction sub-module may be configured to construct a noise recognition accuracy loss function and a noise classification accuracy loss function based on the initial noise recognition result and the initial noise classification result respectively.
The construction sub-module may further be configured to perform joint iterative training on the initial noise recognition model, the initial noise classification model, and the multiple initial noise generation models according to the noise recognition accuracy loss function and the noise classification accuracy loss function, so as to construct the noise recognition model, the noise classification model, and the multiple noise generation models respectively.
It should be noted that, for other descriptions of the functional modules of the apparatus for processing speech noise provided by this embodiment of the present application, reference may be made to the corresponding description of the method shown in FIG. 1; details are not repeated here.
Based on the method shown in FIG. 1, an embodiment of the present application correspondingly provides a computer-readable storage medium, which may be non-volatile or volatile, and on which a computer program is stored. When executed by a processor, the program implements the following steps: acquiring a speech sequence to be recognized; performing noise recognition on the speech sequence and, if the speech sequence contains speech noise, determining the noise category corresponding to the speech noise by using a preset noise classification model, where the noise classification model is obtained by joint training with multiple noise generation models and different noise generation models generate different types of speech noise; and determining, based on the noise category, the optimal noise reduction strategy corresponding to the speech noise, and performing noise reduction on the speech noise by using the optimal noise reduction strategy.
Based on the method shown in FIG. 1 and the apparatus embodiment shown in FIG. 3, an embodiment of the present application further provides a physical structure diagram of a computer device. As shown in FIG. 5, the computer device includes a processor 41, a memory 42, and a computer program stored in the memory 42 and executable on the processor, where the memory 42 and the processor 41 are both arranged on a bus 43. When executing the program, the processor 41 implements the following steps: acquiring a speech sequence to be recognized; performing noise recognition on the speech sequence and, if the speech sequence contains speech noise, determining the noise category corresponding to the speech noise by using a preset noise classification model, where the noise classification model is obtained by joint training with multiple noise generation models and different noise generation models generate different types of speech noise; and determining, based on the noise category, the optimal noise reduction strategy corresponding to the speech noise, and performing noise reduction on the speech noise by using the optimal noise reduction strategy.
With the technical solution of the present application, a speech sequence to be recognized is acquired and noise recognition is performed on it; if the speech sequence contains speech noise, a preset noise classification model is used to determine the noise category corresponding to the speech noise, where the noise classification model is obtained by joint training with multiple noise generation models and different noise generation models generate different types of speech noise. The optimal noise reduction strategy corresponding to the speech noise is then determined based on the noise category and used to reduce the noise. Because the noise classification model is trained jointly with multiple noise generation models, it can identify the type of speech noise in different scenarios, so that the optimal noise reduction strategy can be selected for the determined noise category and the optimal noise reduction effect achieved.

Claims (20)

  1. A method for processing speech noise, comprising:
    acquiring a speech sequence to be recognized;
    performing noise recognition on the speech sequence and, if the speech sequence contains speech noise, determining the noise category corresponding to the speech noise by using a preset noise classification model, wherein the noise classification model is obtained by joint training with multiple noise generation models, and different noise generation models generate different types of speech noise;
    determining, based on the noise category, an optimal noise reduction strategy corresponding to the speech noise, and performing noise reduction on the speech noise by using the optimal noise reduction strategy.
  2. The method according to claim 1, wherein, if the speech sequence contains speech noise, determining the noise category corresponding to the speech noise by using a preset noise classification model comprises:
    performing speech feature extraction on the speech sequence to obtain the speech features corresponding to the speech sequence;
    judging, based on the speech features, whether the speech sequence contains speech noise;
    if speech noise is contained, determining the noise category corresponding to the speech noise by using the noise classification model based on the extracted speech features.
  3. The method according to claim 1, wherein, before acquiring the speech sequence to be recognized, the method further comprises:
    acquiring real speech sequences in a preset speech sample library and multiple random speech sequences;
    performing clustering on the real speech sequences to obtain real speech sequences under different cluster categories;
    constructing the noise classification model and the multiple noise generation models according to the multiple random speech sequences and the real speech sequences under the different cluster categories.
  4. The method according to claim 3, wherein performing clustering on the real speech sequences to obtain real speech sequences under different cluster categories comprises:
    calculating the Euclidean distance between different real speech sequences according to a preset Euclidean distance algorithm;
    performing clustering on the real speech sequences based on the Euclidean distance to obtain real speech sequences under different cluster categories.
  5. The method according to claim 3, wherein constructing the noise classification model and the multiple noise generation models according to the multiple random speech sequences and the real speech sequences under the different cluster categories comprises:
    separately constructing an initial noise classification model and multiple initial noise generation models;
    performing joint iterative training on the initial noise classification model and the multiple initial noise generation models according to the multiple random speech sequences and the real speech sequences under the different cluster categories, so as to construct the noise classification model and the multiple noise generation models.
  6. The method according to claim 5, wherein separately constructing an initial noise classification model and multiple initial noise generation models comprises:
    separately constructing an initial noise recognition model, an initial noise classification model, and multiple initial noise generation models;
    and wherein performing joint iterative training on the initial noise classification model and the multiple initial noise generation models according to the multiple random speech sequences and the real speech sequences under the different cluster categories, so as to construct the noise classification model and the multiple noise generation models, comprises:
    inputting the multiple random speech sequences into the multiple initial noise generation models respectively to generate different types of speech noise;
    inputting the generated speech noise and the real speech sequences into the initial noise recognition model for noise recognition to obtain an initial noise recognition result;
    extracting the speech features corresponding to the speech noise in the initial noise recognition result and inputting them into the initial noise classification model for noise classification to obtain an initial noise classification result;
    constructing a noise recognition accuracy loss function and a noise classification accuracy loss function based on the initial noise recognition result and the initial noise classification result respectively;
    performing joint iterative training on the initial noise recognition model, the initial noise classification model, and the multiple initial noise generation models according to the noise recognition accuracy loss function and the noise classification accuracy loss function, so as to construct the noise recognition model, the noise classification model, and the multiple noise generation models respectively.
  7. The method according to any one of claims 3 to 6, wherein the multiple random speech sequences follow a Gaussian distribution.
  8. An apparatus for processing speech noise, comprising:
    an acquiring unit, configured to acquire a speech sequence to be recognized;
    a determining unit, configured to perform noise recognition on the speech sequence and, if the speech sequence contains speech noise, to determine the noise category corresponding to the speech noise by using a preset noise classification model, wherein the noise classification model is obtained by joint training with multiple noise generation models, and different noise generation models generate different types of speech noise;
    a noise reduction unit, configured to determine, based on the noise category, an optimal noise reduction strategy corresponding to the speech noise, and to perform noise reduction on the speech noise by using the optimal noise reduction strategy.
  9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the following steps of a method for processing speech noise:
    acquiring a speech sequence to be recognized;
    performing noise recognition on the speech sequence and, if the speech sequence contains speech noise, determining the noise category corresponding to the speech noise by using a preset noise classification model, wherein the noise classification model is obtained by joint training with multiple noise generation models, and different noise generation models generate different types of speech noise;
    determining, based on the noise category, an optimal noise reduction strategy corresponding to the speech noise, and performing noise reduction on the speech noise by using the optimal noise reduction strategy.
  10. The computer-readable storage medium according to claim 9, wherein, if the speech sequence contains speech noise, determining the noise category corresponding to the speech noise by using a preset noise classification model comprises:
    performing speech feature extraction on the speech sequence to obtain the speech features corresponding to the speech sequence;
    judging, based on the speech features, whether the speech sequence contains speech noise;
    if speech noise is contained, determining the noise category corresponding to the speech noise by using the noise classification model based on the extracted speech features.
  11. The computer-readable storage medium according to claim 9, wherein, before acquiring the speech sequence to be recognized, the method further comprises:
    acquiring real speech sequences in a preset speech sample library and multiple random speech sequences;
    performing clustering on the real speech sequences to obtain real speech sequences under different cluster categories;
    constructing the noise classification model and the multiple noise generation models according to the multiple random speech sequences and the real speech sequences under the different cluster categories.
  12. The computer-readable storage medium according to claim 11, wherein performing clustering on the real speech sequences to obtain real speech sequences under different cluster categories comprises:
    calculating the Euclidean distance between different real speech sequences according to a preset Euclidean distance algorithm;
    performing clustering on the real speech sequences based on the Euclidean distance to obtain real speech sequences under different cluster categories.
  13. The computer-readable storage medium according to claim 11, wherein constructing the noise classification model and the multiple noise generation models according to the multiple random speech sequences and the real speech sequences under the different cluster categories comprises:
    separately constructing an initial noise classification model and multiple initial noise generation models;
    performing joint iterative training on the initial noise classification model and the multiple initial noise generation models according to the multiple random speech sequences and the real speech sequences under the different cluster categories, so as to construct the noise classification model and the multiple noise generation models.
  14. The computer-readable storage medium according to claim 13, wherein the separately constructing of an initial noise classification model and a plurality of initial noise generation models comprises:
    separately constructing an initial noise recognition model, an initial noise classification model, and a plurality of initial noise generation models;
    and wherein the performing of joint iterative training on the initial noise classification model and the plurality of initial noise generation models according to the plurality of random speech sequences and the real speech sequences under the different cluster categories, to construct the noise classification model and the plurality of noise generation models, comprises:
    inputting the plurality of random speech sequences into the plurality of initial noise generation models respectively to generate different types of speech noise;
    inputting the generated speech noise and the real speech sequences separately into the initial noise recognition model for noise recognition to obtain an initial noise recognition result;
    extracting the speech features corresponding to the speech noise in the initial noise recognition result and inputting them into the initial noise classification model for noise classification to obtain an initial noise classification result;
    constructing a noise recognition accuracy loss function and a noise classification accuracy loss function based on the initial noise recognition result and the initial noise classification result, respectively;
    performing joint iterative training on the initial noise recognition model, the initial noise classification model, and the plurality of initial noise generation models according to the noise recognition accuracy loss function and the noise classification accuracy loss function, to construct the noise recognition model, the noise classification model, and the plurality of noise generation models, respectively.
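The two loss functions named in this claim can be illustrated with a small sketch. The claim does not give their mathematical form, so the example assumes binary cross-entropy for the recognition loss and categorical cross-entropy for the classification loss, combined by a hypothetical `weight`. The function names and the `is_noise` labels supplied by the caller are likewise assumptions. The gradient updates of the recognition, classification, and generation models are omitted.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0)


def recognition_loss(p_noise: Sequence[float], is_noise: Sequence[int]) -> float:
    """Assumed form: mean binary cross-entropy of the recognition model."""
    total = 0.0
    for p, y in zip(p_noise, is_noise):
        p = min(max(p, EPS), 1.0 - EPS)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(p_noise)


def classification_loss(
    class_probs: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Assumed form: mean categorical cross-entropy of the classification model."""
    total = 0.0
    for probs, label in zip(class_probs, labels):
        total -= math.log(max(probs[label], EPS))
    return total / len(labels)


def joint_loss(p_noise, is_noise, class_probs, labels, weight: float = 1.0) -> float:
    """Weighted sum that a joint training step would minimise."""
    return recognition_loss(p_noise, is_noise) + weight * classification_loss(
        class_probs, labels
    )
```

In a joint training loop, each iteration would generate noise from the random sequences, evaluate both losses, and update all models from the combined value. The label in `labels` for a generated sample would be the index of the generation model that produced it.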
  15. The computer-readable storage medium according to any one of claims 11 to 14, wherein the plurality of random speech sequences follow a Gaussian distribution.
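Drawing the random input sequences from a Gaussian distribution can be sketched as below. The claim does not state the mean, variance, or sequence length, so the standard normal parameters, the `length` argument, and the one-sequence-per-generator layout are assumptions for illustration.

```python
import random
from typing import List, Optional


def sample_random_sequences(
    num_generators: int, length: int, seed: Optional[int] = None
) -> List[List[float]]:
    """Return one Gaussian-distributed sequence per noise generation model."""
    rng = random.Random(seed)
    # Assumed parameters: mean 0.0, standard deviation 1.0.
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(num_generators)]
```

Passing a fixed `seed` makes the sampled sequences reproducible between runs, which is convenient when comparing training configurations.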
  16. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the following steps of a speech noise processing method:
    obtaining a speech sequence to be recognized;
    performing noise recognition on the speech sequence, and if the speech sequence contains speech noise, determining the noise category corresponding to the speech noise by using a preset noise classification model, wherein the noise classification model is obtained through joint training with a plurality of noise generation models, and different noise generation models generate different types of speech noise;
    determining, based on the noise category, an optimal noise reduction strategy corresponding to the speech noise, and performing noise reduction on the speech noise by using the optimal noise reduction strategy.
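Selecting a noise reduction strategy from the noise category amounts to a lookup from category to processing routine. The sketch below shows one way to arrange that. The category names (`"broadband"`, `"impulsive"`) and the two filters are hypothetical stand-ins, since the claim does not list the categories or the strategies that are considered optimal for them.

```python
from statistics import median
from typing import Callable, Dict, List, Sequence


def moving_average(samples: Sequence[float]) -> List[float]:
    """Smooth with a 3-sample window, clipped at the edges."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def median_filter(samples: Sequence[float]) -> List[float]:
    """Suppress isolated spikes with a 3-sample median, clipped at the edges."""
    return [median(samples[max(0, i - 1): i + 2]) for i in range(len(samples))]


# Hypothetical mapping from noise category to its preferred strategy.
STRATEGIES: Dict[str, Callable[[Sequence[float]], List[float]]] = {
    "broadband": moving_average,
    "impulsive": median_filter,
}


def denoise(samples: Sequence[float], noise_category: str) -> List[float]:
    """Apply the strategy registered for the category; pass through if none."""
    strategy = STRATEGIES.get(noise_category)
    return list(samples) if strategy is None else strategy(samples)
```

Keeping the mapping in a table means a new noise category can be supported by registering one more entry, without touching the dispatch logic.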
  17. The computer device according to claim 16, wherein the determining, if the speech sequence contains speech noise, of the noise category corresponding to the speech noise by using a preset noise classification model comprises:
    performing speech feature extraction on the speech sequence to obtain speech features corresponding to the speech sequence;
    determining, based on the speech features, whether the speech sequence contains speech noise;
    if the speech sequence contains speech noise, determining, based on the extracted speech features, the noise category corresponding to the speech noise by using the noise classification model.
  18. The computer device according to claim 16, wherein, before the obtaining of the speech sequence to be recognized, the method further comprises:
    obtaining real speech sequences in a preset speech sample library and a plurality of random speech sequences;
    clustering the real speech sequences to obtain real speech sequences under different cluster categories;
    constructing the noise classification model and the plurality of noise generation models according to the plurality of random speech sequences and the real speech sequences under the different cluster categories.
  19. The computer device according to claim 18, wherein the clustering of the real speech sequences to obtain real speech sequences under different cluster categories comprises:
    calculating the Euclidean distance between different real speech sequences according to a preset Euclidean distance algorithm; and clustering the real speech sequences based on the Euclidean distance to obtain real speech sequences under different cluster categories.
  20. The computer device according to claim 18, wherein the constructing of the noise classification model and the plurality of noise generation models according to the plurality of random speech sequences and the real speech sequences under the different cluster categories comprises:
    separately constructing an initial noise classification model and a plurality of initial noise generation models;
    performing joint iterative training on the initial noise classification model and the plurality of initial noise generation models according to the plurality of random speech sequences and the real speech sequences under the different cluster categories, to construct the noise classification model and the plurality of noise generation models.
PCT/CN2020/136367 2020-10-26 2020-12-15 Voice noise processing method and apparatus, and computer device and storage medium WO2021189981A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011153509.1A CN112201270B (en) 2020-10-26 2020-10-26 Voice noise processing method and device, computer equipment and storage medium
CN202011153509.1 2020-10-26

Publications (1)

Publication Number Publication Date
WO2021189981A1 true WO2021189981A1 (en) 2021-09-30

Family

ID=74011358

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/136367 WO2021189981A1 (en) 2020-10-26 2020-12-15 Voice noise processing method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN112201270B (en)
WO (1) WO2021189981A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101710490A (en) * 2009-11-20 2010-05-19 安徽科大讯飞信息科技股份有限公司 Method and device for compensating noise for voice assessment
US20120185246A1 (en) * 2011-01-19 2012-07-19 Broadcom Corporation Noise suppression using multiple sensors of a communication device
CN102693724A (en) * 2011-03-22 2012-09-26 张燕 Noise classification method of Gaussian Mixture Model based on neural network
CN103065631A (en) * 2013-01-24 2013-04-24 华为终端有限公司 Voice identification method and device
CN103219011A (en) * 2012-01-18 2013-07-24 联想移动通信科技有限公司 Noise reduction method, noise reduction device and communication terminal
CN104575510A (en) * 2015-02-04 2015-04-29 深圳酷派技术有限公司 Noise reduction method, noise reduction device and terminal

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1658282A (en) * 1997-12-24 2005-08-24 三菱电机株式会社 Method for speech coding, method for speech decoding and their apparatuses
JP4033299B2 (en) * 2003-03-12 2008-01-16 株式会社エヌ・ティ・ティ・ドコモ Noise model noise adaptation system, noise adaptation method, and speech recognition noise adaptation program
EP1732063A4 (en) * 2004-03-31 2007-07-04 Pioneer Corp Speech recognition device and speech recognition method
US9313585B2 (en) * 2008-12-22 2016-04-12 Oticon A/S Method of operating a hearing instrument based on an estimation of present cognitive load of a user and a hearing aid system
CN109471853B (en) * 2018-09-18 2023-06-16 平安科技(深圳)有限公司 Data noise reduction method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112201270B (en) 2023-05-23
CN112201270A (en) 2021-01-08

Similar Documents

Publication Publication Date Title
US10776470B2 (en) Verifying identity based on facial dynamics
US11281945B1 (en) Multimodal dimensional emotion recognition method
WO2018176894A1 (en) Speaker confirmation method and device
EP3791392A1 (en) Joint neural network for speaker recognition
WO2019015466A1 (en) Method and apparatus for verifying person and certificate
CN110269625B (en) Novel multi-feature fusion electrocardio authentication method and system
CN111723679A (en) Face and voiceprint authentication system and method based on deep migration learning
Ilyas et al. AVFakeNet: A unified end-to-end Dense Swin Transformer deep learning model for audio–visual deepfakes detection
Wimmer et al. Low-level fusion of audio and video feature for multi-modal emotion recognition
CN104715753B (en) A kind of method and electronic equipment of data processing
CN108922543B (en) Model base establishing method, voice recognition method, device, equipment and medium
CN113361636B (en) Image classification method, system, medium and electronic device
CN111108508B (en) Face emotion recognition method, intelligent device and computer readable storage medium
WO2022166797A1 (en) Image generation model training method, generation method, apparatus, and device
CN112001215A (en) Method for identifying identity of text-independent speaker based on three-dimensional lip movement
US11711363B2 (en) Systems for authenticating digital contents
CN113516990A (en) Voice enhancement method, method for training neural network and related equipment
CN111553899A (en) Audio and video based Parkinson non-contact intelligent detection method and system
WO2021189979A1 (en) Speech enhancement method and apparatus, computer device, and storage medium
CN113948105A (en) Voice-based image generation method, device, equipment and medium
WO2021189981A1 (en) Voice noise processing method and apparatus, and computer device and storage medium
Liu et al. Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
CN113066024B (en) Training method of image blur detection model, image blur detection method and device
CN113205030A (en) Pedestrian re-identification method for defending antagonistic attack
Usoltsev et al. Full video processing for mobile audio-visual identity verification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20926754

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20926754

Country of ref document: EP

Kind code of ref document: A1