CN116597855A - Adaptive noise reduction method and device and computer equipment - Google Patents


Info

Publication number
CN116597855A
Authority
CN
China
Legal status
Granted
Application number
CN202310877853.2A
Other languages
Chinese (zh)
Other versions
CN116597855B (en)
Inventor
薛兴韩
林宗华
高桂冰
Current Assignee
Shenzhen Zecheng Electronics Co ltd
Original Assignee
Shenzhen Zecheng Electronics Co ltd
Application filed by Shenzhen Zecheng Electronics Co ltd
Priority to CN202310877853.2A
Publication of CN116597855A
Application granted
Publication of CN116597855B
Status: Active


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00 Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Abstract

The invention provides an adaptive noise reduction method, device, and computer equipment. The method comprises the following steps: acquiring voice data that carries at least background sound information; parsing the voice data to obtain the background sound information; inputting the background sound information into a classification model, which is a pre-trained neural network model, to obtain a corresponding classification result; acquiring identification information corresponding to the classification result; detecting whether a database contains a target identification field identical to the identification information and, if so, determining a corresponding target noise reduction model according to the target identification field; and performing noise reduction processing on the voice data based on the target noise reduction model. By selecting the noise reduction model according to the background present in the voice data, the invention overcomes the inability of current intelligent voice devices to adapt their noise reduction to the background.

Description

Adaptive noise reduction method and device and computer equipment
Technical Field
The present invention relates to the field of speech noise reduction, and in particular to an adaptive noise reduction method, apparatus, and computer device.
Background
Intelligent voice devices currently provide a voice acquisition function. While capturing the user's voice, such a device also picks up various noises that interfere with it. To remove this noise, the device applies noise reduction processing; however, current noise reduction is relatively fixed, performing well against some backgrounds and poorly against others. When noise reduction performs poorly, the device's voice acquisition quality is also poor.
Disclosure of Invention
The main objective of the invention is to provide an adaptive noise reduction method, device, and computer equipment that overcome the inability of existing intelligent voice devices to adapt their noise reduction processing to the background.
To achieve this objective, the present invention provides an adaptive noise reduction method comprising the following steps:
acquiring voice data, wherein the voice data carries at least background sound information;
parsing the voice data to obtain the background sound information;
inputting the background sound information into a classification model to obtain a corresponding classification result, wherein the classification model is a pre-trained neural network model;
acquiring identification information corresponding to the classification result;
detecting whether a target identification field identical to the identification information exists in a database, and if so, determining a corresponding target noise reduction model according to the target identification field;
and performing noise reduction processing on the voice data based on the target noise reduction model.
Further, the step of determining the corresponding target noise reduction model according to the target identification field includes:
parsing the target identification field to obtain first feature information and second feature information, wherein the first feature information is the character information at a first specified position in the target identification field, and the second feature information is the character information at a second specified position in the target identification field;
determining a target noise reduction model corresponding to the first feature information based on the correspondence between feature information and noise reduction models stored in the database;
sending a call instruction carrying the second feature information to a management terminal, wherein the call instruction is used to invoke the determined target noise reduction model.
Further, the step of determining the corresponding target noise reduction model according to the target identification field includes:
matching a target model parameter set corresponding to the target identification field based on the correspondence between identification fields and model parameter sets stored in the database, wherein the target model parameter set comprises a plurality of model parameters including at least a smoothing queue length and a smoothing mechanism parameter;
randomly invoking a noise reduction model and replacing its model parameters with the corresponding model parameters in the target model parameter set to obtain the target noise reduction model.
Further, the training process of the classification model includes:
acquiring an initial neural network model and training data, wherein the training data are background sound training data and corresponding labels, and the initial neural network model comprises a feature extraction layer, a coding layer, a decoding layer, and a classification layer;
performing feature extraction on the background sound training data based on the feature extraction layer to obtain a first sound feature;
inputting the first sound feature into the coding layer for encoding to obtain a first coding feature, and inputting the first coding feature into the decoding layer for decoding to obtain a second sound feature;
inputting the second sound feature into the coding layer for encoding to obtain a second coding feature;
and inputting the first coding feature, the second coding feature, and the labels corresponding to the background sound training data together into the classification layer for iterative training until the model converges, to obtain the classification model.
Further, the training process of the classification model includes:
acquiring a first neural network model, a second neural network model, and training data, wherein the training data are background sound training data and corresponding labels; the first neural network model comprises a feature extraction layer, a coding layer, a decoding layer, and a classification layer, and the second neural network model comprises a feature extraction layer, a coding layer, and a classification layer;
performing feature extraction on the background sound training data based on the feature extraction layer of the first neural network model to obtain a first sound feature;
inputting the first sound feature into the coding layer of the first neural network model for encoding to obtain a first coding feature, and inputting the first coding feature into the decoding layer for decoding to obtain a second sound feature;
inputting the second sound feature into the coding layer of the second neural network model for encoding to obtain a second coding feature;
inputting the first coding feature and the labels corresponding to the background sound training data into the classification layer of the first neural network model for iterative training until the model converges, to obtain a first classification model;
inputting the second coding feature and the labels corresponding to the background sound training data into the classification layer of the second neural network model for iterative training until the model converges, to obtain a second classification model;
inputting a test set into the first classification model and the second classification model respectively for classification, to obtain a first classification result and a second classification result;
judging whether the first classification result and the second classification result are both the same as the labels in the test set, and if so, taking the first classification model as the classification model.
Further, after the step of performing noise reduction processing on the voice data based on the target noise reduction model, the method comprises:
acquiring the noise-reduced voice data;
performing text recognition on the noise-reduced voice data to obtain keywords;
acquiring, from the voice data, the multiple frames of voice data corresponding to each keyword;
inputting the keyword into a word embedding model to extract the word vector corresponding to the keyword, and sequentially inputting the multiple frames of voice data corresponding to the keyword into a preset neural network to extract a vector for each frame;
summing the vectors of all frames to obtain a sum vector;
adjusting the network parameters of the preset neural network and fitting the sum vector to the word vector through a cosine function to train the preset neural network; when the sum vector and the word vector match completely, a keyword speech recognition model is obtained, which is used to perform keyword recognition on voice information.
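The training objective of the last step can be illustrated with a small sketch. It assumes that the "preset neural network" is a single hypothetical linear layer applied to each frame's acoustic feature vector, and that "fitting through a cosine function" means minimizing 1 minus the cosine similarity between the sum vector and the word vector. The frame features, word vector, initial weights, and the finite-difference gradient descent are all made up for illustration and are not the patent's implementation.

```python
import math

def project(weights, frame):
    """Hypothetical per-frame network: one linear layer (no bias, no activation)."""
    return [sum(w * f for w, f in zip(row, frame)) for row in weights]

def sum_vector(frame_vectors):
    """Element-wise sum of the per-frame output vectors."""
    return [sum(column) for column in zip(*frame_vectors)]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cosine_loss(weights, frames, word_vector):
    """1 - cos(sum vector, word vector); reaches 0 when the two directions match."""
    summed = sum_vector([project(weights, frame) for frame in frames])
    return 1.0 - cosine_similarity(summed, word_vector)

def train(weights, frames, word_vector, lr=0.05, steps=200, eps=1e-6):
    """Plain gradient descent with finite-difference gradients (illustration only)."""
    for _ in range(steps):
        base = cosine_loss(weights, frames, word_vector)
        grads = []
        for i in range(len(weights)):
            row_grads = []
            for j in range(len(weights[i])):
                weights[i][j] += eps
                row_grads.append((cosine_loss(weights, frames, word_vector) - base) / eps)
                weights[i][j] -= eps
            grads.append(row_grads)
        for i, row_grads in enumerate(grads):
            for j, g in enumerate(row_grads):
                weights[i][j] -= lr * g
    return weights

# Three frames of made-up 4-dim acoustic features for one keyword occurrence.
frames = [[0.2, 0.1, -0.3, 0.5], [0.4, -0.1, 0.2, 0.3], [0.1, 0.3, 0.0, -0.2]]
word_vector = [0.5, -0.2, 0.8]  # hypothetical output of the word embedding model
weights = [[0.3, 0.0, 0.2, 0.1], [0.0, 0.3, 0.1, 0.2], [-0.2, 0.1, 0.0, 0.1]]

loss_before = cosine_loss(weights, frames, word_vector)
train(weights, frames, word_vector)
loss_after = cosine_loss(weights, frames, word_vector)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.6f}")
```

Because the per-frame layer is linear, the sum vector equals the layer applied to the summed frames, so the loss can be driven toward 0; a real keyword model would use a nonlinear network and an automatic-differentiation framework instead of finite differences.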
The invention further provides an adaptive noise reduction device, comprising:
a first acquisition unit, configured to acquire voice data, wherein the voice data carries at least background sound information;
a parsing unit, configured to parse the voice data to obtain the background sound information;
a classification unit, configured to input the background sound information into a classification model to obtain a corresponding classification result, wherein the classification model is a pre-trained neural network model;
a second acquisition unit, configured to acquire identification information corresponding to the classification result;
a determining unit, configured to detect whether a target identification field identical to the identification information exists in a database and, if so, determine a corresponding target noise reduction model according to the target identification field;
and a noise reduction unit, configured to perform noise reduction processing on the voice data based on the target noise reduction model.
Further, the determining unit includes:
a parsing subunit, configured to parse the target identification field to obtain first feature information and second feature information, wherein the first feature information is the character information at a first specified position in the target identification field, and the second feature information is the character information at a second specified position in the target identification field;
a determining subunit, configured to determine a target noise reduction model corresponding to the first feature information based on the correspondence between feature information and noise reduction models stored in the database;
a calling subunit, configured to send a call instruction carrying the second feature information to a management terminal, wherein the call instruction is used to invoke the determined target noise reduction model.
Further, the determining unit includes:
a matching subunit, configured to match a target model parameter set corresponding to the target identification field based on the correspondence between identification fields and model parameter sets stored in the database, wherein the target model parameter set comprises a plurality of model parameters including at least a smoothing queue length and a smoothing mechanism parameter;
and a replacing subunit, configured to randomly invoke a noise reduction model and replace its model parameters with the corresponding model parameters in the target model parameter set to obtain the target noise reduction model.
The invention further provides a computer device comprising a memory and a processor, wherein the memory stores a computer program and the processor, when executing the computer program, implements the steps of any of the methods described above.
The invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the methods described above.
Drawings
FIG. 1 is a schematic diagram of the steps of an adaptive noise reduction method according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of an adaptive noise reduction device according to an embodiment of the present invention;
FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of the present invention.
The objectives, functional features, and advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein serve only to illustrate the invention and are not intended to limit its scope.
Referring to FIG. 1, an embodiment of the present invention provides an adaptive noise reduction method, comprising the following steps:
Step S1: acquiring voice data, wherein the voice data carries at least background sound information;
Step S2: parsing the voice data to obtain the background sound information;
Step S3: inputting the background sound information into a classification model to obtain a corresponding classification result, wherein the classification model is a pre-trained neural network model;
Step S4: acquiring identification information corresponding to the classification result;
Step S5: detecting whether a target identification field identical to the identification information exists in a database, and if so, determining a corresponding target noise reduction model according to the target identification field;
Step S6: performing noise reduction processing on the voice data based on the target noise reduction model.
In this embodiment, the above scheme is applied to an intelligent voice device and adaptively selects a noise reduction model according to the background present in the voice data. In step S1, the voice data to be denoised is acquired. The voice data is the sound captured while the user speaks: it carries at least the background sound information of the user's current environment and may also carry the user's own voice. The background sound information indicates where the user currently is, for example in a quiet room, on a street, or in a park. In steps S2 and S3, the voice data is parsed and decomposed to separate the background sound information from the user's voice. A neural network model is trained in advance to obtain the classification model, which classifies the background sound information by type of background, that is, whether the user is in a quiet room, on a street, or in a park. In step S4, a preset correspondence between classification results and identification information is used to obtain the identification information for the classification result of the background. Subsequent processing uses this identification information in place of the classification result, which reduces the amount of data to be processed and simplifies identification.
In steps S5 and S6, a plurality of identification fields are stored in the database in advance, and the database is checked for a target identification field identical to the identification information. If no such field exists, a predetermined general noise reduction model is used for the subsequent noise reduction. If the target identification field exists, the corresponding target noise reduction model is determined according to it, and noise reduction is performed on the voice data based on that model. With this scheme, the target noise reduction model is chosen according to the background in the voice data, which overcomes the inability of current intelligent voice devices to adapt their noise reduction to the background.
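Steps S1 to S6, including the fallback to a general model, can be sketched as a lookup pipeline. This is a minimal sketch under stated assumptions: the classifier is passed in as a callable that stands in for both the parsing step and the pre-trained classification model, and the identification codes, the two placeholder denoisers, and the dictionaries standing in for the preset correspondence and the database are all hypothetical.

```python
from typing import Callable, Dict, List

Samples = List[float]
Denoiser = Callable[[Samples], Samples]

def general_denoiser(samples: Samples) -> Samples:
    """Placeholder for the predetermined general model: trailing 3-sample moving average."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - 2): i + 1]
        out.append(sum(window) / len(window))
    return out

def street_denoiser(samples: Samples) -> Samples:
    """Placeholder background-specific model: zero out low-amplitude samples."""
    return [s if abs(s) > 0.1 else 0.0 for s in samples]

# Hypothetical preset correspondence: classification result -> identification information (S4).
CLASS_TO_ID: Dict[str, str] = {"quiet_room": "QR0101", "street": "ST0202", "park": "PK0303"}
# Hypothetical database: identification field -> noise reduction model (S5).
MODEL_DB: Dict[str, Denoiser] = {"ST0202": street_denoiser}

def adaptive_denoise(samples: Samples, classify_background: Callable[[Samples], str]) -> Samples:
    background = classify_background(samples)        # S2-S3: parse and classify
    id_info = CLASS_TO_ID.get(background)            # S4: identification information
    model = MODEL_DB.get(id_info, general_denoiser)  # S5: target model, else general model
    return model(samples)                            # S6: noise reduction

audio = [0.05, 0.5, -0.08, -0.6]
print(adaptive_denoise(audio, lambda s: "street"))  # [0.0, 0.5, 0.0, -0.6]
print(adaptive_denoise(audio, lambda s: "park"))    # no PK0303 entry, so the general model runs
```

An unknown class (no identification information) and a known class with no database entry both fall through to the general model, which matches the "if not present" branch described above.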
In an embodiment, the step S5 of determining the corresponding target noise reduction model according to the target identification field includes:
Step S51: parsing the target identification field to obtain first feature information and second feature information, wherein the first feature information is the character information at a first specified position in the target identification field, and the second feature information is the character information at a second specified position in the target identification field;
Step S52: determining a target noise reduction model corresponding to the first feature information based on the correspondence between feature information and noise reduction models stored in the database;
Step S53: sending a call instruction carrying the second feature information to a management terminal, wherein the call instruction is used to invoke the determined target noise reduction model.
In this embodiment, as described in step S51, the target identification field contains a plurality of characters, and combining the characters at different positions yields different feature information. The first feature information is the character information at a first specified position in the target identification field, and the second feature information is the character information at a second specified position. Specifically, the first specified position may be the first three characters, and the second specified position may be the combination of the first two characters and the last two characters. In steps S52 and S53, the database stores the correspondence between feature information and noise reduction models, from which the target noise reduction model corresponding to the first feature information is determined. The target noise reduction model is usually stored in the management terminal or in the database, so once it has been determined, a call instruction must be sent to invoke it. The call instruction also carries the second feature information, which allows the management terminal to tag the call request on receipt. Because the second feature information is taken from the target identification field, that is, from the identification information corresponding to the classification result of the background sound information, the background sound information can be associated with the call request, which facilitates later inspection and data tracing. It also makes it easy to associate the background sound information directly with the target noise reduction model.
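Steps S51 to S53 can be sketched as follows, assuming the example positions given above (first feature = first three characters; second feature = first two plus last two characters). The field value `ST0202`, the feature-to-model table, the model names, and the `ManagementTerminal` class that logs tagged call requests are hypothetical stand-ins, not structures defined by the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class CallInstruction:
    model_name: str
    second_feature: str  # carried so the management terminal can tag the request

def parse_identification_field(target_field: str) -> Tuple[str, str]:
    """S51: first feature = first three characters; second = first two + last two."""
    if len(target_field) < 4:
        raise ValueError("identification field too short")
    return target_field[:3], target_field[:2] + target_field[-2:]

# Hypothetical correspondence between first feature information and models (S52).
FEATURE_TO_MODEL: Dict[str, str] = {"QR0": "quiet_room_denoiser", "ST0": "street_denoiser"}

@dataclass
class ManagementTerminal:
    """Hypothetical terminal that serves models and logs tagged call requests (S53)."""
    call_log: List[Tuple[str, str]] = field(default_factory=list)

    def handle(self, call: CallInstruction) -> str:
        # Tagging with the second feature links the request back to the background class.
        self.call_log.append((call.second_feature, call.model_name))
        return call.model_name

def determine_and_call(target_field: str, terminal: ManagementTerminal) -> str:
    first_feature, second_feature = parse_identification_field(target_field)
    model_name = FEATURE_TO_MODEL[first_feature]
    return terminal.handle(CallInstruction(model_name, second_feature))

terminal = ManagementTerminal()
print(determine_and_call("ST0202", terminal))  # street_denoiser
print(terminal.call_log)                       # [('ST02', 'street_denoiser')]
```

The call log is what enables the tracing described above: each invocation of a model is recorded together with a fragment of the identification field derived from the background classification.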
In another embodiment, the step S5 of determining the corresponding target noise reduction model according to the target identification field includes:
Step S501: matching a target model parameter set corresponding to the target identification field based on the correspondence between identification fields and model parameter sets stored in the database, wherein the target model parameter set comprises a plurality of model parameters including at least a smoothing queue length and a smoothing mechanism parameter. In this embodiment, different identification fields correspond to different model parameter sets, so the corresponding target model parameter set can be matched according to the target identification field; the target model parameter set includes model parameters such as the smoothing queue length, the smoothing mechanism parameter, the threshold size, and the loss function.
Step S502: randomly invoking a noise reduction model and replacing its model parameters with the corresponding model parameters in the target model parameter set to obtain the target noise reduction model. In this embodiment, the target model parameter set describes the model parameters corresponding to the target identification field, and the target identification field is associated with the current background, so the target model parameter set is associated with that background as well; that is, different backgrounds correspond to different target model parameter sets. The target noise reduction model is therefore determined adaptively according to the background in the voice data, which overcomes the inability of current intelligent voice devices to adapt their noise reduction to the background.
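Steps S501 and S502 can be illustrated with a toy noise reducer whose parameters are swapped in from a matched parameter set. This is a sketch under loud assumptions: the patent does not define the model, so the "smoothing mechanism" is assumed here to be exponential smoothing of a minimum-over-queue noise floor applied to frame energies, the loss function is omitted because the toy model is not trained, and every class name, field name, and value is hypothetical.

```python
import random
from collections import deque
from dataclasses import dataclass, replace
from typing import Dict, List

@dataclass(frozen=True)
class NoiseReducerParams:
    smoothing_queue_length: int  # number of recent frames kept for noise estimation
    smoothing_factor: float      # assumed exponential-smoothing weight for the noise floor
    threshold: float             # frames at or below threshold * noise floor are zeroed

class NoiseReducer:
    """Toy frame-energy gate standing in for a real noise reduction model."""

    def __init__(self, params: NoiseReducerParams):
        self.params = params

    def process(self, frame_energies: List[float]) -> List[float]:
        p = self.params
        queue = deque(maxlen=p.smoothing_queue_length)
        noise_floor = None
        out = []
        for energy in frame_energies:
            queue.append(energy)
            estimate = min(queue)  # minimum energy over the smoothing queue
            noise_floor = estimate if noise_floor is None else (
                p.smoothing_factor * noise_floor + (1 - p.smoothing_factor) * estimate)
            out.append(energy if energy > p.threshold * noise_floor else 0.0)
        return out

# Pool of generic models with default parameters (hypothetical values).
MODEL_POOL = [NoiseReducer(NoiseReducerParams(8, 0.9, 1.5)),
              NoiseReducer(NoiseReducerParams(16, 0.95, 1.2))]

# Hypothetical database: identification field -> target model parameter set (S501).
PARAMETER_SETS: Dict[str, dict] = {
    "ST0202": {"smoothing_queue_length": 4, "smoothing_factor": 0.7, "threshold": 2.0},
}

def build_target_model(target_field: str) -> NoiseReducer:
    """S502: pick any generic model at random, then swap in the matched parameter set."""
    base = random.choice(MODEL_POOL)
    return NoiseReducer(replace(base.params, **PARAMETER_SETS[target_field]))

street_model = build_target_model("ST0202")
print(street_model.params)
print(street_model.process([1.0, 1.0, 1.0, 5.0, 1.0]))  # [0.0, 0.0, 0.0, 5.0, 0.0]
```

Building a new instance with `dataclasses.replace` leaves the pooled models untouched, so which model is drawn at random does not matter once every parameter in the set has been replaced.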
In an embodiment, the training process of the classification model includes:
acquiring an initial neural network model and training data, wherein the training data are background sound training data and corresponding labels, and the initial neural network model comprises a feature extraction layer, a coding layer, a decoding layer, and a classification layer. Examples of the neural network model include CNN and RNN models. The coding layer and the decoding layer perform mutually inverse operations, and the classification layer classifies the features using a loss function;
performing feature extraction on the background sound training data based on the feature extraction layer to obtain a first sound feature, wherein the feature extraction layer comprises a convolutional layer and extracts low-dimensional features from the background sound training data;
inputting the first sound feature into the coding layer for encoding to obtain a first coding feature, and inputting the first coding feature into the decoding layer for decoding to obtain a second sound feature. The coding layer converts the low-dimensional features into high-dimensional features (i.e., the first coding feature), and the decoding layer converts the high-dimensional features back into low-dimensional features. Note that, when encoding, the coding layer attends only to the background-related features in the first sound feature and ignores human voices unrelated to the background. That is, the first sound feature contains both background sound features and other sound features (noise), whereas the first coding feature contains only background sound features; the second sound feature decoded from it likewise contains only background sound features and is therefore purer than the first sound feature;
inputting the second sound feature into the coding layer for encoding to obtain a second coding feature;
and inputting the first coding feature, the second coding feature, and the labels corresponding to the background sound training data together into the classification layer for iterative training until the model converges, to obtain the classification model.
In this embodiment, because the second sound feature is purer, the second coding feature obtained by encoding it is of relatively higher quality. Feeding the first coding feature, the second coding feature, and the labels of the background sound training data together into the classification layer for iterative training both increases the amount of training data and trains on purer features, so the final trained model performs better.
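The data flow of this training scheme (feature extraction, encode, decode, re-encode, then both coding features sharing one label) can be sketched in plain Python. Every layer here is a hypothetical stand-in: feature extraction is a per-frame mean absolute amplitude instead of a convolution, the coding layer appends a smoothed copy of its input, the decoding layer keeps that smoothed half (a crude proxy for the "purer" second sound feature), and a nearest-centroid rule replaces the iteratively trained classification layer.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]

def extract_features(waveform: Vector, frame: int = 4) -> Vector:
    """Stand-in feature extraction layer: mean absolute amplitude per frame."""
    return [sum(abs(x) for x in waveform[i:i + frame]) / frame
            for i in range(0, len(waveform) - frame + 1, frame)]

def smooth(v: Vector) -> Vector:
    """3-point moving average with edge clamping."""
    n = len(v)
    return [(v[max(i - 1, 0)] + v[i] + v[min(i + 1, n - 1)]) / 3 for i in range(n)]

def encode(sound_feature: Vector) -> Vector:
    """Hypothetical coding layer: lifts a d-dim feature to 2d dims."""
    return sound_feature + smooth(sound_feature)

def decode(coding_feature: Vector) -> Vector:
    """Hypothetical decoding layer: maps 2d dims back to d dims (keeps the smoothed half)."""
    return coding_feature[len(coding_feature) // 2:]

def build_training_pairs(samples: List[Tuple[Vector, str]]) -> List[Tuple[Vector, str]]:
    pairs = []
    for waveform, label in samples:
        first_sound = extract_features(waveform)
        first_coding = encode(first_sound)
        second_sound = decode(first_coding)
        second_coding = encode(second_sound)
        # Both coding features share the sample's label, doubling the training pairs.
        pairs.append((first_coding, label))
        pairs.append((second_coding, label))
    return pairs

def fit_centroids(pairs: List[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Stand-in classification layer: one mean vector (centroid) per label."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in pairs:
        total = sums.setdefault(label, [0.0] * len(vec))
        sums[label] = [t + v for t, v in zip(total, vec)]
        counts[label] += 1
    return {label: [t / counts[label] for t in total] for label, total in sums.items()}

def predict(centroids: Dict[str, Vector], waveform: Vector) -> str:
    coding = encode(extract_features(waveform))
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(coding, centroids[lab])))

training_samples = [
    ([0.01, -0.02, 0.01, 0.0, 0.02, -0.01, 0.0, 0.01], "quiet_room"),
    ([0.6, -0.5, 0.7, -0.4, 0.5, -0.6, 0.4, -0.7], "street"),
]
pairs = build_training_pairs(training_samples)
centroids = fit_centroids(pairs)
print(len(pairs))  # 4
print(predict(centroids, [0.5, -0.6, 0.6, -0.5, 0.6, -0.4, 0.5, -0.5]))  # street
```

The point of the sketch is the pairing in `build_training_pairs`: each labeled recording contributes two training examples to the classifier, one from the raw encoding and one from the decode/re-encode path.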
In yet another embodiment, the training process of the classification model includes:
acquiring a first neural network model and a second neural network model, and acquiring training data; the training data are background sound training data and corresponding labels; the first neural network model comprises a feature extraction layer, a coding layer, a decoding layer and a classification layer, and the second neural network model comprises a feature extraction layer, a coding layer and a classification layer; the neural network model comprises CNN and RNN models. The operation of the above-mentioned coding layer and decoding layer is a reciprocal flow.
Performing feature extraction on the background sound training data based on a feature extraction layer of the first neural network model to obtain a first sound feature; the feature extraction layer comprises a convolution layer, and the feature extraction layer extracts low-dimensional features in background sound training data;
inputting the first sound characteristic to a coding layer of the first neural network model for coding to obtain a first coding characteristic; inputting the first coding feature into the decoding layer for decoding to obtain a second coding feature; the encoding layer converts the low-dimensional features into high-dimensional features (i.e., the first encoded features), and the decoding layer may reconvert the high-dimensional features into low-dimensional features. It should be noted that the coding layer is only concerned with the features of the first sound feature associated with the background when coding, whereas other human voices not associated with the background are not concerned by the coding layer. That is, the first sound feature includes a background sound feature and other sound features (noise), and the first coding feature includes a background sound feature; the second sound feature also includes only a background sound feature, and is purer than the first sound feature.
Inputting the second sound characteristic to a coding layer of the second neural network model for coding to obtain a second coding characteristic;
inputting the first coding feature and the label corresponding to the background sound training data into a classification layer of the first neural network model for iterative training until the model converges to obtain a first classification model;
inputting the second coding feature and the label corresponding to the background sound training data into the classification layer of the second neural network model for iterative training until the model converges to obtain a second classification model. In this embodiment, because the second sound feature is purer, the second coding feature obtained by encoding it is of better quality; consequently, training the classification layer of the second neural network model on the second coding feature gives a better training effect than training on the first coding feature.
inputting a test set into the first classification model and the second classification model respectively for classification to obtain a first classification result and a second classification result;
judging whether the first classification result and the second classification result are both the same as the labels in the test set; if so, taking the first classification model as the classification model.
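The reciprocal coding/decoding flow in the steps above can be traced with a toy sketch. Purely as an assumption for illustration, a fixed 4x2 matrix with orthonormal columns stands in for the coding layer and its transpose for the decoding layer, so the sketch shows only the data flow and dimensions (low-dimensional, then high-dimensional, then low-dimensional again). In the described method the layers are trained networks, and a trained decoder would be expected to retain mainly the background component rather than reproduce its input exactly. The names `ENCODER`, `encode`, and `decode` are hypothetical.

```python
# Toy stand-in (assumption): a fixed 4x2 matrix with orthonormal columns plays
# the coding layer, mapping a 2-D sound feature to a 4-D coding feature.
ENCODER = [
    [0.6, 0.0],
    [0.8, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]


def matvec(matrix, vector):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Swap the rows and columns of a list-of-rows matrix."""
    return [list(column) for column in zip(*matrix)]


def encode(sound_feature):
    """Coding layer: low-dimensional sound feature -> high-dimensional coding feature."""
    return matvec(ENCODER, sound_feature)


def decode(coding_feature):
    """Decoding layer: the transpose undoes `encode` because the columns are orthonormal."""
    return matvec(transpose(ENCODER), coding_feature)


# First model: first sound feature -> first coding feature -> second sound feature.
first_sound_feature = [0.5, -1.0]
first_coding_feature = encode(first_sound_feature)
second_sound_feature = decode(first_coding_feature)

# Second model: its coding layer encodes the second sound feature. The same toy
# matrix is reused only for brevity; in the method it is a separately trained layer.
second_coding_feature = encode(second_sound_feature)
```

Here `decode(encode(x))` returns `x` up to floating-point rounding, which is the "reciprocal" property in its simplest linear form.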
In this embodiment, to test the validity of the first and second classification models, the test set is input into both models for prediction. If both the first and the second classification results match the labels in the test set, the confidence of both models is high, and the first classification model may then be taken as the classification model. The first classification model is selected even though the second classification model trains better, because the second classification model is trained on pure background features, whereas in actual processing a clean background sound is rarely available. The second classification model therefore serves mainly to verify the confidence of the first classification model.
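The selection rule can be written as a small function. This is a minimal sketch under stated assumptions: classification results are string class names, "the same as the labels" is read strictly as 100 % agreement on the test set, and the `min_accuracy` parameter is a hypothetical relaxation that the text does not specify. The function names are illustrative only.

```python
from typing import Optional, Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of test samples whose predicted class equals the label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def select_classification_model(
    first_results: Sequence[str],
    second_results: Sequence[str],
    labels: Sequence[str],
    min_accuracy: float = 1.0,
) -> Optional[str]:
    """Return "first" if both models agree with the test labels, otherwise None.

    min_accuracy=1.0 is the strict reading of "the same as the labels"; a lower
    value is a hypothetical relaxation, not something the method specifies.
    """
    first_ok = accuracy(first_results, labels) >= min_accuracy
    second_ok = accuracy(second_results, labels) >= min_accuracy
    # The second model only corroborates the first; the first model is deployed
    # because real inputs rarely contain clean background sound.
    return "first" if first_ok and second_ok else None


labels = ["street", "office", "car"]
print(select_classification_model(labels, labels, labels))  # first
```

Returning `None` when the check fails leaves the next step open, because the text does not say what is done in that case.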
In an embodiment, a method is further provided for training a keyword speech recognition model based on the noise-reduced speech data; the keyword speech recognition model is used to perform keyword recognition on speech information.
Specifically, after step S6 of performing noise reduction processing on the voice data based on the target noise reduction model, the method includes:
acquiring the noise-reduced voice data; after noise reduction, the voice data is relatively pure and generally includes only the user's voice.
performing character recognition on the noise-reduced voice data to obtain keywords; the voice data may contain a large amount of text content, but this embodiment focuses only on certain specific keywords.
acquiring, from the voice data, the multi-frame voice data corresponding to each keyword; each keyword corresponds to a speech segment of a certain duration, and that duration spans multiple frames of voice data.
inputting the keywords into a word embedding model to extract the word vectors corresponding to the keywords; sequentially inputting the multi-frame voice data corresponding to each keyword into a preset neural network to extract a vector for each frame of voice data. In this embodiment, training the neural network takes into account not only the acoustic characteristics of the multi-frame voice data corresponding to the keywords but also the semantic characteristics of the keywords; that is, both the vector for each frame of voice data and the word vector for the keyword are extracted.
summing the vectors corresponding to each frame of voice data to obtain a sum vector; because multiple frames of voice data correspond to one keyword, the per-frame vectors must be summed into a single sum vector.
adjusting the network parameters of the preset neural network and fitting the sum vector to the word vector through a cosine function to train the preset neural network; when the sum vector and the word vector match, the keyword speech recognition model is obtained; the keyword speech recognition model is used to perform keyword recognition on speech information. In this embodiment, the network parameters are iteratively adjusted during training so that the sum vector fits the word vector, that is, until their similarity reaches a threshold and the model converges, at which point the keyword speech recognition model is obtained. By making full use of the noise-reduced voice data for model training, this embodiment extends the data to multiple application scenarios and reduces the difficulty of acquiring training data.
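The frame selection, summation, and cosine fitting steps above can be sketched with the standard library. Several assumptions are involved: keyword time spans are given in integer milliseconds, frames are 25 ms long with a 10 ms hop (common defaults that the text does not specify), the per-frame vectors and the word vector are supplied by the caller (the preset neural network and the word embedding model are not modelled), and the 0.95 threshold is hypothetical. `cosine_loss` is the quantity a trainer could minimize. No gradient update is shown.

```python
import math
from typing import List, Sequence


def keyword_frame_indices(start_ms: int, end_ms: int,
                          frame_ms: int = 25, hop_ms: int = 10) -> List[int]:
    """Indices i of frames [i*hop_ms, i*hop_ms + frame_ms) overlapping [start_ms, end_ms)."""
    if end_ms <= start_ms:
        raise ValueError("keyword segment must have positive duration")
    first = max(0, -((frame_ms - 1 - start_ms) // hop_ms))  # ceil((start - frame + 1) / hop)
    last = (end_ms - 1) // hop_ms
    return list(range(first, last + 1))


def sum_vector(frame_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise sum of the per-frame vectors belonging to one keyword."""
    if not frame_vectors:
        raise ValueError("need at least one frame vector")
    return [sum(values) for values in zip(*frame_vectors)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def cosine_loss(frame_vectors, word_vector) -> float:
    """0 when the sum vector points in the same direction as the word vector."""
    return 1.0 - cosine_similarity(sum_vector(frame_vectors), word_vector)


def is_fitted(frame_vectors, word_vector, threshold: float = 0.95) -> bool:
    """Hypothetical stopping test: the similarity has reached the threshold."""
    return cosine_similarity(sum_vector(frame_vectors), word_vector) >= threshold


frames = keyword_frame_indices(500, 800)  # indices 48 through 79 (32 frames)
frame_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-ins for network outputs
print(sum_vector(frame_vectors))             # [2.0, 2.0]
print(is_fitted(frame_vectors, [1.0, 1.0]))  # True
```

Integer milliseconds are used for frame indexing so that the boundary arithmetic is exact and does not depend on floating-point division.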
Referring to fig. 2, in an embodiment of the present invention, there is further provided an adaptive noise reduction apparatus, including:
a first acquisition unit, configured to acquire voice data, wherein the voice data carries at least background sound information;
an analysis unit, configured to analyze the voice data to obtain the background sound information;
a classification unit, configured to input the background sound information into a classification model for classification to obtain a corresponding classification result, the classification model being a pre-trained neural network model;
a second acquisition unit, configured to acquire identification information corresponding to the classification result;
a determining unit, configured to detect whether a target identification field identical to the identification information exists in a database and, if so, to determine a corresponding target noise reduction model according to the target identification field;
and a noise reduction unit, configured to perform noise reduction processing on the voice data based on the target noise reduction model.
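The units listed above can be wired together as a minimal sketch of the data flow. Every callable and dictionary here is a hypothetical stand-in supplied by the caller: the background extractor, the classifier, the class-to-identification mapping, and the database of noise reduction models are not specified in the text. Returning the voice data unchanged when no identification field matches is also an assumption, since the source does not describe that branch.

```python
from typing import Callable, Dict, List

Signal = List[float]


def adaptive_noise_reduction(
    voice_data: Signal,
    extract_background: Callable[[Signal], Signal],
    classify: Callable[[Signal], str],
    identification_for_class: Dict[str, str],
    model_database: Dict[str, Callable[[Signal], Signal]],
) -> Signal:
    """Chain the apparatus units; all callables are caller-supplied stand-ins."""
    background = extract_background(voice_data)                    # analysis unit
    classification_result = classify(background)                   # classification unit
    identification = identification_for_class.get(classification_result)  # second acquisition unit
    target_model = model_database.get(identification) if identification else None  # determining unit
    if target_model is None:
        # Assumption: the source does not say what happens when no field matches;
        # the voice data is returned unchanged here.
        return voice_data
    return target_model(voice_data)                                # noise reduction unit


# Hypothetical toy components, for illustration only.
def toy_background(samples: Signal) -> Signal:
    return samples[:4]  # pretend the leading samples contain background only


def toy_classifier(background: Signal) -> str:
    mean_abs = sum(abs(x) for x in background) / len(background)
    return "street" if mean_abs > 0.5 else "office"


IDS = {"street": "BG01", "office": "BG02"}
MODELS = {"BG01": lambda s: [0.5 * x for x in s]}  # crude attenuation stand-in

print(adaptive_noise_reduction([1.0, -1.0, 1.0, -1.0, 0.2],
                               toy_background, toy_classifier, IDS, MODELS))
# [0.5, -0.5, 0.5, -0.5, 0.1]
```

Passing the components in as arguments keeps the pipeline independent of any particular classifier or noise reduction model, which is in keeping with the idea of choosing the model at run time.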
In an embodiment, the determining unit comprises:
an analysis subunit, configured to analyze the target identification field to obtain first characteristic information and second characteristic information, wherein the first characteristic information is the character information at a first designated position in the target identification field, and the second characteristic information is the character information at a second designated position in the target identification field;
a determining subunit, configured to determine the target noise reduction model corresponding to the first characteristic information based on the correspondence between characteristic information and noise reduction models stored in the database;
and a calling subunit, configured to send a calling instruction carrying the second characteristic information to a management terminal, the calling instruction being used to call the determined target noise reduction model.
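The analysis, determining, and calling subunits can be sketched as follows. The text says only "a first designated position" and "a second designated position"; the slice positions, the example field `"NR03-A7"`, the model name, and the `CallInstruction` structure are all hypothetical. The role of the second characteristic information at the management terminal is not specified, so it is simply carried along in the instruction.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical layout of the target identification field.
FIRST_POSITION = slice(0, 4)
SECOND_POSITION = slice(5, 7)


@dataclass(frozen=True)
class CallInstruction:
    """Message sent to the management terminal to call the target model."""
    model_name: str
    second_characteristic: str


def parse_identification_field(field: str) -> Tuple[str, str]:
    """Split a target identification field into first and second characteristic info."""
    if len(field) < SECOND_POSITION.stop:
        raise ValueError(f"identification field too short: {field!r}")
    return field[FIRST_POSITION], field[SECOND_POSITION]


def build_call_instruction(field: str, feature_to_model: Dict[str, str]) -> CallInstruction:
    first, second = parse_identification_field(field)
    model_name = feature_to_model[first]  # correspondence stored in the database
    return CallInstruction(model_name=model_name, second_characteristic=second)


print(build_call_instruction("NR03-A7", {"NR03": "street_denoiser"}))
# CallInstruction(model_name='street_denoiser', second_characteristic='A7')
```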
In an embodiment, the determining unit comprises:
a matching subunit, configured to match a target model parameter set corresponding to the target identification field based on the correspondence between identification fields and model parameter sets stored in the database, wherein the target model parameter set comprises a plurality of model parameters, including at least a smoothing queue length and a smoothing mechanism parameter;
and a replacing subunit, configured to randomly call a noise reduction model and replace the model parameters of that noise reduction model with the corresponding model parameters in the target model parameter set to obtain the target noise reduction model.
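The matching and replacing subunits amount to a lookup followed by a parameter overwrite. In the minimal sketch below, `NoiseReductionModel` is a hypothetical container exposing only the two parameters named in the text, `smoothing_factor` is an assumed stand-in for the "smoothing mechanism parameter", and the model pool and parameter values are invented for illustration. `dataclasses.replace` returns a copy with the given fields overwritten and raises `TypeError` for an unknown field name.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, Sequence, Union

Param = Union[int, float]


@dataclass(frozen=True)
class NoiseReductionModel:
    """Hypothetical noise reduction model exposing the two parameters named in the text."""
    name: str
    smoothing_queue_length: int   # e.g. number of past frames kept in the smoothing queue
    smoothing_factor: float       # stand-in for the "smoothing mechanism parameter"


def build_target_model(
    identification_field: str,
    parameter_sets: Dict[str, Dict[str, Param]],
    model_pool: Sequence[NoiseReductionModel],
    rng: random.Random,
) -> NoiseReductionModel:
    params = parameter_sets[identification_field]  # matching subunit
    base_model = rng.choice(model_pool)            # "randomly call" a noise reduction model
    return replace(base_model, **params)           # replacing subunit


POOL = [NoiseReductionModel("generic_a", 4, 0.5), NoiseReductionModel("generic_b", 6, 0.7)]
PARAMETER_SETS = {"BG01": {"smoothing_queue_length": 8, "smoothing_factor": 0.9}}
target = build_target_model("BG01", PARAMETER_SETS, POOL, random.Random(0))
print(target.smoothing_queue_length, target.smoothing_factor)  # 8 0.9
```

Because every listed parameter is overwritten, the resulting model carries the target parameter set regardless of which base model was drawn, which is why a random choice is acceptable in this scheme.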
In this embodiment, for the specific implementation of each unit and subunit in the foregoing apparatus embodiment, refer to the description of the foregoing method embodiment; details are not repeated here.
Referring to fig. 3, an embodiment of the present invention further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store noise reduction models and the like. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements an adaptive noise reduction method.
It will be appreciated by those skilled in the art that the architecture shown in fig. 3 is merely a block diagram of a portion of the architecture in connection with the present inventive arrangements and is not intended to limit the computer devices to which the present inventive arrangements are applicable.
An embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of adaptive noise reduction. It is understood that the computer readable storage medium in this embodiment may be a volatile readable storage medium or a nonvolatile readable storage medium.
In summary, the adaptive noise reduction method, apparatus, and computer device provided in the embodiments of the present invention include: acquiring voice data, wherein the voice data carries at least background sound information; analyzing the voice data to obtain the background sound information; inputting the background sound information into a classification model for classification to obtain a corresponding classification result, the classification model being a pre-trained neural network model; acquiring identification information corresponding to the classification result; detecting whether a target identification field identical to the identification information exists in a database and, if so, determining a corresponding target noise reduction model according to the target identification field; and performing noise reduction processing on the voice data based on the target noise reduction model. By determining the target noise reduction model according to the background in the voice data, the invention overcomes the inability of current intelligent voice devices to adapt their noise reduction to the background.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-transitory computer-readable storage medium and, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium provided by the present invention and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," and any other variations thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, apparatus, article, or method that comprises the element.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes using the descriptions and drawings of the present invention or direct or indirect application in other related technical fields are included in the scope of the present invention.

Claims (10)

1. A method of adaptive noise reduction, comprising the steps of:
acquiring voice data, wherein the voice data carries at least background sound information;
analyzing the voice data to obtain the background sound information;
inputting the background sound information into a classification model for classification to obtain a corresponding classification result; the classification model is a neural network model which is trained in advance;
acquiring identification information corresponding to the classification result;
detecting whether a target identification field which is the same as the identification information exists in a database; if yes, determining a corresponding target noise reduction model according to the target identification field;
and carrying out noise reduction processing on the voice data based on the target noise reduction model.
2. The method of adaptive noise reduction according to claim 1, wherein the step of determining a corresponding target noise reduction model from the target identification field comprises:
analyzing the target identification field to obtain first characteristic information and second characteristic information; the first characteristic information is character information at a first designated position in the target identification field, and the second characteristic information is character information at a second designated position in the target identification field;
determining a target noise reduction model corresponding to the first characteristic information based on the corresponding relation between the characteristic information stored in the database and the noise reduction model;
sending a calling instruction carrying the second characteristic information to a management terminal; the calling instruction is used for calling the determined target noise reduction model.
3. The method of adaptive noise reduction according to claim 1, wherein the step of determining a corresponding target noise reduction model from the target identification field comprises:
matching a target model parameter set corresponding to the target identification field based on the correspondence between identification fields and model parameter sets stored in the database; the target model parameter set comprises a plurality of model parameters, wherein the model parameters comprise at least a smoothing queue length and a smoothing mechanism parameter;
randomly calling a noise reduction model, and correspondingly replacing model parameters in the noise reduction model with model parameters in the target model parameter set to obtain the target noise reduction model.
4. The method of adaptive noise reduction according to claim 1, wherein the training process of the classification model comprises:
acquiring an initial neural network model and training data; the training data are background sound training data and corresponding labels; the initial neural network model comprises a feature extraction layer, an encoding layer, a decoding layer and a classification layer;
performing feature extraction on the background sound training data based on the feature extraction layer to obtain a first sound feature;
inputting the first sound feature into the coding layer for coding to obtain a first coding feature; inputting the first coding feature into the decoding layer for decoding to obtain a second sound feature;
inputting the second sound feature into the coding layer for coding to obtain a second coding feature;
and inputting the first coding feature, the second coding feature and the label corresponding to the background sound training data to the classification layer together for iterative training until the model converges to obtain the classification model.
5. The method of adaptive noise reduction according to claim 1, wherein the training process of the classification model comprises:
acquiring a first neural network model and a second neural network model, and acquiring training data; the training data are background sound training data and corresponding labels; the first neural network model comprises a feature extraction layer, a coding layer, a decoding layer and a classification layer, and the second neural network model comprises a feature extraction layer, a coding layer and a classification layer;
performing feature extraction on the background sound training data based on a feature extraction layer of the first neural network model to obtain a first sound feature;
inputting the first sound feature into the coding layer of the first neural network model for coding to obtain a first coding feature; inputting the first coding feature into the decoding layer for decoding to obtain a second sound feature;
inputting the second sound feature into the coding layer of the second neural network model for coding to obtain a second coding feature;
inputting the first coding feature and the label corresponding to the background sound training data into a classification layer of the first neural network model for iterative training until the model converges to obtain a first classification model;
inputting the second coding feature and the label corresponding to the background sound training data into a classification layer of the second neural network model for iterative training until the model converges to obtain a second classification model;
inputting the test set into the first classification model and the second classification model respectively for classification to obtain a first classification result and a second classification result;
judging whether the first classification result and the second classification result are both the same as the labels in the test set; if so, taking the first classification model as the classification model.
6. The method of adaptive noise reduction according to claim 1, wherein after the step of noise reduction processing the speech data based on the target noise reduction model, comprising:
acquiring noise-reduced voice data;
performing character recognition on the noise-reduced voice data to obtain keywords;
acquiring, from the voice data, multi-frame voice data corresponding to the keywords;
inputting the keywords into a word embedding model, and extracting word vectors corresponding to the keywords; sequentially inputting the multi-frame voice data corresponding to the keywords into a preset neural network, and extracting vectors corresponding to each frame of voice data;
summing vectors corresponding to each frame of voice data to obtain a sum vector;
adjusting network parameters of the preset neural network, and fitting the sum vector and the word vector through a cosine function to train the preset neural network; when the sum vector and the word vector are completely matched, a keyword voice recognition model is obtained; the keyword voice recognition model is used for carrying out keyword recognition on voice information.
7. An adaptive noise reduction apparatus, comprising:
a first acquisition unit configured to acquire voice data, wherein the voice data carries at least background sound information;
the analysis unit is used for analyzing the voice data to obtain the background sound information;
the classification unit is used for inputting the background sound information into a classification model for classification to obtain a corresponding classification result; the classification model is a neural network model which is trained in advance;
the second acquisition unit is used for acquiring the identification information corresponding to the classification result;
a determining unit, configured to detect whether a target identification field identical to the identification information exists in a database; if yes, determining a corresponding target noise reduction model according to the target identification field;
and the noise reduction unit is used for carrying out noise reduction processing on the voice data based on the target noise reduction model.
8. The apparatus for adaptive noise reduction according to claim 7, wherein the determining unit includes:
the analysis subunit is used for analyzing the target identification field to obtain first characteristic information and second characteristic information; the first characteristic information is character information at a first designated position in the target identification field, and the second characteristic information is character information at a second designated position in the target identification field;
the determining subunit is used for determining a target noise reduction model corresponding to the first characteristic information based on the corresponding relation between the characteristic information stored in the database and the noise reduction model;
a calling subunit, configured to send a calling instruction carrying the second characteristic information to a management terminal; the calling instruction is used for calling the determined target noise reduction model.
9. The apparatus for adaptive noise reduction according to claim 7, wherein the determining unit includes:
the matching subunit is used for matching the target model parameter set corresponding to the target identification field based on the correspondence between identification fields and model parameter sets stored in the database; the target model parameter set comprises a plurality of model parameters, wherein the model parameters comprise at least a smoothing queue length and a smoothing mechanism parameter;
and the replacing subunit is used for randomly calling a noise reduction model, and correspondingly replacing model parameters in the noise reduction model with model parameters in the target model parameter set to obtain the target noise reduction model.
10. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, carries out the steps of the method according to any one of claims 1 to 6.
CN202310877853.2A 2023-07-18 2023-07-18 Adaptive noise reduction method and device and computer equipment Active CN116597855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310877853.2A CN116597855B (en) 2023-07-18 2023-07-18 Adaptive noise reduction method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN116597855A true CN116597855A (en) 2023-08-15
CN116597855B CN116597855B (en) 2023-09-29

Family

ID=87590345

Country Status (1)

Country Link
CN (1) CN116597855B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107895011A (en) * 2017-11-03 2018-04-10 携程旅游网络技术(上海)有限公司 Processing method, system, storage medium and the electronic equipment of session information
CN111028841A (en) * 2020-03-10 2020-04-17 深圳市友杰智新科技有限公司 Method and device for awakening system to adjust parameters, computer equipment and storage medium
CN111223476A (en) * 2020-04-23 2020-06-02 深圳市友杰智新科技有限公司 Method and device for extracting voice feature vector, computer equipment and storage medium
CN113160844A (en) * 2021-04-27 2021-07-23 山东省计算中心(国家超级计算济南中心) Speech enhancement method and system based on noise background classification
CN113345460A (en) * 2021-08-05 2021-09-03 北京世纪好未来教育科技有限公司 Audio signal processing method, device, equipment and storage medium
CN114237937A (en) * 2021-12-17 2022-03-25 威创集团股份有限公司 Multithreading data transmission method and device
CN114373449A (en) * 2022-01-18 2022-04-19 海信电子科技(武汉)有限公司 Intelligent device, server and voice interaction method
CN114999525A (en) * 2022-02-28 2022-09-02 四川天中星航空科技有限公司 Light-weight environment voice recognition method based on neural network
KR102466061B1 (en) * 2021-07-02 2022-11-10 가천대학교 산학협력단 Apparatus for denoising using hierarchical generative adversarial network and method thereof
CN115881126A (en) * 2023-02-22 2023-03-31 广东浩博特科技股份有限公司 Switch control method and device based on voice recognition and switch equipment

Also Published As

Publication number Publication date
CN116597855B (en) 2023-09-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant