CN112530453B - Voice recognition method and device suitable for noise environment - Google Patents

Voice recognition method and device suitable for noise environment

Info

Publication number
CN112530453B
Authority
CN
China
Prior art keywords
voice
noisy
speech
dialogue
voice recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011355810.0A
Other languages
Chinese (zh)
Other versions
CN112530453A (en)
Inventor
余翠琳
周文略
陈家聪
梁艳阳
王天雷
冯伟霞
秦传波
翟懿奎
朱翠娥
刘始匡
黎繁胜
蒋润锦
张俊亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhixiang Technology Jiangmen Co ltd
Wuyi University
Original Assignee
Zhixiang Technology Jiangmen Co ltd
Wuyi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhixiang Technology Jiangmen Co ltd, Wuyi University filed Critical Zhixiang Technology Jiangmen Co ltd
Priority to CN202011355810.0A priority Critical patent/CN112530453B/en
Publication of CN112530453A publication Critical patent/CN112530453A/en
Application granted granted Critical
Publication of CN112530453B publication Critical patent/CN112530453B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Abstract

The invention provides a voice recognition method suitable for a noise environment, which comprises the following steps: receiving to-be-processed noisy conversational speech uploaded by a voice acquisition device; performing voice enhancement processing on the to-be-processed noisy conversational speech to extract its voice features; searching, according to the voice features, the voice recognition parameter set for the target voice recognition parameter setting value of the to-be-processed noisy conversational speech; and sending the target voice recognition parameter setting value to the voice acquisition device, so that the voice acquisition device performs voice recognition on received voice data according to the target voice recognition parameter setting value. Implementing the invention can improve voice-signal input through a microphone in various noisy environments and realize high-precision automatic voice recognition.

Description

Voice recognition method and device suitable for noise environment
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech recognition method and apparatus suitable for use in a noise environment.
Background
Since the beginning of the 21st century, speech recognition research in China has developed rapidly: a considerable number of excellent enterprises have emerged, some fields have reached a world-leading level, and products with high market share have been produced; for example, handheld translation devices are now widely used by people traveling abroad. Currently, acoustic models based on deep neural networks have significantly improved speech recognition performance, especially under near-field conditions. In practical applications, however, far-field and reverberant speech recognition remains a challenging problem.
Robust speech recognition in practical application environments is a common concern of the signal processing and speech recognition fields, and has been one of the most challenging tasks of recent decades. One major reason is that the target speech is contaminated by various background noises. At present, consumer electronics equipped with microphone arrays (e.g., car navigation devices and headsets) typically employ gradient-method-based speech enhancement techniques to cope with additive noise. However, although these techniques were originally developed for voice communication and can maximize the signal-to-distortion ratio (SDR), they do not always maximize automatic speech recognition (ASR) accuracy.
Therefore, it is desirable to provide a method that maximizes the accuracy of a given automatic speech recognizer by automatically adjusting the front-end speech enhancement function. With such a method, even as the environmental noise changes, voice can be input through the helmet's microphone, high-precision automatic speech recognition can be achieved algorithmically, and voice commands for the helmet's various functions can be issued. A genetic algorithm (GA) is used to generate front-end speech enhancement parameter values for each particular environment. By clustering the environments in advance based on their noise characteristics, the generated values can be dynamically assigned to the input speech signal.
Disclosure of Invention
In view of the foregoing problems in the prior art, an object of the present invention is to provide a speech recognition method suitable for use in a noise environment, including:
receiving to-be-processed noisy conversation voice uploaded by voice acquisition equipment;
carrying out voice enhancement processing on the to-be-processed noisy dialogue voice so as to extract voice characteristics in the to-be-processed noisy dialogue voice;
searching a target voice recognition parameter set value of the to-be-processed noisy dialogue voice from a voice recognition parameter set according to the voice characteristics;
and sending the target voice recognition parameter set value to the voice acquisition equipment so that the voice acquisition equipment performs voice recognition on the received voice data according to the target voice recognition parameter set value.
Further, the method for constructing the speech recognition parameter value set comprises the following steps:
acquiring a plurality of groups of noisy conversational speech under different noise environments, a speech recognition parameter value set and sentence texts corresponding to the noisy conversational speech;
clustering the multiple groups of noisy dialogue voices in different noise environments, and distributing an initial voice recognition parameter value for each type of noisy dialogue voice from the voice recognition parameter value set;
performing voice recognition on each type of the noisy dialogue voice by using the initial voice recognition parameter value, comparing a recognition result with sentence texts corresponding to the noisy dialogue voice, and adjusting each initial parameter candidate value according to the comparison result;
and obtaining the voice recognition parameter value set according to the adjusted initial parameter candidate value.
Further, the performing voice enhancement processing on the to-be-processed noisy conversational voice includes:
and carrying out voice enhancement processing on the to-be-processed noisy dialogue voice by utilizing a multi-channel wiener filter.
Further, the noisy conversational speech includes: coherent and incoherent sound sources;
the voice enhancement processing is carried out on the to-be-processed noisy dialogue voice by utilizing the multi-channel wiener filter, and the method comprises the following steps:
determining a transfer function of the coherent sound source and a transfer function of the incoherent sound source according to the noisy conversational speech;
processing the transfer function of the coherent sound source and the transfer function of the incoherent sound source by using a beam former to obtain the power spectral density of the coherent sound source and the power spectral density of the residual noise;
processing the power spectral density of the coherent sound source and the power spectral density of the residual noise by using a wiener post-filter to obtain the power spectral density of background noise;
and acquiring corresponding voice characteristics according to the power spectral density of the background noise.
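The pipeline above ends with a Wiener post-filter that combines the estimated power spectral densities. As a minimal sketch of that last step (not the patent's exact filter), the per-bin gain G = S / (S + N) can be computed from a coherent-source PSD estimate and a residual-noise PSD estimate; the gain floor and the toy PSD estimates below are illustrative assumptions:

```python
import numpy as np

def wiener_postfilter_gain(psd_coherent, psd_residual_noise, floor=1e-3):
    """Single-channel Wiener post-filter gain per frequency bin.

    psd_coherent: estimated PSD of the coherent (target) source
    psd_residual_noise: estimated PSD of the residual/background noise
    Both are non-negative arrays of shape (n_bins,). The gain
    G = S / (S + N) attenuates bins dominated by noise; `floor`
    limits attenuation to reduce musical-noise artifacts.
    """
    psd_s = np.maximum(psd_coherent, 0.0)
    psd_n = np.maximum(psd_residual_noise, 0.0)
    gain = psd_s / (psd_s + psd_n + 1e-12)  # small constant avoids 0/0
    return np.maximum(gain, floor)

# Apply the gain to one STFT frame of the beamformer output (toy data)
rng = np.random.default_rng(0)
frame = rng.standard_normal(257) + 1j * rng.standard_normal(257)
psd_s = np.abs(frame) ** 2 * 0.8      # hypothetical target-PSD estimate
psd_n = np.full(257, 0.5)             # hypothetical noise-PSD estimate
enhanced = wiener_postfilter_gain(psd_s, psd_n) * frame
```

In practice the two PSD estimates would come from the beamformer output and the residual-noise estimate described above, not from the toy values used here.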
In another aspect, the present invention provides a speech recognition method suitable for use in a noise environment, including:
collecting a to-be-processed noisy conversation voice;
uploading the to-be-processed noisy conversation voice to a terminal server so that the terminal server feeds back a target voice recognition parameter set value based on the noisy conversation voice;
and carrying out voice recognition on the noisy dialogue voice based on the received target voice recognition parameter set value.
In another aspect, the present invention provides a speech recognition apparatus suitable for use in a noisy environment, comprising:
the receiving module is configured to execute receiving of the to-be-processed noisy conversation voice uploaded by the voice acquisition equipment;
a voice feature extraction module configured to perform voice enhancement processing on the to-be-processed noisy speech so as to extract voice features in the to-be-processed noisy speech;
a searching module configured to perform searching for a target speech recognition parameter setting value of the to-be-processed noisy conversational speech from a speech recognition parameter value set according to the speech feature;
and the sending module is configured to execute sending the target voice recognition parameter set value to the voice acquisition equipment so that the voice acquisition equipment performs voice recognition on the received voice data according to the target voice recognition parameter set value.
Further, the search module comprises a parameter set setting module; the parameter set setting module includes:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is configured to acquire noisy dialogue voices, a voice recognition parameter value set and sentence texts corresponding to the noisy dialogue voices in multiple groups of different noise environments;
a clustering unit configured to perform clustering on the multiple groups of noisy conversational speech in different noise environments, and assign an initial speech recognition parameter value to each type of noisy conversational speech from the speech recognition parameter value set;
the adjusting unit is configured to perform voice recognition on each type of the noisy dialogue voice by using the initial voice recognition parameter value, compare a recognition result with a sentence text corresponding to the noisy dialogue voice, and adjust each initial parameter candidate value according to the comparison result;
and the parameter set acquisition unit is configured to acquire the voice recognition parameter value set according to the adjusted initial parameter candidate value.
In another aspect, the present invention provides a speech recognition apparatus suitable for use in a noisy environment, comprising:
the acquisition module is configured to acquire the to-be-processed noisy conversational speech;
an uploading module configured to upload the to-be-processed noisy conversation voice to a terminal server so that the terminal server feeds back a target voice recognition parameter setting value based on the noisy conversation voice;
a speech recognition module configured to perform speech recognition on the noisy conversational speech based on the received target speech recognition parameter setting value.
In another aspect, the present invention provides a computer-readable storage medium, in which at least one instruction or at least one program is stored, and the at least one instruction or the at least one program is loaded and executed by a processor to implement a speech recognition method suitable for use in a noise environment as described in any of the above.
In another aspect, the present invention provides a speech recognition device adapted for use in noisy environments, comprising at least one processor, and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the at least one processor implements a method of speech recognition suitable for use in noisy environments as described above by executing the instructions stored by the memory.
The voice recognition method and the voice recognition device which are suitable for the noise environment have the following beneficial effects:
the embodiments of the present description provide high performance automatic speech recognition for the dispenser by applying automatic parameter setting of front-end speech enhancement functions to the helmet's microphone capture signal. Since the delivery environment of the delivery person may have various noises, adjusting the parameter set according to the noise condition will improve the automatic speech recognition accuracy. Parameter setting value search and automatic speech recognition are run on the terminal server computer. If the accuracy of automatic speech recognition is low, their values can be searched again and updated to improve the quality of service. After the optimal parameter setting value is found, the optimal parameter setting value can be transmitted to an automatic voice recognition system of the helmet through the added port to realize real-time intelligent recognition. The parameter set selection is combined with the front-end voice enhancement function and is operated simultaneously with the voice enhancement function on the helmet, so that a distributor can realize voice recognition command control through a microphone.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings used in the description of the embodiment or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a multi-channel wiener filter according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a first speech recognition method suitable for use in a noise environment according to an embodiment of the present invention;
fig. 4 is a flowchart illustrating a fourth speech recognition method suitable for use in a noise environment according to an embodiment of the present invention;
fig. 5 is a schematic flowchart of a fifth speech recognition method suitable for use in a noise environment according to an embodiment of the present invention;
fig. 6 is a flowchart illustrating a sixth speech recognition method suitable for use in a noise environment according to an embodiment of the present invention;
fig. 7 is a flowchart illustrating a seventh speech recognition method suitable for use in a noise environment according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a speech recognition apparatus suitable for use in a noise environment according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of another speech recognition apparatus suitable for use in a noise environment according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a speech recognition apparatus suitable for use in a noise environment according to an embodiment of the present invention.
The voice recognition system comprises a receiving module 610, a voice feature extraction module 620, a searching module 630, a sending module 640, a collecting module 810, an uploading module 820 and a voice recognition module 830.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or device. In order to facilitate understanding of the technical solutions and the technical effects thereof described in the embodiments of the present specification, the embodiments of the present specification first explain related terms:
the frequency bin is the number following the decimal point, such as 810 in "72.810". This is achieved by using a dedicated "crystal oscillator" associated with the remote control device. The purpose of this function is to make many people use the remote control equipment simultaneously to distinguish and use different frequency points, and not to interfere with each other.
Power spectral density: in physics, signals are typically in the form of waves, such as electromagnetic waves, random vibrations, or acoustic waves. The power carried by a wave per unit frequency is obtained when the spectral density of the wave is multiplied by an appropriate coefficient, which is called the Power Spectral Density (PSD) or Spectral Power Distribution (SPD) of the signal. The unit of power spectral density is usually expressed in watts per hertz (W/Hz), or in wavelengths rather than frequencies, i.e. watts per nanometer (W/nm).
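As a small illustration of the definition above, the one-sided PSD of a sampled signal can be estimated with a periodogram: the squared FFT magnitude scaled to power per unit frequency (W/Hz). The 1 kHz tone and noise level below are arbitrary example values:

```python
import numpy as np

fs = 16000                 # sample rate (Hz), typical for speech
n = fs                     # one second of signal
t = np.arange(n) / fs
rng = np.random.default_rng(1)
# a 1 kHz tone buried in white noise
x = np.sin(2 * np.pi * 1000 * t) + 0.5 * rng.standard_normal(n)

# One-sided periodogram: power per unit frequency (W/Hz when x is a
# voltage across 1 ohm)
spectrum = np.fft.rfft(x)
psd = (np.abs(spectrum) ** 2) / (fs * n)
psd[1:-1] *= 2  # fold negative frequencies into the one-sided estimate
freqs = np.fft.rfftfreq(n, d=1 / fs)

peak_hz = freqs[np.argmax(psd)]  # dominated by the 1 kHz tone
```

The doubling of all bins except DC and Nyquist is the usual one-sided convention; integrating `psd` over `freqs` approximates the signal's total power.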
Referring to fig. 1 in the specification, a schematic diagram of an implementation environment provided by an embodiment of the present invention is shown, and as shown in fig. 1, the implementation environment may include at least a terminal server 110 and a voice collecting device 120.
It is understood that the voice capturing device 120 may communicate with the terminal server 110 in real time.
The terminal server 110 may be one or more smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. The voice capture device 120 may have a client installed therein, where the client may be an application program provided by the service provider to the user, or may be a web page provided by the service provider to the user. The speech acquisition device 120 may include a network communication unit, a processor, and memory, among others. The voice collecting device 120 can establish a communication connection with the terminal server 110 through a wireless or wired network. The wired connection mode can be a USB (universal serial bus), a serial port connection mode or a 422 interface connection mode, and the wireless connection mode can be a wireless local area network, a Bluetooth and/or a near field communication mode, and the like.
In the embodiment of the present invention, the client may be any client providing a service for a user. For example, the client may be a star observation type client, a payment type application client, a recruitment type client, a shopping type client, and the like.
An embodiment of the present specification provides a speech recognition method applicable to a noise environment, fig. 3 is a flowchart illustrating a first speech recognition method applicable to a noise environment according to an embodiment of the present invention, and as shown in fig. 3, an execution subject of the method may be a terminal server, and the method includes:
s102, receiving the to-be-processed noisy dialogue voice uploaded by the voice acquisition equipment.
In a specific implementation process, the voice acquisition device may be a device that collects the user's voice; in one application scenario of the embodiments of this specification, it may be a voice-capturing microphone worn by a takeaway delivery rider.
And S104, performing voice enhancement processing on the to-be-processed noisy conversational voice to extract voice features in the to-be-processed noisy conversational voice.
In a specific implementation process, the speech enhancement processing of the to-be-processed noisy conversational speech may take the form of noise-reduction processing: the noise in the to-be-processed noisy conversational speech is filtered out to obtain the user's coherent sound source.
And S106, searching the target voice recognition parameter set value of the to-be-processed noisy dialogue voice from the voice recognition parameter set according to the voice characteristics.
In a specific implementation process, fig. 4 is a schematic flow chart of a fourth speech recognition method applicable to a noise environment according to an embodiment of the present invention, and as shown in fig. 4, the method for constructing the speech recognition parameter value set includes:
s202, acquiring a plurality of groups of noisy dialogue voices, a voice recognition parameter value set and sentence texts corresponding to the noisy dialogue voices in different noise environments;
s204, clustering the multiple groups of noisy conversational voices in different noise environments, and distributing an initial voice recognition parameter value for each type of noisy conversational voice from the voice recognition parameter value set;
s206, performing voice recognition on each type of noisy dialogue voice by using the initial voice recognition parameter value, comparing a recognition result with sentence texts corresponding to the noisy dialogue voices, and adjusting each initial parameter candidate value according to the comparison result;
and S208, obtaining the voice recognition parameter value set according to the adjusted initial parameter candidate value.
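Steps S202 to S208 can be sketched as a small routine that clusters the noisy clips, scores each candidate parameter value by comparing recognition output against the reference sentence text, and keeps the best candidate per cluster. The interfaces `cluster_of` and `recognize` are hypothetical stand-ins for the clustering step and the ASR engine, and the exhaustive scoring loop simplifies the patent's iterative adjustment:

```python
def build_parameter_value_set(noisy_clips, reference_texts, candidates,
                              cluster_of, recognize):
    """Sketch of S202-S208: group clips by noise cluster, then keep the
    candidate parameter value whose recognition results best match the
    reference texts for that cluster."""
    clusters = {}
    for clip, text in zip(noisy_clips, reference_texts):
        clusters.setdefault(cluster_of(clip), []).append((clip, text))

    best = {}
    for cid, items in clusters.items():
        def accuracy(param):
            # fraction of clips whose recognition matches the text
            hits = sum(recognize(clip, param) == text for clip, text in items)
            return hits / len(items)
        best[cid] = max(candidates, key=accuracy)
    return best
```

Sentence-level exact match is used here for brevity; the patent compares recognition results at the character level (see the CER-based fitness later in the description).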
In a specific implementation, the speech recognition parameter value set may be constructed as follows. In (1), the parameter set is denoted by ξ_i, where i is an index distinguishing different candidate values, ξ represents an element of the parameter set, and N_ele is the number of such elements:

ξ_i = [ξ_i,1, ξ_i,2, …, ξ_i,N_ele]^T (1)
Assuming that the noise conditions are classified into N classes, the initial speech recognition parameter values are adjusted as follows:
step 1) generating candidate initial speech recognition parameter values.
And 2) clustering the voice recognition parameter value set into subsets, and distributing initial voice recognition parameter values for the clusters.
The speech recognition parameter value set (data set) is composed of a speech signal captured by a microphone and correct sentence text. When an automatic speech recognition system of a speech acquisition device (helmet) requests to perform speech recognition, the best parameter setting value among the candidates will be automatically selected by measuring the distance between the noise and the cluster.
The speech recognition parameter value set is adjusted by clustering the groups of noisy conversational speech recorded in different noise environments and assigning an initial speech recognition parameter value to each cluster. The groups of noisy conversational speech are clustered in a feature space: a filter bank G_FB is applied to the observed spectrum and the time average in (2) is taken to obtain the clustering feature, where T, Ω_DFT and Ω_FB denote the number of frames, the frame length, and the number of filter-bank channels, respectively.

[Equation (2), rendered as an image in the original document: the time-averaged G_FB filter-bank feature of the observed signal]
The centroid is determined by a local search to maximize the accuracy of the automatic speech recognition computed by all clusters.
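The clustering feature and the centroid-based assignment can be sketched as follows. The filter-bank matrix, log compression, and Euclidean nearest-centroid rule are illustrative assumptions; the document only specifies a filter-bank time average and a distance to the cluster:

```python
import numpy as np

def clustering_feature(frames_power, fbank):
    """Time-averaged filter-bank feature for clustering noise
    environments: `fbank` (Omega_FB x Omega_DFT) maps each frame's power
    spectrum to Omega_FB channels; the result is log-compressed and
    averaged over the T frames, loosely following (2)."""
    return np.log(fbank @ frames_power.T + 1e-10).mean(axis=1)

def nearest_cluster(feature, centroids):
    """Pick the cluster whose centroid is closest in the feature space."""
    dists = np.linalg.norm(centroids - feature, axis=1)
    return int(np.argmin(dists))

# Toy example: identity "filter bank" over 2 bins, T=2 frames
fbank = np.eye(2)
frames = np.array([[1.0, 4.0], [1.0, 4.0]])
feat = clustering_feature(frames, fbank)
cid = nearest_cluster(feat, np.array([[0.0, 1.4], [5.0, 5.0]]))  # -> 0
```

At recognition time, `nearest_cluster` is what selects which pre-generated parameter setting value is dispatched to the input speech signal.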
Searching the speech recognition parameter set for the target parameter setting value of the to-be-processed noisy conversational speech according to the speech features may be performed by a genetic algorithm. Specifically, an arrangement for generating speech recognition parameter value set candidates (data sets) is shown in fig. 1. For the noisy conversational speech in the different noise environments, a corresponding number L of candidate parameter setting values {ξ_il}, il ∈ L, are prepared, where L represents an index set of size L. The search for the l-th parameter candidate may be represented as (3):

ξ̂_l = argmax_{ξ_il, il ∈ L} J_l(ξ_il) (3)

Here, J_l is an objective function corresponding to the l-th subset of the data set, and it should be positively correlated with automatic speech recognition accuracy. Since the entire data set, covering the various environments, is clustered, each subset contains the sound signals observed in one class of environments; the parameter setting value found by the search in (3) is therefore specialized to each environment.
The genetic algorithm is then used to derive a near-optimal value ξ for each of the L sets of noise conditions. The genetic algorithm is one of the most commonly used meta-heuristics and can solve multimodal optimization problems, since it combines global and local search. A real-coded genetic algorithm is used here because the elements ξ are continuous real numbers.
In genetic algorithms, the objective function is called the fitness. Since we search for parameter setting values that improve automatic speech recognition accuracy, the fitness is a function of the character error rate of automatic speech recognition, denoted J_CER,l in (4).

[Equation (4), rendered as an image in the original document: the CER-based fitness J_CER,l]

Here, power-law scaling is employed, where i_gen denotes a real-valued exponent. In the first stage of the generations, the signal-to-distortion ratio is used as the fitness instead; in this case the fitness is set as in (5), and the data set must then also include an interference-free signal, besides the speech signal and the correct sentence text, as a reference for computing the signal-to-distortion ratio.

[Equation (5), rendered as an image in the original document: the SDR-based fitness J_SDR,l]
At the beginning of the search, the maximum (ξ_max) and minimum (ξ_min) of the possible parameter setting values are given, together with M initial parameter settings. To generate M offspring in one generation, M "pairs" of individuals are selected from the U existing individuals as parents. The number of possible pairs equals the number of 2-combinations of the U elements, C(U, 2), and since pairs are selected with repetition, the number of possible sets of M parent pairs equals C(U, 2)^M. Random selection is used so that individuals with higher fitness are chosen with higher probability; the probability of selecting parameter setting value i is given by (6):

P_i = J_i / Σ_{j=1}^{U} J_j (6)
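The fitness-proportional selection described above can be sketched as a roulette wheel: each individual is drawn with probability proportional to its fitness. The cumulative-sum implementation below is a standard realization of this rule, not the patent's exact code:

```python
import random

def select_parent_index(fitness, rng):
    """Fitness-proportional (roulette-wheel) selection: index i is drawn
    with probability fitness[i] / sum(fitness), mirroring (6)."""
    r = rng.uniform(0.0, sum(fitness))
    acc = 0.0
    for i, f in enumerate(fitness):
        acc += f
        if r <= acc:
            return i
    return len(fitness) - 1  # guard against floating-point round-off

# Empirically, the individual with 3x the fitness is picked ~3x as often
rng = random.Random(0)
picks = [select_parent_index([1.0, 3.0], rng) for _ in range(1000)]
share_of_fitter = picks.count(1) / len(picks)  # roughly 0.75
```

Selecting M parent pairs for a generation amounts to calling this twice per pair, re-drawing when both calls return the same individual.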
When the generation number transitions from the stage using (5) to the stage using (4), the selection differs from the random selection described above: the individual that produced the highest signal-to-distortion ratio in each generation is selected. Selecting individuals generated earlier in this way preserves diversity better than random selection, because parameter settings that yield a high signal-to-distortion ratio do not necessarily guarantee high automatic speech recognition accuracy.
BLX-α, which incorporates both crossover and mutation, is used to generate offspring. Two parent individuals are selected by random selection; their n-th elements are denoted ξ_i,n and ξ_j,n respectively, and the offspring element lies in an interval [a, b], where a and b are given by (7) and (8), and α is a coefficient defined in BLX-α that extends the interval.

a = min(ξ_i,n, ξ_j,n) − α{max(ξ_i,n, ξ_j,n) − min(ξ_i,n, ξ_j,n)} (7)
b = max(ξ_i,n, ξ_j,n) + α{max(ξ_i,n, ξ_j,n) − min(ξ_i,n, ξ_j,n)} (8)

Each offspring element ξ_k,n is sampled from the truncated normal distribution in (9), not from the uniform distribution in (10) used in the original BLX-α.

ξ_k,n ~ N(ξ_i,n, σ², a, b) (9)
ξ_k,n ~ U(a, b) (10)

The uniform and truncated normal distributions on the interval [a, b] are written U(a, b) and N(μ, σ², a, b) respectively, where μ denotes the mean and σ² the variance, given by (11).

σ² = β{max(ξ_i,n, ξ_j,n) − min(ξ_i,n, ξ_j,n)} (11)
β in (11) is set so as to decrease as the number of generations increases. Thus, the search process tends to transition from global to local.
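The BLX-α offspring generation with truncated-normal sampling can be sketched as follows. Rejection sampling stands in for a true truncated-normal draw, and the default α and β values are illustrative assumptions (β would shrink over generations per the text above):

```python
import random

def blx_alpha_truncnorm(xi_i, xi_j, alpha=0.5, beta=0.35, rng=None):
    """BLX-alpha crossover producing one offspring element per gene,
    sampled from a normal distribution truncated to [a, b] as in
    (7)-(9), with variance sigma^2 = beta * (max - min) per (11)."""
    rng = rng or random.Random()
    child = []
    for xi_in, xi_jn in zip(xi_i, xi_j):
        lo, hi = min(xi_in, xi_jn), max(xi_in, xi_jn)
        span = hi - lo
        a = lo - alpha * span          # (7)
        b = hi + alpha * span          # (8)
        sigma = (beta * span) ** 0.5   # from (11)
        while True:                    # truncate to [a, b] by rejection
            sample = rng.gauss(xi_in, sigma)  # mean is the parent gene (9)
            if a <= sample <= b:
                child.append(sample)
                break
    return child
```

Because the mean ξ_i,n always lies inside [a, b], the rejection loop terminates quickly; shrinking `beta` with the generation count concentrates offspring near the parent, shifting the search from global to local as the text describes.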
And S108, sending the target voice recognition parameter set value to the voice acquisition equipment so that the voice acquisition equipment performs voice recognition on the received voice data according to the target voice recognition parameter set value.
In a specific implementation process, after a target voice recognition parameter set value is determined at a server end, the target voice recognition parameter set value can be sent to corresponding voice acquisition equipment, so that the voice acquisition equipment performs voice recognition on received voice data according to the target voice recognition parameter set value.
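The server-side flow of S102 through S108 can be tied together in a short orchestration sketch. All five injected interfaces (`enhance`, `extract_feature`, `centroids`, `param_table`, `send`) are hypothetical stand-ins for the components described above:

```python
def handle_noisy_speech(raw_audio, enhance, extract_feature,
                        centroids, param_table, send):
    """Sketch of S102-S108 on the terminal server: enhance the uploaded
    noisy speech (S104), extract its feature, look up the parameter
    setting value for the nearest noise cluster (S106), and send it
    back to the voice acquisition device (S108)."""
    enhanced = enhance(raw_audio)                  # S104
    feature = extract_feature(enhanced)
    cluster = min(param_table,                     # S106: nearest centroid
                  key=lambda cid: abs(centroids[cid] - feature))
    send(param_table[cluster])                     # S108
    return param_table[cluster]

# Toy scalar-feature example
sent = []
result = handle_noisy_speech(
    [9.0, 11.0],
    enhance=lambda x: x,
    extract_feature=lambda x: sum(x) / len(x),
    centroids={0: 0.0, 1: 10.0},
    param_table={0: "quiet-profile", 1: "loud-profile"},
    send=sent.append,
)
```

A scalar feature keeps the example short; the actual feature is the filter-bank vector described earlier, with a vector distance in place of `abs`.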
The invention provides a novel automatic speech recognition system that aims to achieve high-accuracy recognition of speech input through the microphone of a takeaway smart helmet in a variety of noisy environments. In this system, the parameter settings of the front-end speech enhancement are tuned by algorithmic optimization rather than empirically, as in prior conventional methods. Appropriate parameter setting values are generated in advance for each noise environment, and the optimal value is selected automatically according to the current noise environment. A real-coded genetic algorithm searches for the parameter settings that maximize automatic speech recognition accuracy. The following design choices improve the efficiency of the search process: 1) in earlier generations, fitness is defined as a function of the signal-to-distortion ratio (SDR); 2) offspring are generated using a truncated normal distribution.
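The two-stage fitness schedule in point 1) can be sketched as follows, under the assumption that an SDR scorer and an ASR accuracy scorer are available as callables; all names here are illustrative, not the patent's API.

```python
def make_fitness(switch_generation, sdr_fn, asr_accuracy_fn):
    """Build a generation-dependent fitness function.

    Early generations are scored with the cheap signal-to-distortion
    ratio; later generations with actual ASR accuracy, which is the
    true objective but is expensive to evaluate.
    """
    def fitness(individual, generation):
        if generation < switch_generation:
            return sdr_fn(individual)        # global, cheap search phase
        return asr_accuracy_fn(individual)   # local, accurate refinement
    return fitness
```

The switch point corresponds to the transition from (5) to (4) described earlier, at which the highest-SDR individual is carried over.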
On the basis of the foregoing embodiment, in an embodiment of this specification, fig. 5 is a flowchart illustrating a fifth speech recognition method applied in a noise environment according to an embodiment of the present invention, and as shown in fig. 5, the performing speech enhancement processing on the to-be-processed noisy speech includes:
S302, carrying out voice enhancement processing on the to-be-processed noisy dialogue voice by utilizing a multi-channel wiener filter.
In a specific implementation process, a multi-channel wiener filter can be used for performing voice enhancement processing on the to-be-processed noisy conversational voice, and filtering an incoherent sound source in the noisy conversational voice to obtain a coherent sound source of a user.
On the basis of the foregoing embodiment, in an embodiment of this specification, fig. 6 is a flowchart illustrating a sixth speech recognition method applicable to a noisy environment according to an embodiment of the present invention, and as shown in fig. 6, the noisy speech includes: coherent and incoherent sound sources;
the voice enhancement processing is carried out on the to-be-processed noisy dialogue voice by utilizing the multi-channel wiener filter, and the method comprises the following steps:
S402, determining a transfer function of the coherent sound source and a transfer function of the incoherent sound source according to the noisy conversational speech;
S404, processing the transfer function of the coherent sound source and the transfer function of the incoherent sound source by using a beamformer to obtain the power spectral density of the coherent sound source and the power spectral density of the residual noise;
S406, processing the power spectral density of the coherent sound source and the power spectral density of the residual noise by using a wiener post-filter to obtain the power spectral density of the background noise;
S408, acquiring corresponding voice characteristics according to the power spectral density of the background noise.
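The steps S402-S408 above can be sketched as a four-stage composition. The stage functions and their signatures below are assumptions for illustration, not the patent's API; the caller supplies each stage.

```python
def make_enhancement_pipeline(estimate_transfer_fns, beamform,
                              wiener_post_filter, extract_features):
    """Compose the S402-S408 chain into a single enhancement function."""
    def enhance(noisy_speech):
        h_coherent, h_incoherent = estimate_transfer_fns(noisy_speech)   # S402
        psd_source, psd_residual = beamform(h_coherent, h_incoherent,
                                            noisy_speech)                # S404
        psd_background = wiener_post_filter(psd_source, psd_residual)    # S406
        return extract_features(psd_background)                          # S408
    return enhance
```

Structuring the chain this way lets each stage (transfer-function estimation, beamforming, post-filtering, feature extraction) be swapped or re-parameterized independently, which is what the parameter-set search operates on.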
In a specific implementation process, the multi-channel wiener filter is the most important block, and the configuration of the multi-channel wiener filter is shown in fig. 2, where fig. 2 is a schematic diagram of a multi-channel wiener filter according to an embodiment of the present invention.
Let us assume that the delivery rider's voice arrives at the microphones from known directions, where the coherent sound sources are defined as (12), the first element of which is the user's voice. (In the following, Y1 and Y denote Y1(ω, τ) and Y(ω, τ), respectively.)
s(ω,τ)=[S1(ω,τ)S2(ω,τ)...SQ(ω,τ)]T (12)
The superscript T denotes transposition; ω and τ denote the frequency bin and time frame, respectively.
The signals received by the microphones are denoted by (13) and (14), where hq(ω) is the transfer function from the q-th coherent sound source to the microphones and w(ω, τ) is the incoherent background noise.
x(ω,τ)=H(ω)s(ω,τ)+w(ω,τ) (13)
H(ω)=[h1(ω)h2(ω)…hQ(ω)] (14)
A minimum variance distortionless response (MVDR) beamformer GBF(ω) is used to generate the direction signals, as shown in (15)-(17), where w̃(ω, τ) is the background noise in the beamformer output.
Y(ω,τ) = GBF^H(ω) x(ω,τ) (15)
GBF(ω)=[gBF1(ω) gBF2(ω) … gBFR(ω)] (16)
w̃(ω,τ) = GBF^H(ω) w(ω,τ) (17)
The superscript H denotes the Hermitian transpose. Y1(ω, τ), the first element of Y(ω, τ), i.e. the signal in the target direction, is regarded as the sum of the target component S1(ω, τ) and a residual noise component V(ω, τ), as in (18) and (19), under the distortionless constraint (20) of the MVDR beamformer.
Y1(ω,τ)=S1(ω,τ)+V(ω,τ) (18)
V(ω,τ) = Σq=2…Q gBF1^H(ω) hq(ω) Sq(ω,τ) + gBF1^H(ω) w(ω,τ) (19)
gBF1^H(ω) h1(ω) = 1 (20)
The power spectral densities of the coherent sound sources s(ω, τ) are collected in φS(ω, τ), defined by (21), while the power spectral density of the residual noise V(ω, τ) is denoted φV(ω, τ).
φS(ω,τ) = [φS1(ω,τ) φS2(ω,τ) … φSQ(ω,τ)]^T, where φSq(ω,τ) = E[|Sq(ω,τ)|²] (21)
In order to reduce the residual noise component V(ω, τ), a Wiener post-filter GWiener(ω, τ) is applied to Y1(ω, τ), as shown in (22) and (23).
Z(ω,τ) = GWiener(ω,τ) Y1(ω,τ) (22)
GWiener(ω,τ) = φS1(ω,τ) / (φS1(ω,τ) + φV(ω,τ)) (23)
The power spectral density vector φY(ω, τ) of the beamformer outputs is given by (24).
φY(ω,τ) = [E[|Y1(ω,τ)|²] E[|Y2(ω,τ)|²] … E[|YR(ω,τ)|²]]^T (24)
The relationship between the power spectral densities of the sound sources and those of the beamformer outputs is established by the linear model (25), where D(ω) is the directivity gain matrix of the beamformers, calculated from the beamformer gains GBF(ω) and the transfer functions H(ω), and φS+W(ω, τ) stacks the source power spectral densities and the background-noise power spectral density.
φY(ω,τ) ≈ D(ω) φS+W(ω,τ) (25)
The power spectral density of the background noise w(ω, τ) is approximated by (26), and by using (27), (25) is rearranged into (28), where the superscript "+" denotes the pseudo-inverse. [Equations (26) and (27) are not reproduced here.]
φS+W(ω,τ)=D+(ω)φY(ω,τ) (28)
By utilizing (28), φS1(ω, τ) and φV(ω, τ) in (23) can be estimated by (29) and (30), respectively, where γ is a weighting coefficient. [Equations (29) and (30) are not reproduced here.]
The power spectral density of the background noise is estimated from minimum statistics, as in (31).
φW(ω,τ) ≈ min{ φ̄Y1(ω,τ′) : τ − τint ≤ τ′ ≤ τ } (31)
Here, τint is a time interval, and the overbar denotes an exponential moving average used for smoothing. In practical applications, the Wiener post-filter is likewise reshaped by a smoothing process.
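The post-filter gain (23) and the minimum-statistics noise estimate described around (31) can be sketched numerically as follows. This is an illustrative reconstruction in Python; the function names and the `window`/`smooth` parameters are assumptions, not the patent's API.

```python
import numpy as np

def wiener_post_filter_gain(psd_target, psd_residual, floor=1e-12):
    """Wiener post-filter gain in the form of (23): ratio of the
    target-speech PSD to the total PSD, evaluated per frequency bin."""
    psd_target = np.asarray(psd_target, float)
    psd_residual = np.asarray(psd_residual, float)
    return psd_target / np.maximum(psd_target + psd_residual, floor)

def minimum_statistics_noise_psd(psd_frames, window, smooth=0.9):
    """Minimum-statistics background-noise estimate sketched from (31):
    exponentially smooth the per-frame PSD (the overbar in the text),
    then take, for each bin, the minimum over the last `window` frames
    (standing in for the interval τint)."""
    psd_frames = np.asarray(psd_frames, float)
    smoothed = np.empty_like(psd_frames)
    smoothed[0] = psd_frames[0]
    for t in range(1, len(psd_frames)):    # exponential moving average
        smoothed[t] = smooth * smoothed[t - 1] + (1.0 - smooth) * psd_frames[t]
    noise = np.empty_like(psd_frames)
    for t in range(len(psd_frames)):       # running minimum over the window
        noise[t] = smoothed[max(0, t - window + 1):t + 1].min(axis=0)
    return noise
```

Because noise power varies more slowly than speech, the per-bin minimum of the smoothed PSD tracks the noise floor even while the user is talking, which is the rationale behind the minimum-statistics approach.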
On the other hand, an embodiment of the present specification provides a speech recognition method suitable for a noise environment, fig. 7 is a flowchart illustrating a seventh speech recognition method suitable for a noise environment according to an embodiment of the present invention, as shown in fig. 7, an execution subject of an embodiment of the present specification is a speech acquisition device, including:
S502, collecting the to-be-processed noisy dialogue voice.
In a specific implementation process, the voice collecting device may collect the to-be-processed noisy conversational voice, which may include all sound sources corresponding to the collecting time, wherein the to-be-processed noisy conversational voice may include: coherent sound sources and incoherent sound sources. Coherent sound sources may be characterized as speech input by a user and incoherent sound sources may be characterized as background noise.
S504, uploading the to-be-processed noisy conversation voice to a terminal server, so that the terminal server feeds back a target voice recognition parameter set value based on the noisy conversation voice.
In a specific implementation process, the voice collecting device may be configured with a communication device for sending the collected to-be-processed noisy conversational voice to the terminal server.
S506, performing voice recognition on the noisy dialogue voice based on the received target voice recognition parameter set value.
In a specific implementation process, speech recognition is a form of pattern recognition comprising three basic units: feature extraction, pattern matching, and a reference pattern library.
On the other hand, an embodiment of the present disclosure provides a speech recognition apparatus suitable for use in a noise environment, and fig. 8 is a schematic structural diagram of the speech recognition apparatus suitable for use in a noise environment according to an embodiment of the present disclosure, as shown in fig. 8, including:
the receiving module 610 is configured to perform receiving of the to-be-processed noisy conversational speech uploaded by the speech acquisition device;
a speech feature extraction module 620 configured to perform speech enhancement processing on the to-be-processed noisy conversational speech to extract speech features in the to-be-processed noisy conversational speech;
a searching module 630 configured to perform searching for a target speech recognition parameter setting value of the to-be-processed noisy conversational speech from a speech recognition parameter value set according to the speech feature;
a sending module 640 configured to execute sending the target voice recognition parameter setting value to the voice collecting device, so that the voice collecting device performs voice recognition on the received voice data according to the target voice recognition parameter setting value.
On the basis of the above embodiments, in an embodiment of the present specification, the search module includes a parameter set setting module; the parameter set setting module includes:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is configured to acquire noisy dialogue voices, a voice recognition parameter value set and sentence texts corresponding to the noisy dialogue voices in multiple groups of different noise environments;
a clustering unit configured to perform clustering on the multiple groups of noisy conversational speech in different noise environments, and assign an initial speech recognition parameter value to each type of noisy conversational speech from the speech recognition parameter value set;
the adjusting unit is configured to perform voice recognition on each type of the noisy dialogue voice by using the initial voice recognition parameter value, compare a recognition result with a sentence text corresponding to the noisy dialogue voice, and adjust each initial parameter candidate value according to the comparison result;
and the parameter set acquisition unit is configured to acquire the voice recognition parameter value set according to the adjusted initial parameter candidate value.
On the other hand, an embodiment of the present disclosure provides a speech recognition apparatus suitable for use in a noise environment, and fig. 9 is a schematic structural diagram of another speech recognition apparatus suitable for use in a noise environment according to an embodiment of the present disclosure, as shown in fig. 9, including:
an acquisition module 810 configured to perform acquisition of a to-be-processed noisy conversational speech;
an uploading module 820 configured to perform uploading the to-be-processed noisy conversation voice to a terminal server, so that the terminal server feeds back a target voice recognition parameter setting value based on the noisy conversation voice;
a speech recognition module 830 configured to perform speech recognition on the noisy conversational speech based on the received target speech recognition parameter setting value.
In another aspect, the present specification provides a computer-readable storage medium, in which at least one instruction or at least one program is stored, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the method for speech recognition in a noise environment.
On the other hand, an embodiment of the present disclosure provides a speech recognition device suitable for use in a noisy environment, fig. 10 is a schematic structural diagram of a speech recognition device suitable for use in a noisy environment according to an embodiment of the present disclosure, as shown in fig. 10, including at least one processor and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the at least one processor implements a method of speech recognition suitable for use in noisy environments as described above by executing the instructions stored by the memory.
Since the speech recognition apparatus, the computer-readable storage medium, and the speech recognition device suitable for use in a noise environment have the same technical effects as the speech recognition method suitable for use in a noise environment, they are not described in detail herein.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. The implementation principle and the generated technical effect of the testing method provided by the embodiment of the invention are the same as those of the system embodiment, and for the sake of brief description, the corresponding contents in the system embodiment can be referred to where the method embodiment is not mentioned.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above functions, if implemented in the form of software functional units and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the above claims.

Claims (8)

1. A method for speech recognition in a noisy environment, comprising:
receiving to-be-processed noisy conversation voice uploaded by voice acquisition equipment;
carrying out voice enhancement processing on the to-be-processed noisy dialogue voice so as to extract voice characteristics in the to-be-processed noisy dialogue voice;
searching a target voice recognition parameter set value of the to-be-processed noisy dialogue voice from a voice recognition parameter set according to the voice characteristics;
sending the target voice recognition parameter set value to the voice acquisition equipment so that the voice acquisition equipment performs voice recognition on the received voice data according to the target voice recognition parameter set value;
the method for constructing the speech recognition parameter value set comprises the following steps:
acquiring a plurality of groups of noisy conversational speech under different noise environments, a speech recognition parameter value set and sentence texts corresponding to the noisy conversational speech;
clustering the multiple groups of noisy dialogue voices in different noise environments, and distributing an initial voice recognition parameter value for each type of noisy dialogue voice from the voice recognition parameter value set;
performing voice recognition on each type of the noisy dialogue voice by using the initial voice recognition parameter value, comparing a recognition result with sentence texts corresponding to the noisy dialogue voice, and adjusting each initial parameter candidate value according to the comparison result;
and obtaining the voice recognition parameter value set according to the adjusted initial parameter candidate value.
2. The method of claim 1, wherein the performing speech enhancement processing on the noisy conversational speech to be processed comprises:
and carrying out voice enhancement processing on the to-be-processed noisy dialogue voice by utilizing a multi-channel wiener filter.
3. The method of claim 2, wherein the noisy conversational speech comprises: coherent and incoherent sound sources;
the voice enhancement processing is carried out on the to-be-processed noisy dialogue voice by utilizing the multi-channel wiener filter, and the method comprises the following steps:
determining a transfer function of the coherent sound source and a transfer function of the incoherent sound source according to the noisy conversational speech;
processing the transfer function of the coherent sound source and the transfer function of the incoherent sound source by using a beam former to obtain the power spectral density of the coherent sound source and the power spectral density of the residual noise;
processing the power spectral density of the coherent sound source and the power spectral density of the residual noise by using a wiener post-filter to obtain the power spectral density of background noise;
and acquiring corresponding voice characteristics according to the power spectral density of the background noise.
4. A method for speech recognition in a noisy environment, comprising:
collecting a to-be-processed noisy conversation voice;
uploading the to-be-processed noisy conversation voice to a terminal server so that the terminal server feeds back a target voice recognition parameter set value based on the noisy conversation voice, wherein the feeding back the target voice recognition parameter set value based on the noisy conversation voice comprises: carrying out voice enhancement processing on the to-be-processed noisy dialogue voice so as to extract voice characteristics in the to-be-processed noisy dialogue voice; searching a target voice recognition parameter set value corresponding to the to-be-processed noisy dialogue voice from a voice recognition parameter set according to the voice characteristics;
performing voice recognition on the noisy dialogue voice based on the received target voice recognition parameter set value;
the method for constructing the speech recognition parameter value set comprises the following steps:
acquiring a plurality of groups of noisy conversational speech under different noise environments, a speech recognition parameter value set and sentence texts corresponding to the noisy conversational speech;
clustering the multiple groups of noisy dialogue voices in different noise environments, and distributing an initial voice recognition parameter value for each type of noisy dialogue voice from the voice recognition parameter value set;
performing voice recognition on each type of the noisy dialogue voice by using the initial voice recognition parameter value, comparing a recognition result with sentence texts corresponding to the noisy dialogue voice, and adjusting each initial parameter candidate value according to the comparison result;
and obtaining the voice recognition parameter value set according to the adjusted initial parameter candidate value.
5. A speech recognition apparatus adapted for use in noisy environments, comprising:
the receiving module is configured to execute receiving of the to-be-processed noisy conversation voice uploaded by the voice acquisition equipment;
a voice feature extraction module configured to perform voice enhancement processing on the to-be-processed noisy speech so as to extract voice features in the to-be-processed noisy speech;
a searching module configured to perform searching for a target speech recognition parameter setting value of the to-be-processed noisy conversational speech from a speech recognition parameter value set according to the speech feature;
the sending module is configured to send the target voice recognition parameter set value to the voice acquisition equipment so that the voice acquisition equipment performs voice recognition on the received voice data according to the target voice recognition parameter set value;
wherein the search module comprises a parameter set setting module; the parameter set setting module includes:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is configured to acquire noisy dialogue voices, a voice recognition parameter value set and sentence texts corresponding to the noisy dialogue voices in multiple groups of different noise environments;
a clustering unit configured to perform clustering on the multiple groups of noisy conversational speech in different noise environments, and assign an initial speech recognition parameter value to each type of noisy conversational speech from the speech recognition parameter value set;
the adjusting unit is configured to perform voice recognition on each type of the noisy dialogue voice by using the initial voice recognition parameter value, compare a recognition result with a sentence text corresponding to the noisy dialogue voice, and adjust each initial parameter candidate value according to the comparison result;
and the parameter set acquisition unit is configured to acquire the voice recognition parameter value set according to the adjusted initial parameter candidate value.
6. A speech recognition apparatus adapted for use in noisy environments, comprising:
the acquisition module is configured to acquire the to-be-processed noisy conversational speech;
an uploading module configured to upload the to-be-processed noisy conversation voice to a terminal server so that the terminal server feeds back a target voice recognition parameter setting value based on the noisy conversation voice, wherein the feeding back the target voice recognition parameter setting value based on the noisy conversation voice comprises: carrying out voice enhancement processing on the to-be-processed noisy dialogue voice so as to extract voice characteristics in the to-be-processed noisy dialogue voice; searching a target voice recognition parameter set value corresponding to the to-be-processed noisy dialogue voice from a voice recognition parameter set according to the voice characteristics;
a speech recognition module configured to perform speech recognition on the noisy conversational speech based on the received target speech recognition parameter setting value;
the method for constructing the speech recognition parameter value set comprises the following steps:
acquiring a plurality of groups of noisy conversational speech under different noise environments, a speech recognition parameter value set and sentence texts corresponding to the noisy conversational speech;
clustering the multiple groups of noisy dialogue voices in different noise environments, and distributing an initial voice recognition parameter value for each type of noisy dialogue voice from the voice recognition parameter value set;
performing voice recognition on each type of the noisy dialogue voice by using the initial voice recognition parameter value, comparing a recognition result with sentence texts corresponding to the noisy dialogue voice, and adjusting each initial parameter candidate value according to the comparison result;
and obtaining the voice recognition parameter value set according to the adjusted initial parameter candidate value.
7. A computer readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement a method of speech recognition adapted for use in noisy environments according to any of claims 1-3 or 4.
8. A speech recognition device adapted for use in noisy environments, comprising at least one processor, and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the at least one processor implements a method of speech recognition adapted for use in noisy environments as claimed in any one of claims 1-3 or 4 by executing the instructions stored by the memory.
CN202011355810.0A 2020-11-27 2020-11-27 Voice recognition method and device suitable for noise environment Active CN112530453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011355810.0A CN112530453B (en) 2020-11-27 2020-11-27 Voice recognition method and device suitable for noise environment


Publications (2)

Publication Number Publication Date
CN112530453A CN112530453A (en) 2021-03-19
CN112530453B true CN112530453B (en) 2022-04-05

Family

ID=74994065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011355810.0A Active CN112530453B (en) 2020-11-27 2020-11-27 Voice recognition method and device suitable for noise environment

Country Status (1)

Country Link
CN (1) CN112530453B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023085749A1 (en) * 2021-11-09 2023-05-19 삼성전자주식회사 Electronic device for controlling beamforming and operation method thereof

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6505057B1 (en) * 1998-01-23 2003-01-07 Digisonix Llc Integrated vehicle voice enhancement system and hands-free cellular telephone system
CN104715758A (en) * 2015-02-06 2015-06-17 哈尔滨工业大学深圳研究生院 Branched processing array type speech positioning and enhancement method
CN108615535A (en) * 2018-05-07 2018-10-02 腾讯科技(深圳)有限公司 Sound enhancement method, device, intelligent sound equipment and computer equipment
CN108831495A (en) * 2018-06-04 2018-11-16 桂林电子科技大学 A kind of sound enhancement method applied to speech recognition under noise circumstance
CN108877807A (en) * 2018-07-04 2018-11-23 广东猪兼强互联网科技有限公司 A kind of intelligent robot for telemarketing
CN111344778A (en) * 2017-11-23 2020-06-26 哈曼国际工业有限公司 Method and system for speech enhancement
CN111681649A (en) * 2020-05-25 2020-09-18 重庆邮电大学 Speech recognition method, interactive system and score management system comprising system
CN111755010A (en) * 2020-07-07 2020-10-09 出门问问信息科技有限公司 Signal processing method and device combining voice enhancement and keyword recognition


Also Published As

Publication number Publication date
CN112530453A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
EP3479377B1 (en) Speech recognition
US10127922B2 (en) Sound source identification apparatus and sound source identification method
US9524730B2 (en) Monaural speech filter
US11798574B2 (en) Voice separation device, voice separation method, voice separation program, and voice separation system
KR20050115857A (en) System and method for speech processing using independent component analysis under stability constraints
Ganapathy et al. 3-D CNN models for far-field multi-channel speech recognition
CN110610718B (en) Method and device for extracting expected sound source voice signal
CN102723082A (en) System and method for monaural audio processing based preserving speech information
CN112735460B (en) Beam forming method and system based on time-frequency masking value estimation
CN105580074B (en) Signal processing system and method
CN110349593A (en) The method and system of semanteme based on waveform Time-Frequency Analysis and the dual identification of vocal print
CN111798860A (en) Audio signal processing method, device, equipment and storage medium
JP2006510060A (en) Method and system for separating a plurality of acoustic signals generated by a plurality of acoustic sources
CN114041185A (en) Method and apparatus for determining a depth filter
CN112530453B (en) Voice recognition method and device suitable for noise environment
JP4703648B2 (en) Vector codebook generation method, data compression method and apparatus, and distributed speech recognition system
CN111681649B (en) Speech recognition method, interaction system and achievement management system comprising system
Girin et al. Audio source separation into the wild
WO2005029463A9 (en) A method for recovering target speech based on speech segment detection under a stationary noise
CN110310658B (en) Voice separation method based on voice signal processing
CN107919136B (en) Digital voice sampling frequency estimation method based on Gaussian mixture model
JP5070591B2 (en) Noise suppression device, computer program, and speech recognition system
JP5705190B2 (en) Acoustic signal enhancement apparatus, acoustic signal enhancement method, and program
CN115910037A (en) Voice signal extraction method and device, readable storage medium and electronic equipment
JP6285855B2 (en) Filter coefficient calculation apparatus, audio reproduction apparatus, filter coefficient calculation method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant